Queries
- 1: Aggregation functions
- 1.1: Aggregation Functions
- 1.2: arg_max() (aggregation function)
- 1.3: arg_min() (aggregation function)
- 1.4: avg() (aggregation function)
- 1.5: avgif() (aggregation function)
- 1.6: binary_all_and() (aggregation function)
- 1.7: binary_all_or() (aggregation function)
- 1.8: binary_all_xor() (aggregation function)
- 1.9: buildschema() (aggregation function)
- 1.10: count_distinct() (aggregation function) - (preview)
- 1.11: count_distinctif() (aggregation function) - (preview)
- 1.12: count() (aggregation function)
- 1.13: countif() (aggregation function)
- 1.14: dcount() (aggregation function)
- 1.15: dcountif() (aggregation function)
- 1.16: hll_if() (aggregation function)
- 1.17: hll_merge() (aggregation function)
- 1.18: hll() (aggregation function)
- 1.19: make_bag_if() (aggregation function)
- 1.20: make_bag() (aggregation function)
- 1.21: make_list_if() (aggregation function)
- 1.22: make_list_with_nulls() (aggregation function)
- 1.23: make_list() (aggregation function)
- 1.24: make_set_if() (aggregation function)
- 1.25: make_set() (aggregation function)
- 1.26: max() (aggregation function)
- 1.27: maxif() (aggregation function)
- 1.28: min() (aggregation function)
- 1.29: minif() (aggregation function)
- 1.30: percentile(), percentiles()
- 1.31: percentilew(), percentilesw()
- 1.32: stdev() (aggregation function)
- 1.33: stdevif() (aggregation function)
- 1.34: stdevp() (aggregation function)
- 1.35: sum() (aggregation function)
- 1.36: sumif() (aggregation function)
- 1.37: take_any() (aggregation function)
- 1.38: take_anyif() (aggregation function)
- 1.39: tdigest_merge() (aggregation functions)
- 1.40: tdigest() (aggregation function)
- 1.41: variance() (aggregation function)
- 1.42: varianceif() (aggregation function)
- 1.43: variancep() (aggregation function)
- 2: Best practices for KQL queries
- 3: Data types
- 3.1: Null values
- 3.2: Scalar data types
- 3.3: The bool data type
- 3.4: The datetime data type
- 3.5: The decimal data type
- 3.6: The dynamic data type
- 3.7: The guid data type
- 3.8: The int data type
- 3.9: The long data type
- 3.10: The real data type
- 3.11: The string data type
- 3.12: The timespan data type
- 4: Entities
- 4.1: Columns
- 4.2: Databases
- 4.3: Entities
- 4.4: Entity names
- 4.5: Entity references
- 4.6: External tables
- 4.7: Fact and dimension tables
- 4.8: Stored functions
- 4.9: Tables
- 4.10: Views
- 5: Functions
- 5.1: bartlett_test_fl()
- 5.2: binomial_test_fl()
- 5.3: comb_fl()
- 5.4: dbscan_dynamic_fl()
- 5.5: dbscan_fl()
- 5.6: detect_anomalous_new_entity_fl()
- 5.7: factorial_fl()
- 5.8: Functions
- 5.9: Functions library
- 5.10: geoip_fl()
- 5.11: get_packages_version_fl()
- 5.12: kmeans_dynamic_fl()
- 5.13: kmeans_fl()
- 5.14: ks_test_fl()
- 5.15: levene_test_fl()
- 5.16: log_reduce_fl()
- 5.17: log_reduce_full_fl()
- 5.18: log_reduce_predict_fl()
- 5.19: log_reduce_predict_full_fl()
- 5.20: log_reduce_train_fl()
- 5.21: mann_whitney_u_test_fl()
- 5.22: normality_test_fl()
- 5.23: pair_probabilities_fl()
- 5.24: pairwise_dist_fl()
- 5.25: percentiles_linear_fl()
- 5.26: perm_fl()
- 5.27: plotly_anomaly_fl()
- 5.28: plotly_gauge_fl()
- 5.29: plotly_scatter3d_fl()
- 5.30: predict_fl()
- 5.31: predict_onnx_fl()
- 5.32: quantize_fl()
- 5.33: series_clean_anomalies_fl()
- 5.34: series_cosine_similarity_fl()
- 5.35: series_dbl_exp_smoothing_fl()
- 5.36: series_dot_product_fl()
- 5.37: series_downsample_fl()
- 5.38: series_exp_smoothing_fl()
- 5.39: series_fbprophet_forecast_fl()
- 5.40: series_fit_lowess_fl()
- 5.41: series_fit_poly_fl()
- 5.42: series_lag_fl()
- 5.43: series_metric_fl()
- 5.44: series_monthly_decompose_anomalies_fl()
- 5.45: series_moving_avg_fl()
- 5.46: series_moving_var_fl()
- 5.47: series_mv_ee_anomalies_fl()
- 5.48: series_mv_if_anomalies_fl()
- 5.49: series_mv_oc_anomalies_fl()
- 5.50: series_rate_fl()
- 5.51: series_rolling_fl()
- 5.52: series_shapes_fl()
- 5.53: series_uv_anomalies_fl()
- 5.54: series_uv_change_points_fl()
- 5.55: time_weighted_avg_fl()
- 5.56: time_weighted_avg2_fl()
- 5.57: time_weighted_val_fl()
- 5.58: time_window_rolling_avg_fl()
- 5.59: two_sample_t_test_fl()
- 5.60: User-defined functions
- 5.61: wilcoxon_test_fl()
- 6: Geospatial
- 6.1: geo_angle()
- 6.2: geo_azimuth()
- 6.3: geo_distance_2points()
- 6.4: geo_distance_point_to_line()
- 6.5: geo_distance_point_to_polygon()
- 6.6: geo_geohash_neighbors()
- 6.7: geo_geohash_to_central_point()
- 6.8: geo_geohash_to_polygon()
- 6.9: geo_h3cell_children()
- 6.10: geo_h3cell_level()
- 6.11: geo_h3cell_neighbors()
- 6.12: geo_h3cell_parent()
- 6.13: geo_h3cell_rings()
- 6.14: geo_h3cell_to_central_point()
- 6.15: geo_h3cell_to_polygon()
- 6.16: geo_intersection_2lines()
- 6.17: geo_intersection_2polygons()
- 6.18: geo_intersection_line_with_polygon()
- 6.19: geo_intersects_2lines()
- 6.20: geo_intersects_2polygons()
- 6.21: geo_intersects_line_with_polygon()
- 6.22: geo_line_buffer()
- 6.23: geo_line_centroid()
- 6.24: geo_line_densify()
- 6.25: geo_line_length()
- 6.26: geo_line_simplify()
- 6.27: geo_line_to_s2cells()
- 6.28: geo_point_buffer()
- 6.29: geo_point_in_circle()
- 6.30: geo_point_in_polygon()
- 6.31: geo_point_to_geohash()
- 6.32: geo_point_to_h3cell()
- 6.33: geo_point_to_s2cell()
- 6.34: geo_polygon_area()
- 6.35: geo_polygon_buffer()
- 6.36: geo_polygon_centroid()
- 6.37: geo_polygon_densify()
- 6.38: geo_polygon_perimeter()
- 6.39: geo_polygon_simplify()
- 6.40: geo_polygon_to_h3cells()
- 6.41: geo_polygon_to_s2cells()
- 6.42: geo_s2cell_neighbors()
- 6.43: geo_s2cell_to_central_point()
- 6.44: geo_s2cell_to_polygon()
- 6.45: geo_simplify_polygons_array()
- 6.46: geo_union_lines_array()
- 6.47: geo_union_polygons_array()
- 6.48: Geospatial data visualizations
- 6.49: Geospatial grid system
- 7: Graph operators
- 7.1: Best practices for Kusto Query Language (KQL) graph semantics
- 7.2: Graph operators
- 7.3: graph-mark-components operator (Preview)
- 7.4: graph-match operator
- 7.5: graph-shortest-paths Operator (Preview)
- 7.6: graph-to-table operator
- 7.7: Kusto Query Language (KQL) graph semantics overview
- 7.8: make-graph operator
- 7.9: Scenarios for using Kusto Query Language (KQL) graph semantics
- 8: Limits and Errors
- 8.1: Query consistency
- 8.2: Query limits
- 8.3: Partial query failures
- 8.3.1: Kusto query result set exceeds internal limit
- 8.3.2: Overflows
- 8.3.3: Runaway queries
- 9: Plugins
- 9.1: Data reshaping plugins
- 9.1.1: bag_unpack plugin
- 9.1.2: narrow plugin
- 9.1.3: pivot plugin
- 9.2: General plugins
- 9.2.1: dcount_intersect plugin
- 9.2.2: infer_storage_schema plugin
- 9.2.3: infer_storage_schema_with_suggestions plugin
- 9.2.4: ipv4_lookup plugin
- 9.2.5: ipv6_lookup plugin
- 9.2.6: preview plugin
- 9.2.7: schema_merge plugin
- 9.3: Language plugins
- 9.3.1: Python plugin
- 9.3.2: Python plugin packages
- 9.3.3: R plugin (Preview)
- 9.4: Machine learning plugins
- 9.4.1: autocluster plugin
- 9.4.2: basket plugin
- 9.4.3: diffpatterns plugin
- 9.4.4: diffpatterns_text plugin
- 9.5: Query connectivity plugins
- 9.5.1: ai_embed_text plugin (Preview)
- 9.5.2: azure_digital_twins_query_request plugin
- 9.5.3: cosmosdb_sql_request plugin
- 9.5.4: http_request plugin
- 9.5.5: http_request_post plugin
- 9.5.6: mysql_request plugin
- 9.5.7: postgresql_request plugin
- 9.5.8: sql_request plugin
- 9.6: User and sequence analytics plugins
- 9.6.1: active_users_count plugin
- 9.6.2: activity_counts_metrics plugin
- 9.6.3: activity_engagement plugin
- 9.6.4: activity_metrics plugin
- 9.6.5: funnel_sequence plugin
- 9.6.6: funnel_sequence_completion plugin
- 9.6.7: new_activity_metrics plugin
- 9.6.8: rolling_percentile plugin
- 9.6.9: rows_near plugin
- 9.6.10: sequence_detect plugin
- 9.6.11: session_count plugin
- 9.6.12: sliding_window_counts plugin
- 9.6.13: User Analytics
- 10: Query statements
- 10.1: Alias statement
- 10.2: Batches
- 10.3: Let statement
- 10.4: Pattern statement
- 10.5: Query parameters declaration statement
- 10.6: Query statements
- 10.7: Restrict statement
- 10.8: Set statement
- 10.9: Tabular expression statements
- 11: Reference
- 11.1: JSONPath syntax
- 11.2: KQL docs navigation guide
- 11.3: Regex syntax
- 11.4: Splunk to Kusto map
- 11.5: SQL to Kusto query translation
- 11.6: Timezone
- 12: Scalar functions
- 12.1: abs()
- 12.2: acos()
- 12.3: ago()
- 12.4: around() function
- 12.5: array_concat()
- 12.6: array_iff()
- 12.7: array_index_of()
- 12.8: array_length()
- 12.9: array_reverse()
- 12.10: array_rotate_left()
- 12.11: array_rotate_right()
- 12.12: array_shift_left()
- 12.13: array_shift_right()
- 12.14: array_slice()
- 12.15: array_sort_asc()
- 12.16: array_sort_desc()
- 12.17: array_split()
- 12.18: array_sum()
- 12.19: asin()
- 12.20: assert()
- 12.21: atan()
- 12.22: atan2()
- 12.23: bag_has_key()
- 12.24: bag_keys()
- 12.25: bag_merge()
- 12.26: bag_pack_columns()
- 12.27: bag_pack()
- 12.28: bag_remove_keys()
- 12.29: bag_set_key()
- 12.30: bag_zip()
- 12.31: base64_decode_toarray()
- 12.32: base64_decode_toguid()
- 12.33: base64_decode_tostring()
- 12.34: base64_encode_fromarray()
- 12.35: base64_encode_fromguid()
- 12.36: base64_encode_tostring()
- 12.37: beta_cdf()
- 12.38: beta_inv()
- 12.39: beta_pdf()
- 12.40: bin_at()
- 12.41: bin_auto()
- 12.42: bin()
- 12.43: binary_and()
- 12.44: binary_not()
- 12.45: binary_or()
- 12.46: binary_shift_left()
- 12.47: binary_shift_right()
- 12.48: binary_xor()
- 12.49: bitset_count_ones()
- 12.50: case()
- 12.51: ceiling()
- 12.52: coalesce()
- 12.53: column_ifexists()
- 12.54: convert_angle()
- 12.55: convert_energy()
- 12.56: convert_force()
- 12.57: convert_length()
- 12.58: convert_mass()
- 12.59: convert_speed()
- 12.60: convert_temperature()
- 12.61: convert_volume()
- 12.62: cos()
- 12.63: cot()
- 12.64: countof()
- 12.65: current_cluster_endpoint()
- 12.66: current_database()
- 12.67: current_principal_details()
- 12.68: current_principal_is_member_of()
- 12.69: current_principal()
- 12.70: cursor_after()
- 12.71: cursor_before_or_at()
- 12.72: cursor_current()
- 12.73: datetime_add()
- 12.74: datetime_diff()
- 12.75: datetime_list_timezones()
- 12.76: datetime_local_to_utc()
- 12.77: datetime_part()
- 12.78: datetime_utc_to_local()
- 12.79: dayofmonth()
- 12.80: dayofweek()
- 12.81: dayofyear()
- 12.82: dcount_hll()
- 12.83: degrees()
- 12.84: dynamic_to_json()
- 12.85: endofday()
- 12.86: endofmonth()
- 12.87: endofweek()
- 12.88: endofyear()
- 12.89: erf()
- 12.90: erfc()
- 12.91: estimate_data_size()
- 12.92: exp()
- 12.93: exp10()
- 12.94: exp2()
- 12.95: extent_id()
- 12.96: extent_tags()
- 12.97: extract_all()
- 12.98: extract_json()
- 12.99: extract()
- 12.100: format_bytes()
- 12.101: format_datetime()
- 12.102: format_ipv4_mask()
- 12.103: format_ipv4()
- 12.104: format_timespan()
- 12.105: gamma()
- 12.106: geo_info_from_ip_address()
- 12.107: gettype()
- 12.108: getyear()
- 12.109: gzip_compress_to_base64_string
- 12.110: gzip_decompress_from_base64_string()
- 12.111: has_any_ipv4_prefix()
- 12.112: has_any_ipv4()
- 12.113: has_ipv4_prefix()
- 12.114: has_ipv4()
- 12.115: hash_combine()
- 12.116: hash_many()
- 12.117: hash_md5()
- 12.118: hash_sha1()
- 12.119: hash_sha256()
- 12.120: hash_xxhash64()
- 12.121: hash()
- 12.122: hll_merge()
- 12.123: hourofday()
- 12.124: iff()
- 12.125: indexof_regex()
- 12.126: indexof()
- 12.127: ingestion_time()
- 12.128: ipv4_compare()
- 12.129: ipv4_is_in_any_range()
- 12.130: ipv4_is_in_range()
- 12.131: ipv4_is_match()
- 12.132: ipv4_is_private()
- 12.133: ipv4_netmask_suffix()
- 12.134: ipv4_range_to_cidr_list()
- 12.135: ipv6_compare()
- 12.136: ipv6_is_in_any_range()
- 12.137: ipv6_is_in_range()
- 12.138: ipv6_is_match()
- 12.139: isascii()
- 12.140: isempty()
- 12.141: isfinite()
- 12.142: isinf()
- 12.143: isnan()
- 12.144: isnotempty()
- 12.145: isnotnull()
- 12.146: isnull()
- 12.147: isutf8()
- 12.148: jaccard_index()
- 12.149: log()
- 12.150: log10()
- 12.151: log2()
- 12.152: loggamma()
- 12.153: make_datetime()
- 12.154: make_timespan()
- 12.155: max_of()
- 12.156: merge_tdigest()
- 12.157: min_of()
- 12.158: monthofyear()
- 12.159: new_guid()
- 12.160: not()
- 12.161: now()
- 12.162: pack_all()
- 12.163: pack_array()
- 12.164: parse_command_line()
- 12.165: parse_csv()
- 12.166: parse_ipv4_mask()
- 12.167: parse_ipv4()
- 12.168: parse_ipv6_mask()
- 12.169: parse_ipv6()
- 12.170: parse_json() function
- 12.171: parse_path()
- 12.172: parse_url()
- 12.173: parse_urlquery()
- 12.174: parse_user_agent()
- 12.175: parse_version()
- 12.176: parse_xml()
- 12.177: percentile_array_tdigest()
- 12.178: percentile_tdigest()
- 12.179: percentrank_tdigest()
- 12.180: pi()
- 12.181: pow()
- 12.182: punycode_domain_from_string
- 12.183: punycode_domain_to_string
- 12.184: punycode_from_string
- 12.185: punycode_to_string
- 12.186: radians()
- 12.187: rand()
- 12.188: range()
- 12.189: rank_tdigest()
- 12.190: regex_quote()
- 12.191: repeat()
- 12.192: replace_regex()
- 12.193: replace_string()
- 12.194: replace_strings()
- 12.195: reverse()
- 12.196: round()
- 12.197: Scalar Functions
- 12.198: set_difference()
- 12.199: set_has_element()
- 12.200: set_intersect()
- 12.201: set_union()
- 12.202: sign()
- 12.203: sin()
- 12.204: split()
- 12.205: sqrt()
- 12.206: startofday()
- 12.207: startofmonth()
- 12.208: startofweek()
- 12.209: startofyear()
- 12.210: strcat_array()
- 12.211: strcat_delim()
- 12.212: strcat()
- 12.213: strcmp()
- 12.214: string_size()
- 12.215: strlen()
- 12.216: strrep()
- 12.217: substring()
- 12.218: tan()
- 12.219: The has_any_index operator
- 12.220: tobool()
- 12.221: todatetime()
- 12.222: todecimal()
- 12.223: toguid()
- 12.224: tohex()
- 12.225: toint()
- 12.226: tolong()
- 12.227: tolower()
- 12.228: toreal()
- 12.229: tostring()
- 12.230: totimespan()
- 12.231: toupper()
- 12.232: translate()
- 12.233: treepath()
- 12.234: trim_end()
- 12.235: trim_start()
- 12.236: trim()
- 12.237: unicode_codepoints_from_string()
- 12.238: unicode_codepoints_to_string()
- 12.239: unixtime_microseconds_todatetime()
- 12.240: unixtime_milliseconds_todatetime()
- 12.241: unixtime_nanoseconds_todatetime()
- 12.242: unixtime_seconds_todatetime()
- 12.243: url_decode()
- 12.244: url_encode_component()
- 12.245: url_encode()
- 12.246: week_of_year()
- 12.247: welch_test()
- 12.248: zip()
- 12.249: zlib_compress_to_base64_string
- 12.250: zlib_decompress_from_base64_string()
- 13: Scalar operators
- 13.1: Bitwise (binary) operators
- 13.2: Datetime / timespan arithmetic
- 13.3: Logical (binary) operators
- 13.4: Numerical operators
- 13.5: Between operators
- 13.5.1: The !between operator
- 13.5.2: The between operator
- 13.6: in operators
- 13.6.1: The case-insensitive !in~ string operator
- 13.6.2: The case-insensitive in~ string operator
- 13.6.3: The case-sensitive !in string operator
- 13.6.4: The case-sensitive in string operator
- 13.7: String operators
- 13.7.1: matches regex operator
- 13.7.2: String operators
- 13.7.3: The case-insensitive !~ (not equals) string operator
- 13.7.4: The case-insensitive !contains string operator
- 13.7.5: The case-insensitive !endswith string operator
- 13.7.6: The case-insensitive !has string operators
- 13.7.7: The case-insensitive !hasprefix string operator
- 13.7.8: The case-insensitive !hassuffix string operator
- 13.7.9: The case-insensitive !in~ string operator
- 13.7.10: The case-insensitive !startswith string operators
- 13.7.11: The case-insensitive =~ (equals) string operator
- 13.7.12: The case-insensitive contains string operator
- 13.7.13: The case-insensitive endswith string operator
- 13.7.14: The case-insensitive has string operator
- 13.7.15: The case-insensitive has_all string operator
- 13.7.16: The case-insensitive has_any string operator
- 13.7.17: The case-insensitive hasprefix string operator
- 13.7.18: The case-insensitive hassuffix string operator
- 13.7.19: The case-insensitive in~ string operator
- 13.7.20: The case-insensitive startswith string operator
- 13.7.21: The case-sensitive != (not equals) string operator
- 13.7.22: The case-sensitive !contains_cs string operator
- 13.7.23: The case-sensitive !endswith_cs string operator
- 13.7.24: The case-sensitive !has_cs string operator
- 13.7.25: The case-sensitive !hasprefix_cs string operator
- 13.7.26: The case-sensitive !hassuffix_cs string operator
- 13.7.27: The case-sensitive !in string operator
- 13.7.28: The case-sensitive !startswith_cs string operator
- 13.7.29: The case-sensitive == (equals) string operator
- 13.7.30: The case-sensitive contains_cs string operator
- 13.7.31: The case-sensitive endswith_cs string operator
- 13.7.32: The case-sensitive has_cs string operator
- 13.7.33: The case-sensitive hasprefix_cs string operator
- 13.7.34: The case-sensitive hassuffix_cs string operator
- 13.7.35: The case-sensitive in string operator
- 13.7.36: The case-sensitive startswith string operator
- 14: Special functions
- 14.1: cluster()
- 14.2: Cross-cluster and cross-database queries
- 14.3: database()
- 14.4: external_table()
- 14.5: materialize()
- 14.6: materialized_view()
- 14.7: Query results cache
- 14.8: stored_query_result()
- 14.9: table()
- 14.10: toscalar()
- 15: Tabular operators
- 15.1: Join operator
- 15.1.1: join flavors
- 15.1.1.1: fullouter join
- 15.1.1.2: inner join
- 15.1.1.3: innerunique join
- 15.1.1.4: leftanti join
- 15.1.1.5: leftouter join
- 15.1.1.6: leftsemi join
- 15.1.1.7: rightanti join
- 15.1.1.8: rightouter join
- 15.1.1.9: rightsemi join
- 15.1.2: Broadcast join
- 15.1.3: Cross-cluster join
- 15.1.4: join operator
- 15.1.5: Joining within time window
- 15.2: Render operator
- 15.2.1: visualizations
- 15.2.1.1: Anomaly chart visualization
- 15.2.1.2: Area chart visualization
- 15.2.1.3: Bar chart visualization
- 15.2.1.4: Card visualization
- 15.2.1.5: Column chart visualization
- 15.2.1.6: Ladder chart visualization
- 15.2.1.7: Line chart visualization
- 15.2.1.8: Pie chart visualization
- 15.2.1.9: Pivot chart visualization
- 15.2.1.10: Plotly visualization
- 15.2.1.11: Scatter chart visualization
- 15.2.1.12: Stacked area chart visualization
- 15.2.1.13: Table visualization
- 15.2.1.14: Time chart visualization
- 15.2.1.15: Time pivot visualization
- 15.2.1.16: Treemap visualization
- 15.2.2: render operator
- 15.3: Summarize operator
- 15.4: as operator
- 15.5: consume operator
- 15.6: count operator
- 15.7: datatable operator
- 15.8: distinct operator
- 15.9: evaluate plugin operator
- 15.10: extend operator
- 15.11: externaldata operator
- 15.12: facet operator
- 15.13: find operator
- 15.14: fork operator
- 15.15: getschema operator
- 15.16: invoke operator
- 15.17: lookup operator
- 15.18: mv-apply operator
- 15.19: mv-expand operator
- 15.20: parse operator
- 15.21: parse-kv operator
- 15.22: parse-where operator
- 15.23: partition operator
- 15.24: print operator
- 15.25: Project operator
- 15.26: project-away operator
- 15.27: project-keep operator
- 15.28: project-rename operator
- 15.29: project-reorder operator
- 15.30: Queries
- 15.31: range operator
- 15.32: reduce operator
- 15.33: sample operator
- 15.34: sample-distinct operator
- 15.35: scan operator
- 15.36: search operator
- 15.37: serialize operator
- 15.38: Shuffle query
- 15.39: sort operator
- 15.40: take operator
- 15.41: top operator
- 15.42: top-hitters operator
- 15.43: top-nested operator
- 15.44: union operator
- 15.45: where operator
- 16: Time series analysis
- 16.1: Example use cases
- 16.1.1: Analyze time series data
- 16.1.2: Anomaly diagnosis for root cause analysis
- 16.1.3: Time series anomaly detection & forecasting
- 16.2: make-series operator
- 16.3: series_abs()
- 16.4: series_acos()
- 16.5: series_add()
- 16.6: series_atan()
- 16.7: series_cos()
- 16.8: series_cosine_similarity()
- 16.9: series_decompose_anomalies()
- 16.10: series_decompose_forecast()
- 16.11: series_decompose()
- 16.12: series_divide()
- 16.13: series_dot_product()
- 16.14: series_equals()
- 16.15: series_exp()
- 16.16: series_fft()
- 16.17: series_fill_backward()
- 16.18: series_fill_const()
- 16.19: series_fill_forward()
- 16.20: series_fill_linear()
- 16.21: series_fir()
- 16.22: series_fit_2lines_dynamic()
- 16.23: series_fit_2lines()
- 16.24: series_fit_line_dynamic()
- 16.25: series_fit_line()
- 16.26: series_fit_poly()
- 16.27: series_floor()
- 16.28: series_greater_equals()
- 16.29: series_greater()
- 16.30: series_ifft()
- 16.31: series_iir()
- 16.32: series_less_equals()
- 16.33: series_less()
- 16.34: series_log()
- 16.35: series_magnitude()
- 16.36: series_multiply()
- 16.37: series_not_equals()
- 16.38: series_outliers()
- 16.39: series_pearson_correlation()
- 16.40: series_periods_detect()
- 16.41: series_periods_validate()
- 16.42: series_seasonal()
- 16.43: series_sign()
- 16.44: series_sin()
- 16.45: series_stats_dynamic()
- 16.46: series_stats()
- 16.47: series_subtract()
- 16.48: series_sum()
- 16.49: series_tan()
- 16.50: series_asin()
- 16.51: series_ceiling()
- 16.52: series_pow()
- 17: Window functions
- 17.1: next()
- 17.2: prev()
- 17.3: row_cumsum()
- 17.4: row_number()
- 17.5: row_rank_dense()
- 17.6: row_rank_min()
- 17.7: row_window_session()
- 17.8: Window functions
- 18: Add a comment in KQL
- 19: Debug Kusto Query Language inline Python using Visual Studio Code
- 20: Set timeouts
- 21: Syntax conventions for reference documentation
- 22: T-SQL
1 - Aggregation functions
1.1 - Aggregation Functions
An aggregation function performs a calculation on a set of values, and returns a single value. These functions are used in conjunction with the summarize operator. This article lists all available aggregation functions grouped by type. For scalar functions, see Scalar function types.
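For example, the following query (a minimal sketch based on the StormEvents sample table used throughout this section; the result column names are illustrative) combines several of the aggregation functions listed below in a single summarize statement:
// Per state: total event count, approximate distinct event types, and average crop damage.
StormEvents
| summarize EventCount = count(), DistinctEventTypes = dcount(EventType), AvgCropDamage = avg(DamageCrops) by State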
Binary functions
Function Name | Description |
---|---|
binary_all_and() | Returns aggregated value using the binary AND of the group. |
binary_all_or() | Returns aggregated value using the binary OR of the group. |
binary_all_xor() | Returns aggregated value using the binary XOR of the group. |
Dynamic functions
Function Name | Description |
---|---|
buildschema() | Returns the minimal schema that admits all values of the dynamic input. |
make_bag(), make_bag_if() | Returns a property bag of dynamic values within the group without/with a predicate. |
make_list(), make_list_if() | Returns a list of all the values within the group without/with a predicate. |
make_list_with_nulls() | Returns a list of all the values within the group, including null values. |
make_set(), make_set_if() | Returns a set of distinct values within the group without/with a predicate. |
Row selector functions
Function Name | Description |
---|---|
arg_max() | Returns one or more expressions when the argument is maximized. |
arg_min() | Returns one or more expressions when the argument is minimized. |
take_any(), take_anyif() | Returns a random non-empty value for the group without/with a predicate. |
Statistical functions
Function Name | Description |
---|---|
avg() | Returns an average value across the group. |
avgif() | Returns an average value across the group (with predicate). |
count(), countif() | Returns a count of the group without/with a predicate. |
count_distinct(), count_distinctif() | Returns a count of unique elements in the group without/with a predicate. |
dcount(), dcountif() | Returns an approximate distinct count of the group elements without/with a predicate. |
hll() | Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation. |
hll_if() | Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation (with predicate). |
hll_merge() | Returns a value for merged HLL results. |
max(), maxif() | Returns the maximum value across the group without/with a predicate. |
min(), minif() | Returns the minimum value across the group without/with a predicate. |
percentile() | Returns a percentile estimation of the group. |
percentiles() | Returns percentile estimations of the group. |
percentiles_array() | Returns the percentile approximates of the array. |
percentilesw() | Returns the weighted percentile approximate of the group. |
percentilesw_array() | Returns the weighted percentile approximate of the array. |
stdev(), stdevif() | Returns the standard deviation across the group for a population that is considered a sample without/with a predicate. |
stdevp() | Returns the standard deviation across the group for a population that is considered representative. |
sum(), sumif() | Returns the sum of the elements within the group without/with a predicate. |
tdigest() | Returns an intermediate result for the percentiles approximation, the weighted percentile approximate of the group. |
tdigest_merge() | Returns the merged tdigest value across the group. |
variance(), varianceif() | Returns the variance across the group without/with a predicate. |
variancep() | Returns the variance across the group for a population that is considered representative. |
1.2 - arg_max() (aggregation function)
Finds a row in the table that maximizes the specified expression. It returns all columns of the input table or specified columns.
Syntax
arg_max (ExprToMaximize, * | ExprToReturn [, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ExprToMaximize | string | ✔️ | The expression for which the maximum value is determined. |
ExprToReturn | string | ✔️ | The expression determines which columns’ values are returned, from the row that has the maximum value for ExprToMaximize. Use a wildcard * to return all columns. |
Returns
Returns a row in the table that maximizes the specified expression ExprToMaximize, and the values of columns specified in ExprToReturn.
Examples
Find maximum latitude
The following example finds the maximum latitude of a storm event in each state.
StormEvents
| summarize arg_max(BeginLat, BeginLocation) by State
Output
The results table displays only the first 10 rows.
State | BeginLat | BeginLocation |
---|---|---|
MISSISSIPPI | 34.97 | BARTON |
VERMONT | 45 | NORTH TROY |
AMERICAN SAMOA | -14.2 | OFU |
HAWAII | 22.2113 | PRINCEVILLE |
MINNESOTA | 49.35 | ARNESEN |
RHODE ISLAND | 42 | WOONSOCKET |
INDIANA | 41.73 | FREMONT |
WEST VIRGINIA | 40.62 | CHESTER |
SOUTH CAROLINA | 35.18 | LANDRUM |
TEXAS | 36.4607 | DARROUZETT |
… | … | … |
Find last state fatal event
The following example finds the last time an event with a direct death happened in each state, showing all the columns.
The query first filters the events to include only those events where there was at least one direct death. Then the query returns the entire row with the most recent StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *) by State
Output
The results table displays only the first 10 rows and first three columns.
State | StartTime | EndTime | … |
---|---|---|---|
GUAM | 2007-01-27T11:15:00Z | 2007-01-27T11:30:00Z | … |
MASSACHUSETTS | 2007-02-03T22:00:00Z | 2007-02-04T10:00:00Z | … |
AMERICAN SAMOA | 2007-02-17T13:00:00Z | 2007-02-18T11:00:00Z | … |
IDAHO | 2007-02-17T13:00:00Z | 2007-02-17T15:00:00Z | … |
DELAWARE | 2007-02-25T13:00:00Z | 2007-02-26T01:00:00Z | … |
WYOMING | 2007-03-10T17:00:00Z | 2007-03-10T17:00:00Z | … |
NEW MEXICO | 2007-03-23T18:42:00Z | 2007-03-23T19:06:00Z | … |
INDIANA | 2007-05-15T14:14:00Z | 2007-05-15T14:14:00Z | … |
MONTANA | 2007-05-18T14:20:00Z | 2007-05-18T14:20:00Z | … |
LAKE MICHIGAN | 2007-06-07T13:00:00Z | 2007-06-07T13:00:00Z | … |
… | … | … | … |
Handle nulls
The following example demonstrates null handling.
datatable(Fruit: string, Color: string, Version: int) [
"Apple", "Red", 1,
"Apple", "Green", int(null),
"Banana", "Yellow", int(null),
"Banana", "Green", int(null),
"Pear", "Brown", 1,
"Pear", "Green", 2,
]
| summarize arg_max(Version, *) by Fruit
Output
Fruit | Version | Color |
---|---|---|
Apple | 1 | Red |
Banana | | Yellow |
Pear | 2 | Green |
Comparison to max()
The arg_max() function differs from the max() function. The arg_max() function allows you to return other columns along with the maximum value, and max() only returns the maximum value itself.
Examples
arg_max()
Find the last time an event with a direct death happened, showing all the columns in the table.
The query first filters the events to only include events where there was at least one direct death. Then the query returns the entire row with the most recent (maximum) StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *)
The results table returns all the columns for the row containing the highest value in the expression specified.
StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
---|---|---|---|---|---|---|
2007-12-31T15:00:00Z | 2007-12-31T15:00:00 | 12688 | 69700 | UTAH | Avalanche | … |
max()
Find the last time an event with a direct death happened.
The query filters events to only include events where there is at least one direct death, and then returns the maximum value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize max(StartTime)
The results table returns the maximum value of StartTime, without returning other columns for this record.
max_StartTime |
---|
2007-12-31T15:00:00Z |
Related content
1.3 - arg_min() (aggregation function)
Finds a row in the table that minimizes the specified expression. It returns all columns of the input table or specified columns.
Syntax
arg_min (ExprToMinimize, * | ExprToReturn [, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ExprToMinimize | string | ✔️ | The expression for which the minimum value is determined. |
ExprToReturn | string | ✔️ | The expression determines which columns’ values are returned, from the row that has the minimum value for ExprToMinimize. Use a wildcard * to return all columns. |
Null handling
When ExprToMinimize is null for all rows in a table, one row in the table is picked. Otherwise, rows where ExprToMinimize is null are ignored.
Returns
Returns a row in the table that minimizes ExprToMinimize, and the values of columns specified in ExprToReturn. Use * to return the entire row.
Examples
Find the minimum latitude of a storm event in each state.
StormEvents
| summarize arg_min(BeginLat, BeginLocation) by State
Output
The results table shown includes only the first 10 rows.
State | BeginLat | BeginLocation |
---|---|---|
AMERICAN SAMOA | -14.3 | PAGO PAGO |
CALIFORNIA | 32.5709 | NESTOR |
MINNESOTA | 43.5 | BIGELOW |
WASHINGTON | 45.58 | WASHOUGAL |
GEORGIA | 30.67 | FARGO |
ILLINOIS | 37 | CAIRO |
FLORIDA | 24.6611 | SUGARLOAF KEY |
KENTUCKY | 36.5 | HAZEL |
TEXAS | 25.92 | BROWNSVILLE |
OHIO | 38.42 | SOUTH PT |
… | … | … |
Find the first time an event with a direct death happened in each state, showing all of the columns.
The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *) by State
Output
The results table shown includes only the first 10 rows and first 3 columns.
State | StartTime | EndTime | … |
---|---|---|---|
INDIANA | 2007-01-01T00:00:00Z | 2007-01-22T18:49:00Z | … |
FLORIDA | 2007-01-03T10:55:00Z | 2007-01-03T10:55:00Z | … |
NEVADA | 2007-01-04T09:00:00Z | 2007-01-05T14:00:00Z | … |
LOUISIANA | 2007-01-04T15:45:00Z | 2007-01-04T15:52:00Z | … |
WASHINGTON | 2007-01-09T17:00:00Z | 2007-01-09T18:00:00Z | … |
CALIFORNIA | 2007-01-11T22:00:00Z | 2007-01-24T10:00:00Z | … |
OKLAHOMA | 2007-01-12T00:00:00Z | 2007-01-18T23:59:00Z | … |
MISSOURI | 2007-01-13T03:00:00Z | 2007-01-13T08:30:00Z | … |
TEXAS | 2007-01-13T10:30:00Z | 2007-01-13T14:30:00Z | … |
ARKANSAS | 2007-01-14T03:00:00Z | 2007-01-14T03:00:00Z | … |
… | … | … | … |
The following example demonstrates null handling.
datatable(Fruit: string, Color: string, Version: int) [
"Apple", "Red", 1,
"Apple", "Green", int(null),
"Banana", "Yellow", int(null),
"Banana", "Green", int(null),
"Pear", "Brown", 1,
"Pear", "Green", 2,
]
| summarize arg_min(Version, *) by Fruit
Output
Fruit | Version | Color |
---|---|---|
Apple | 1 | Red |
Banana | | Yellow |
Pear | 1 | Brown |
Comparison to min()
The arg_min() function differs from the min() function. The arg_min() function allows you to return additional columns along with the minimum value, and min() only returns the minimum value itself.
Examples
arg_min()
Find the first time an event with a direct death happened, showing all the columns in the table.
The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *)
The results table returns all the columns for the row containing the lowest value in the expression specified.
StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
---|---|---|---|---|---|---|
2007-01-01T00:00:00Z | 2007-01-22T18:49:00Z | 2408 | 11929 | INDIANA | Flood | … |
min()
Find the first time an event with a direct death happened.
The query filters events to only include those where there is at least one direct death, and then returns the minimum value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize min(StartTime)
The results table returns the lowest value in the specific column only.
min_StartTime |
---|
2007-01-01T00:00:00Z |
Related content
1.4 - avg() (aggregation function)
Calculates the average (arithmetic mean) of expr across the group.
Syntax
avg(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation. |
Returns
Returns the average value of expr across the group.
Example
The following example returns the average number of damaged crops per state.
StormEvents
| summarize AvgDamageToCrops = avg(DamageCrops) by State
The results table shown includes only the first 10 rows.
State | AvgDamageToCrops |
---|---|
TEXAS | 7524.569241 |
KANSAS | 15366.86671 |
IOWA | 4332.477535 |
ILLINOIS | 44568.00198 |
MISSOURI | 340719.2212 |
GEORGIA | 490702.5214 |
MINNESOTA | 2835.991494 |
WISCONSIN | 17764.37838 |
NEBRASKA | 21366.36467 |
NEW YORK | 5.714285714 |
… | … |
Related content
1.5 - avgif() (aggregation function)
Calculates the average of expr in records for which predicate evaluates to true.
Syntax
avgif (expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation. |
predicate | string | ✔️ | The predicate that if true, the expr calculated value is added to the average. |
Returns
Returns the average value of expr in records where predicate evaluates to true.
Example
The following example calculates the average damage by state in cases where there was any damage.
StormEvents
| summarize Averagedamage=tolong(avg( DamageCrops)),AverageWhenDamage=tolong(avgif(DamageCrops,DamageCrops >0)) by State
Output
The results table shown includes only the first 10 rows.
State | Averagedamage | AverageWhenDamage |
---|---|---|
TEXAS | 7524 | 491291 |
KANSAS | 15366 | 695021 |
IOWA | 4332 | 28203 |
ILLINOIS | 44568 | 2574757 |
MISSOURI | 340719 | 8806281 |
GEORGIA | 490702 | 57239005 |
MINNESOTA | 2835 | 144175 |
WISCONSIN | 17764 | 438188 |
NEBRASKA | 21366 | 187726 |
NEW YORK | 5 | 10000 |
… | … | … |
Related content
1.6 - binary_all_and() (aggregation function)
Accumulates values using the binary AND operation for each summarization group, or in total if a group isn't specified.
Syntax
binary_all_and (expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | long | ✔️ | The value used for the binary AND calculation. |
Returns
Returns an aggregated value using the binary AND operation over records for each summarization group, or in total if a group isn't specified.
Example
The following example produces CAFEF00D using binary AND operations:
datatable(num:long)
[
0xFFFFFFFF,
0xFFFFF00F,
0xCFFFFFFD,
0xFAFEFFFF,
]
| summarize result = toupper(tohex(binary_all_and(num)))
Output
result |
---|
CAFEF00D |
Related content
1.7 - binary_all_or() (aggregation function)
Accumulates values using the binary OR operation for each summarization group, or in total if a group isn't specified.
Syntax
binary_all_or (expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | long | ✔️ | The value used for the binary OR calculation. |
Returns
Returns an aggregated value using the binary OR operation over records for each summarization group, or in total if a group isn't specified.
Example
The following example produces CAFEF00D using binary OR operations:
datatable(num:long)
[
0x88888008,
0x42000000,
0x00767000,
0x00000005,
]
| summarize result = toupper(tohex(binary_all_or(num)))
Output
result |
---|
CAFEF00D |
Related content
1.8 - binary_all_xor() (aggregation function)
Accumulates values using the binary XOR operation for each summarization group, or in total if a group isn't specified.
Syntax
binary_all_xor (expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | long | ✔️ | The value used for the binary XOR calculation. |
Returns
Returns a value that is aggregated using the binary XOR operation over records for each summarization group, or in total if a group isn't specified.
Example
The following example produces CAFEF00D using binary XOR operations:
datatable(num:long)
[
0x44404440,
0x1E1E1E1E,
0x90ABBA09,
0x000B105A,
]
| summarize result = toupper(tohex(binary_all_xor(num)))
Output
result |
---|
CAFEF00D |
Related content
1.9 - buildschema() (aggregation function)
Builds the minimal schema that admits all values of DynamicExpr.
Syntax
buildschema (DynamicExpr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
DynamicExpr | dynamic | ✔️ | Expression used for the aggregation calculation. |
Returns
Returns the minimal schema that admits all values of DynamicExpr.
Example
The following example builds a schema based on:
{"x":1, "y":3.5}
{"x":"somevalue", "z":[1, 2, 3]}
{"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]}
datatable(value: dynamic) [
dynamic({"x":1, "y":3.5}),
dynamic({"x":"somevalue", "z":[1, 2, 3]}),
dynamic({"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]})
]
| summarize buildschema(value)
Results
schema_value |
---|
{"x":["long","string"],"y":["double",{"w":"string"}],"z":{"indexer":["long","string"]},"t":{"indexer":"string"}} |
Schema breakdown
In the resulting schema:
- The root object is a container with four properties named x, y, z, and t.
- Property x is either type long or type string.
- Property y is either type double or another container with a property w of type string.
- Property z is an array, indicated by the indexer keyword, where each item can be either type long or type string.
- Property t is an array, indicated by the indexer keyword, where each item is a string.
- Every property is implicitly optional, and any array might be empty.
Related content
1.10 - count_distinct() (aggregation function) - (preview)
Counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted.
If you only need an estimate of the number of unique values, we recommend using the less resource-consuming dcount aggregation function.
To count only records for which a predicate returns true, use the count_distinctif aggregation function.
Syntax
count_distinct (expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | scalar | ✔️ | The expression whose unique values are to be counted. |
Returns
Long integer value indicating the number of unique values of expr per summary group.
Example
The following example shows how many types of storm events happened in each state.
Function performance can be degraded when operating on multiple data sources from different clusters.
StormEvents
| summarize UniqueEvents=count_distinct(EventType) by State
| top 5 by UniqueEvents
Output
State | UniqueEvents |
---|---|
TEXAS | 27 |
CALIFORNIA | 26 |
PENNSYLVANIA | 25 |
GEORGIA | 24 |
NORTH CAROLINA | 23 |
Related content
1.11 - count_distinctif() (aggregation function) - (preview)
Conditionally counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted. Only records for which predicate evaluates to true are counted.
If you only need an estimate of the number of unique values, we recommend using the less resource-consuming dcountif aggregation function.
Syntax
count_distinctif (expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | scalar | ✔️ | The expression whose unique values are to be counted. |
predicate | string | ✔️ | The expression used to filter records to be aggregated. |
Returns
Integer value indicating the number of unique values of expr per summary group, for all records for which the predicate evaluates to true.
Example
The following example shows how many types of death-causing storm events happened in each state. Only storm events with a nonzero count of deaths are counted.
StormEvents
| summarize UniqueFatalEvents=count_distinctif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where UniqueFatalEvents > 0
| top 5 by UniqueFatalEvents
Output
State | UniqueFatalEvents |
---|---|
TEXAS | 12 |
CALIFORNIA | 12 |
OKLAHOMA | 10 |
NEW YORK | 9 |
KANSAS | 9 |
Related content
1.12 - count() (aggregation function)
Counts the number of records per summarization group, or total if summarization is done without grouping.
To only count records for which a predicate returns true, use countif().
Syntax
count()
Returns
Returns a count of the records per summarization group, or in total if summarization is done without grouping.
Example
The following example returns a count of events in states:
StormEvents
| summarize Count=count() by State
Output
State | Count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
IOWA | 2337 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
WISCONSIN | 1850 |
NEBRASKA | 1766 |
NEW YORK | 1750 |
… | … |
Related content
1.13 - countif() (aggregation function)
Counts the rows in which predicate evaluates to true.
Syntax
countif (predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
predicate | string | ✔️ | The expression used for aggregation calculation. The value can be any scalar expression with a return type of bool. |
Returns
Returns a count of rows in which predicate evaluates to true.
Examples
Count storms by state
This example shows the number of storms with damage to crops by state.
StormEvents
| summarize TotalCount=count(),TotalWithDamage=countif(DamageCrops >0) by State
The results table shown includes only the first 10 rows.
State | TotalCount | TotalWithDamage |
---|---|---|
TEXAS | 4701 | 72 |
KANSAS | 3166 | 70 |
IOWA | 2337 | 359 |
ILLINOIS | 2022 | 35 |
MISSOURI | 2016 | 78 |
GEORGIA | 1983 | 17 |
MINNESOTA | 1881 | 37 |
WISCONSIN | 1850 | 75 |
NEBRASKA | 1766 | 201 |
NEW YORK | 1750 | 1 |
… | … | … |
Count based on string length
This example shows the number of names with more than four letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize countif(strlen(name) > 4)
Output
countif_ |
---|
2 |
Related content
1.14 - dcount() (aggregation function)
Calculates an estimate of the number of distinct values that are taken by a scalar expression in the summary group.
Syntax
dcount (expr [, accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The input whose distinct values are to be counted. |
accuracy | int | | The value that defines the requested estimation accuracy. The default value is 1. See Estimation accuracy for supported values. |
Returns
Returns an estimate of the number of distinct values of expr in the group.
Example
This example shows how many types of storm events happened in each state.
StormEvents
| summarize DifferentEvents=dcount(EventType) by State
| order by DifferentEvents
The results table shown includes only the first 10 rows.
State | DifferentEvents |
---|---|
TEXAS | 27 |
CALIFORNIA | 26 |
PENNSYLVANIA | 25 |
GEORGIA | 24 |
ILLINOIS | 23 |
MARYLAND | 23 |
NORTH CAROLINA | 23 |
MICHIGAN | 22 |
FLORIDA | 22 |
OREGON | 21 |
KANSAS | 21 |
… | … |
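The accuracy argument can also be passed explicitly. The following variation of the query above is a sketch, not part of the original example; it requests accuracy level 4 (slowest, lowest error), as listed in the Estimation accuracy table later in this section:
// Request the highest supported accuracy level for the distinct-count estimate.
StormEvents
| summarize DifferentEvents=dcount(EventType, 4) by State
| order by DifferentEvents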
Estimation accuracy
Related content
1.15 - dcountif() (aggregation function)
Estimates the number of distinct values of expr for rows in which predicate evaluates to true.
Syntax
dcountif (expr, predicate [, accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The expression used to filter rows. |
accuracy | int | | The value that controls the balance between speed and accuracy. If unspecified, the default value is 1. See Estimation accuracy for supported values. |
Returns
Returns an estimate of the number of distinct values of expr for rows in which predicate evaluates to true.
Example
This example shows how many types of fatal storm events happened in each state.
StormEvents
| summarize DifferentFatalEvents=dcountif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where DifferentFatalEvents > 0
| order by DifferentFatalEvents
The results table shown includes only the first 10 rows.
State | DifferentFatalEvents |
---|---|
CALIFORNIA | 12 |
TEXAS | 12 |
OKLAHOMA | 10 |
ILLINOIS | 9 |
KANSAS | 9 |
NEW YORK | 9 |
NEW JERSEY | 7 |
WASHINGTON | 7 |
MICHIGAN | 7 |
MISSOURI | 7 |
… | … |
Estimation accuracy
Related content
1.16 - hll_if() (aggregation function)
Calculates the intermediate results of dcount in records for which the predicate evaluates to true.
Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.
Syntax
hll_if (expr, predicate [, accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The Expr used to filter records to add to the intermediate result of dcount . |
accuracy | int | | The value that controls the balance between speed and accuracy. If unspecified, the default value is 1. For supported values, see Estimation accuracy. |
Returns
Returns the intermediate results of distinct count of Expr for which Predicate evaluates to true.
Examples
The following query results in the number of unique flood event sources in Iowa and Kansas. It uses the hll_if() function to show only flood events.
StormEvents
| where State in ("IOWA", "KANSAS")
| summarize hll_flood = hll_if(Source, EventType == "Flood") by State
| project State, SourcesOfFloodEvents = dcount_hll(hll_flood)
Output
State | SourcesOfFloodEvents |
---|---|
KANSAS | 11 |
IOWA | 7 |
Estimation accuracy
Accuracy | Speed | Error (%) |
---|---|---|
0 | Fastest | 1.6 |
1 | Balanced | 0.8 |
2 | Slow | 0.4 |
3 | Slow | 0.28 |
4 | Slowest | 0.2 |
Related content
1.17 - hll_merge() (aggregation function)
Merges HLL results across the group into a single HLL value.
For more information, see the underlying algorithm (HyperLogLog) and estimation accuracy.
Syntax
hll_merge (hll)
Parameters
Name | Type | Required | Description |
---|---|---|---|
hll | string | ✔️ | The column name containing HLL values to merge. |
Returns
The function returns the merged HLL values of hll across the group.
Example
The following example shows HLL results across a group merged into a single HLL value.
StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
Output
The results show only the first five results in the array.
hllMerged |
---|
[[1024,14],["-6903255281122589438","-7413697181929588220","-2396604341988936699","5824198135224880646","-6257421034880415225", …],[]] |
Estimation accuracy
Related content
1.18 - hll() (aggregation function)
The hll() function is a way to estimate the number of unique values in a set of values. It does so by calculating intermediate results for aggregation within the summarize operator for a group of data using the dcount function.
Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.
Syntax
hll (expr [, accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
accuracy | int | | The value that controls the balance between speed and accuracy. If unspecified, the default value is 1. For supported values, see Estimation accuracy. |
Returns
Returns the intermediate results of distinct count of expr across the group.
Example
In the following example, the hll() function is used to estimate the number of unique values of the DamageProperty column within each 10-minute time bin of the StartTime column.
StormEvents
| summarize hll(DamageProperty) by bin(StartTime,10m)
Output
The results table shown includes only the first 10 rows.
StartTime | hll_DamageProperty |
---|---|
2007-01-01T00:20:00Z | [[1024,14],["3803688792395291579"],[]] |
2007-01-01T01:00:00Z | [[1024,14],["7755241107725382121","-5665157283053373866","3803688792395291579","-1003235211361077779"],[]] |
2007-01-01T02:00:00Z | [[1024,14],["-1003235211361077779","-5665157283053373866","7755241107725382121"],[]] |
2007-01-01T02:20:00Z | [[1024,14],["7755241107725382121"],[]] |
2007-01-01T03:30:00Z | [[1024,14],["3803688792395291579"],[]] |
2007-01-01T03:40:00Z | [[1024,14],["-5665157283053373866"],[]] |
2007-01-01T04:30:00Z | [[1024,14],["3803688792395291579"],[]] |
2007-01-01T05:30:00Z | [[1024,14],["3803688792395291579"],[]] |
2007-01-01T06:30:00Z | [[1024,14],["1589522558235929902"],[]] |
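The intermediate hll() values are typically converted back into a distinct-count estimate with dcount_hll(), as in the hll_if() example earlier in this section. The following sketch (an assumed continuation of the query above; the result column name is illustrative) merges the per-bin results and resolves them into a single estimate:
// Merge the per-bin HLL states, then convert the merged state into a distinct-count estimate.
StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
| project DistinctDamageProperty = dcount_hll(hllMerged)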
Estimation accuracy
Related content
1.19 - make_bag_if() (aggregation function)
Creates a dynamic JSON property bag (dictionary) of expr values in records for which predicate evaluates to true.
Syntax
make_bag_if(expr, predicate [, maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression used for the aggregation calculation. |
predicate | bool | ✔️ | The predicate that evaluates to true, in order for expr to be added to the result. |
maxSize | int | | The limit on the maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic JSON property bag (dictionary) of expr values in records for which predicate evaluates to true. Nondictionary values are skipped.
If a key appears in more than one row, an arbitrary value, out of the possible values for this key, is selected.
Example
The following example shows a packed JSON property bag.
let T = datatable(prop:string, value:string, predicate:bool)
[
"prop01", "val_a", true,
"prop02", "val_b", false,
"prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag_if(p, predicate)
Output
dict |
---|
{ "prop01": "val_a", "prop03": "val_c" } |
Use the bag_unpack() plugin to transform the bag keys in the make_bag_if() output into columns.
let T = datatable(prop:string, value:string, predicate:bool)
[
"prop01", "val_a", true,
"prop02", "val_b", false,
"prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag_if(p, predicate)
| evaluate bag_unpack(bag)
Output
prop01 | prop03 |
---|---|
val_a | val_c |
Related content
1.20 - make_bag() (aggregation function)
Creates a dynamic JSON property bag (dictionary) of all the values of expr in the group.
Syntax
make_bag (expr [, maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression used for the aggregation calculation. |
maxSize | int | | The limit on the maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic JSON property bag (dictionary) of all the values of Expr in the group, which are property bags. Nondictionary values are skipped.
If a key appears in more than one row, an arbitrary value, out of the possible values for this key, is selected.
Example
The following example shows a packed JSON property bag.
let T = datatable(prop:string, value:string)
[
"prop01", "val_a",
"prop02", "val_b",
"prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag(p)
Output
dict |
---|
{ "prop01": "val_a", "prop02": "val_b", "prop03": "val_c" } |
Use the bag_unpack() plugin to transform the bag keys in the make_bag() output into columns.
let T = datatable(prop:string, value:string)
[
"prop01", "val_a",
"prop02", "val_b",
"prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag(p)
| evaluate bag_unpack(bag)
Output
prop01 | prop02 | prop03 |
---|---|---|
val_a | val_b | val_c |
Related content
1.21 - make_list_if() (aggregation function)
Creates a dynamic array of expr values in the group for which predicate evaluates to true.
Syntax
make_list_if(expr, predicate [, maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | A predicate that has to evaluate to true in order for expr to be added to the result. |
maxSize | integer | | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic array of expr values in the group for which predicate evaluates to true.
If the input to the summarize operator isn't sorted, the order of elements in the resulting array is undefined.
If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.
Example
The following example shows a list of names with more than 4 letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize make_list_if(name, strlen(name) > 4)
Output
list_name |
---|
["George", "Ringo"] |
Related content
1.22 - make_list_with_nulls() (aggregation function)
Creates a dynamic array of all the values of expr in the group, including null values.
Syntax
make_list_with_nulls(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use to create the array. |
Returns
Returns a dynamic JSON object (array) of all the values of expr in the group, including null values.
If the input to the summarize operator isn't sorted, the order of elements in the resulting array is undefined.
If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.
Example
The following example shows null values in the results.
let shapes = datatable (name:string , sideCount: int)
[
"triangle", int(null),
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize mylist = make_list_with_nulls(sideCount)
Output
mylist |
---|
[null,4,4,5,6,7,8,9,10] |
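For comparison, make_list() ignores null values. The following sketch (not part of the original article; column names are illustrative) runs both functions over the same shapes datatable, where the withoutNulls list would be expected to omit the null entry for the triangle row:
// Compare null handling: make_list_with_nulls keeps the null, make_list drops it.
let shapes = datatable (name:string , sideCount: int)
[
"triangle", int(null),
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize withNulls = make_list_with_nulls(sideCount), withoutNulls = make_list(sideCount)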
1.23 - make_list() (aggregation function)
Creates a dynamic array of all the values of expr in the group.
Syntax
make_list(expr [, maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression used for the aggregation calculation. |
maxSize | int | | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic array of all the values of expr in the group.
If the input to the summarize operator isn't sorted, the order of elements in the resulting array is undefined.
If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.
Examples
The examples in this section show how to use the syntax to help you get started.
One column
The following example uses the shapes datatable to return a list of shapes in a single column.
let shapes = datatable (name: string, sideCount: int)
[
"triangle", 3,
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize mylist = make_list(name)
Output
mylist |
---|
["triangle","square","rectangle","pentagon","hexagon","heptagon","octagon","nonagon","decagon"] |
Using the ‘by’ clause
The following example uses the make_list function and the by clause to create two lists of objects grouped by whether they have an even or odd number of sides.
let shapes = datatable (name: string, sideCount: int)
[
"triangle", 3,
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize mylist = make_list(name) by isEvenSideCount = sideCount % 2 == 0
Output
isEvenSideCount | mylist |
---|---|
false | [“triangle”,“pentagon”,“heptagon”,“nonagon”] |
true | [“square”,“rectangle”,“hexagon”,“octagon”,“decagon”] |
Packing a dynamic object
The following example shows how to pack a dynamic object in a column before making it a list. The query returns a boolean column isEvenSideCount indicating whether the side count is even, and a mylist column that contains lists of packed bags in each category.
let shapes = datatable (name: string, sideCount: int)
[
"triangle", 3,
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| extend d = bag_pack("name", name, "sideCount", sideCount)
| summarize mylist = make_list(d) by isEvenSideCount = sideCount % 2 == 0
Output
isEvenSideCount | mylist |
---|---|
false | [{“name”:“triangle”,“sideCount”:3},{“name”:“pentagon”,“sideCount”:5},{“name”:“heptagon”,“sideCount”:7},{“name”:“nonagon”,“sideCount”:9}] |
true | [{“name”:“square”,“sideCount”:4},{“name”:“rectangle”,“sideCount”:4},{“name”:“hexagon”,“sideCount”:6},{“name”:“octagon”,“sideCount”:8},{“name”:“decagon”,“sideCount”:10}] |
1.24 - make_set_if() (aggregation function)
Creates a dynamic array of the set of distinct values that expr takes in records for which predicate evaluates to true.
Syntax
make_set_if(expr, predicate [, maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | A predicate that has to evaluate to true in order for expr to be added to the result. |
maxSize | int | | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic array of the set of distinct values that expr takes in records for which predicate evaluates to true. The array’s sort order is undefined.
Example
The following example shows a list of names with more than four letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize make_set_if(name, strlen(name) > 4)
Output
set_name |
---|
[“George”, “Ringo”] |
1.25 - make_set() (aggregation function)
Creates a dynamic array of the set of distinct values that expr takes in the group.
Syntax
make_set(expr [, maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
maxSize | int | | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic array of the set of distinct values that expr takes in the group. The array’s sort order is undefined.
Examples
Set from a scalar column
The following example shows the set of states grouped with the same amount of crop damage.
StormEvents
| summarize states=make_set(State) by DamageCrops
The results table shown includes only the first 10 rows.
DamageCrops | states |
---|---|
0 | [“NORTH CAROLINA”,“WISCONSIN”,“NEW YORK”,“ALASKA”,“DELAWARE”,“OKLAHOMA”,“INDIANA”,“ILLINOIS”,“MINNESOTA”,“SOUTH DAKOTA”,“TEXAS”,“UTAH”,“COLORADO”,“VERMONT”,“NEW JERSEY”,“VIRGINIA”,“CALIFORNIA”,“PENNSYLVANIA”,“MONTANA”,“WASHINGTON”,“OREGON”,“HAWAII”,“IDAHO”,“PUERTO RICO”,“MICHIGAN”,“FLORIDA”,“WYOMING”,“GULF OF MEXICO”,“NEVADA”,“LOUISIANA”,“TENNESSEE”,“KENTUCKY”,“MISSISSIPPI”,“ALABAMA”,“GEORGIA”,“SOUTH CAROLINA”,“OHIO”,“NEW MEXICO”,“ATLANTIC SOUTH”,“NEW HAMPSHIRE”,“ATLANTIC NORTH”,“NORTH DAKOTA”,“IOWA”,“NEBRASKA”,“WEST VIRGINIA”,“MARYLAND”,“KANSAS”,“MISSOURI”,“ARKANSAS”,“ARIZONA”,“MASSACHUSETTS”,“MAINE”,“CONNECTICUT”,“GUAM”,“HAWAII WATERS”,“AMERICAN SAMOA”,“LAKE HURON”,“DISTRICT OF COLUMBIA”,“RHODE ISLAND”,“LAKE MICHIGAN”,“LAKE SUPERIOR”,“LAKE ST CLAIR”,“LAKE ERIE”,“LAKE ONTARIO”,“E PACIFIC”,“GULF OF ALASKA”] |
30000 | [“TEXAS”,“NEBRASKA”,“IOWA”,“MINNESOTA”,“WISCONSIN”] |
4000000 | [“CALIFORNIA”,“KENTUCKY”,“NORTH DAKOTA”,“WISCONSIN”,“VIRGINIA”] |
3000000 | [“CALIFORNIA”,“ILLINOIS”,“MISSOURI”,“SOUTH CAROLINA”,“NORTH CAROLINA”,“MISSISSIPPI”,“NORTH DAKOTA”,“OHIO”] |
14000000 | [“CALIFORNIA”,“NORTH DAKOTA”] |
400000 | [“CALIFORNIA”,“MISSOURI”,“MISSISSIPPI”,“NEBRASKA”,“WISCONSIN”,“NORTH DAKOTA”] |
50000 | [“CALIFORNIA”,“GEORGIA”,“NEBRASKA”,“TEXAS”,“WEST VIRGINIA”,“KANSAS”,“MISSOURI”,“MISSISSIPPI”,“NEW MEXICO”,“IOWA”,“NORTH DAKOTA”,“OHIO”,“WISCONSIN”,“ILLINOIS”,“MINNESOTA”,“KENTUCKY”] |
18000 | [“WASHINGTON”,“WISCONSIN”] |
107900000 | [“CALIFORNIA”] |
28900000 | [“CALIFORNIA”] |
Set from array column
The following example shows the set of elements in an array.
datatable (Val: int, Arr1: dynamic)
[
1, dynamic(['A1', 'A2', 'A3']),
5, dynamic(['A2', 'C1']),
7, dynamic(['C2', 'A3']),
5, dynamic(['C2', 'A1'])
]
| summarize Val_set=make_set(Val), Arr1_set=make_set(Arr1)
Val_set | Arr1_set |
---|---|
[1,5,7] | [“A1”,“A2”,“A3”,“C1”,“C2”] |
1.26 - max() (aggregation function)
Finds the maximum value of the expression in the table.
Syntax
max(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression for which the maximum value is determined. |
Returns
Returns the value in the table that maximizes the specified expression.
Example
The following example returns the last record in a table by querying the maximum value for StartTime.
StormEvents
| summarize LatestEvent=max(StartTime)
Output
LatestEvent |
---|
2007-12-31T23:53:00Z |
1.27 - maxif() (aggregation function)
Calculates the maximum value of expr in records for which predicate evaluates to true.
See also - max() function, which returns the maximum value across the group without predicate expression.
Syntax
maxif(expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The expression used to filter rows. |
Returns
Returns the maximum value of expr in records for which predicate evaluates to true.
Example
This example shows the maximum damage for events with no casualties.
StormEvents
| extend Damage=DamageCrops + DamageProperty, Deaths=DeathsDirect + DeathsIndirect
| summarize MaxDamageNoCasualties=maxif(Damage, Deaths == 0) by State
Output
The results table shown includes only the first 10 rows.
State | MaxDamageNoCasualties |
---|---|
TEXAS | 25000000 |
KANSAS | 37500000 |
IOWA | 15000000 |
ILLINOIS | 5000000 |
MISSOURI | 500005000 |
GEORGIA | 344000000 |
MINNESOTA | 38390000 |
WISCONSIN | 45000000 |
NEBRASKA | 4000000 |
NEW YORK | 26000000 |
… | … |
1.28 - min() (aggregation function)
Finds the minimum value of the expression in the table.
Syntax
min(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression for which the minimum value is determined. |
Returns
Returns the minimum value of expr across the table.
Example
This example returns the first record in a table.
StormEvents
| summarize FirstEvent=min(StartTime)
Output
FirstEvent |
---|
2007-01-01T00:00:00Z |
1.29 - minif() (aggregation function)
Returns the minimum of Expr in records for which Predicate evaluates to true.
- Can be used only in the context of aggregation inside summarize.
See also - min() function, which returns the minimum value across the group without predicate expression.
Syntax
minif(Expr, Predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr | string | ✔️ | Expression that will be used for aggregation calculation. |
Predicate | string | ✔️ | Expression that will be used to filter rows. |
Returns
The minimum value of Expr in records for which Predicate evaluates to true.
Example
This example shows the minimum damage for events with casualties, excluding events with zero damage.
StormEvents
| extend Damage=DamageCrops+DamageProperty, Deaths=DeathsDirect+DeathsIndirect
| summarize MinDamageWithCasualties=minif(Damage,(Deaths >0) and (Damage >0)) by State
| where MinDamageWithCasualties >0 and isnotnull(MinDamageWithCasualties)
Output
The results table shown includes only the first 10 rows.
State | MinDamageWithCasualties |
---|---|
TEXAS | 8000 |
KANSAS | 5000 |
IOWA | 45000 |
ILLINOIS | 100000 |
MISSOURI | 10000 |
GEORGIA | 500000 |
MINNESOTA | 200000 |
WISCONSIN | 10000 |
NEW YORK | 25000 |
NORTH CAROLINA | 15000 |
… | … |
1.30 - percentile(), percentiles()
The percentile() function calculates an estimate for the specified nearest-rank percentile of the population defined by expr. The accuracy depends on the density of population in the region of the percentile.
percentiles() works similarly to percentile(). However, percentiles() can calculate multiple percentile values at once, which is more efficient than calculating each percentile value separately.
To calculate weighted percentiles, see percentilesw().
Syntax
percentile(expr, percentile)
percentiles(expr, percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
percentile | int or long | ✔️ | A constant that specifies the percentile. |
percentiles | int or long | ✔️ | One or more comma-separated percentiles. |
Returns
Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.
Examples
Calculate single percentile
The following example shows the value of DamageProperty that is larger than 95% of the sample set and smaller than the remaining 5%.
StormEvents | summarize percentile(DamageProperty, 95) by State
Output
The results table shown includes only the first 10 rows.
State | percentile_DamageProperty_95 |
---|---|
ATLANTIC SOUTH | 0 |
FLORIDA | 40000 |
GEORGIA | 143333 |
MISSISSIPPI | 80000 |
AMERICAN SAMOA | 250000 |
KENTUCKY | 35000 |
OHIO | 150000 |
KANSAS | 51392 |
MICHIGAN | 49167 |
ALABAMA | 50000 |
Calculate multiple percentiles
The following example shows the 5th, 50th (median), and 95th percentiles of DamageProperty calculated simultaneously.
StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State
Output
The results table shown includes only the first 10 rows.
State | percentile_DamageProperty_5 | percentile_DamageProperty_50 | percentile_DamageProperty_95 |
---|---|---|---|
ATLANTIC SOUTH | 0 | 0 | 0 |
FLORIDA | 0 | 0 | 40000 |
GEORGIA | 0 | 0 | 143333 |
MISSISSIPPI | 0 | 0 | 80000 |
AMERICAN SAMOA | 0 | 0 | 250000 |
KENTUCKY | 0 | 0 | 35000 |
OHIO | 0 | 2000 | 150000 |
KANSAS | 0 | 0 | 51392 |
MICHIGAN | 0 | 0 | 49167 |
ALABAMA | 0 | 0 | 50000 |
… | … | … | … |
Return percentiles as an array
Instead of returning the values in individual columns, use the percentiles_array() function to return the percentiles in a single column of dynamic array type.
Syntax
percentiles_array(expr, percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
percentiles | int, long, or dynamic | ✔️ | One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value. |
Returns
Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.
Examples
Comma-separated percentiles
Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns as with percentiles().
TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName
Output
The results table displays only the first 10 rows.
SensorName | percentiles_Value | avg_Value |
---|---|---|
sensor-82 | [“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”] | 0.493950914 |
sensor-130 | [“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”] | 0.505111463 |
sensor-56 | [“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”] | 0.497955018 |
sensor-24 | [“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”] | 0.501084379 |
sensor-47 | [“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”] | 0.49386228 |
sensor-135 | [“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”] | 0.494817619 |
sensor-74 | [“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”] | 0.501627252 |
sensor-173 | [“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”] | 0.505401226 |
sensor-28 | [“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”] | 0.502066244 |
sensor-34 | [“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”] | 0.504309494 |
… | … | … |
Dynamic array of percentiles
Percentiles for percentiles_array can be specified in a dynamic array of integer or floating-point numbers. The array must be constant but doesn’t have to be literal.
TransformedSensorsData
| summarize percentiles_array(Value, dynamic([5, 25, 50, 75, 95])), avg(Value) by SensorName
Output
The results table displays only the first 10 rows.
SensorName | percentiles_Value | avg_Value |
---|---|---|
sensor-82 | [“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”] | 0.493950914 |
sensor-130 | [“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”] | 0.505111463 |
sensor-56 | [“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”] | 0.497955018 |
sensor-24 | [“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”] | 0.501084379 |
sensor-47 | [“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”] | 0.49386228 |
sensor-135 | [“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”] | 0.494817619 |
sensor-74 | [“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”] | 0.501627252 |
sensor-173 | [“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”] | 0.505401226 |
sensor-28 | [“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”] | 0.502066244 |
sensor-34 | [“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”] | 0.504309494 |
… | … | … |
Nearest-rank percentile
The P-th percentile (0 < P <= 100) of a list of values sorted in ascending order is the smallest value in the list such that P percent of the data is less than or equal to that value (from the Wikipedia article on percentiles).
The 0-th percentile is defined to be the smallest member of the population.
Estimation error in percentiles
The percentiles aggregate provides an approximate value using T-Digest.
1.31 - percentilew(), percentilesw()
The percentilew() function calculates a weighted estimate for the specified nearest-rank percentile of the population defined by expr. percentilesw() works similarly to percentilew(). However, percentilesw() can calculate multiple weighted percentile values at once, which is more efficient than calculating each weighted percentile value separately.
Weighted percentiles calculate percentiles in a dataset by giving each value in the input dataset a weight. In this method, each value is considered to be repeated a number of times equal to its weight, which is then used to calculate the percentile. By giving more importance to certain values, weighted percentiles provide a way to calculate percentiles in a “weighted” manner.
To calculate unweighted percentiles, see percentiles().
Syntax
percentilew(expr, weightExpr, percentile)
percentilesw(expr, weightExpr, percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
weightExpr | long | ✔️ | The weight to give each value. |
percentile | int or long | ✔️ | A constant that specifies the percentile. |
percentiles | int or long | ✔️ | One or more comma-separated percentiles. |
Returns
Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.
Examples
Calculate weighted percentiles
Assume you repetitively measure the time (Duration) it takes an action to complete. Instead of recording every value of the measurement, you record each value of Duration, rounded to 100 msec, and how many times the rounded value appeared (BucketSize).
Use summarize percentilesw(Duration, BucketSize, ...) to calculate the given percentiles in a “weighted” way. Treat each value of Duration as if it was repeated BucketSize times in the input, without actually needing to materialize those records.
The following example shows weighted percentiles, using the following set of latency values in milliseconds: { 1, 1, 2, 2, 2, 5, 7, 7, 12, 12, 15, 15, 15, 18, 21, 22, 26, 35 }.
To reduce bandwidth and storage, pre-aggregate the values into the following buckets: { 10, 20, 30, 40, 50, 100 }. Count the number of events in each bucket to produce the following table:
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
The table displays:
- Eight events in the 10-ms bucket (corresponding to subset { 1, 1, 2, 2, 2, 5, 7, 7 })
- Six events in the 20-ms bucket (corresponding to subset { 12, 12, 15, 15, 15, 18 })
- Three events in the 30-ms bucket (corresponding to subset { 21, 22, 26 })
- One event in the 40-ms bucket (corresponding to subset { 35 })
At this point, the original data is no longer available; only the number of events in each bucket remains. To compute percentiles from this data, use the percentilesw() function.
For the 50, 75, and 99.9 percentiles, use the following query:
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw(LatencyBucket, ReqCount, 50, 75, 99.9)
Output
percentile_LatencyBucket_50 | percentile_LatencyBucket_75 | percentile_LatencyBucket_99_9 |
---|---|---|
20 | 20 | 40 |
Return percentiles as an array
Instead of returning the values in individual columns, use the percentilesw_array() function to return the percentiles in a single column of dynamic array type.
Syntax
percentilesw_array(expr, weightExpr, percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
percentiles | int, long, or dynamic | ✔️ | One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value. |
weightExpr | long | ✔️ | The weight to give each value. |
Returns
Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.
Examples
Comma-separated percentiles
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, 50, 75, 99.9)
Output
percentile_LatencyBucket |
---|
[20, 20, 40] |
Dynamic array of percentiles
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, dynamic([50, 75, 99.9]))
Output
percentile_LatencyBucket |
---|
[20, 20, 40] |
1.32 - stdev() (aggregation function)
Calculates the standard deviation of expr across the group, using Bessel’s correction for a small dataset that is considered a sample.
For a large dataset that is representative of the population, use stdevp() (aggregation function).
Formula
This function uses the following formula:
$$\mathrm{stdev}(x) = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}}$$
Syntax
stdev(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the standard deviation aggregation calculation. |
Returns
Returns the standard deviation value of expr across the group.
Example
The following example shows the standard deviation for the group.
range x from 1 to 5 step 1
| summarize make_list(x), stdev(x)
Output
list_x | stdev_x |
---|---|
[ 1, 2, 3, 4, 5] | 1.58113883008419 |
1.33 - stdevif() (aggregation function)
Calculates the standard deviation of expr in records for which predicate evaluates to true.
Syntax
stdevif(expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the standard deviation aggregation calculation. |
predicate | string | ✔️ | The predicate that has to evaluate to true in order for expr to be added to the result. |
Returns
Returns the standard deviation value of expr in records for which predicate evaluates to true.
Example
The following example shows the standard deviation in a range of 1 to 100.
range x from 1 to 100 step 1
| summarize stdevif(x, x % 2 == 0)
Output
stdevif_x |
---|
29.1547594742265 |
1.34 - stdevp() (aggregation function)
Calculates the standard deviation of expr across the group, considering the group as a population for a large dataset that is representative of the population.
For a small dataset that is a sample, use stdev() (aggregation function).
Formula
This function uses the following formula:
$$\mathrm{stdevp}(x) = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n}}$$
Syntax
stdevp(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the standard deviation aggregation calculation. |
Returns
Returns the standard deviation value of expr across the group.
Example
range x from 1 to 5 step 1
| summarize make_list(x), stdevp(x)
Output
list_x | stdevp_x |
---|---|
[ 1, 2, 3, 4, 5] | 1.4142135623731 |
1.35 - sum() (aggregation function)
Calculates the sum of expr across the group.
Syntax
sum(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
Returns
Returns the sum value of expr across the group.
Example
This example returns the total value of crop and property damage by state, sorted in descending order of total damages.
StormEvents
| summarize EventCount=count(), TotalDamages = sum(DamageCrops+DamageProperty) by State
| sort by TotalDamages
Output
The results table shown includes only the first 10 rows.
State | EventCount | TotalDamages |
---|---|---|
CALIFORNIA | 898 | 2801954600 |
GEORGIA | 1983 | 1190448750 |
MISSOURI | 2016 | 1096887450 |
OKLAHOMA | 1716 | 916557300 |
MISSISSIPPI | 1218 | 802890160 |
KANSAS | 3166 | 738830000 |
TEXAS | 4701 | 572086700 |
OHIO | 1233 | 417989500 |
FLORIDA | 1042 | 379455260 |
NORTH DAKOTA | 905 | 342460100 |
… | … | … |
1.36 - sumif() (aggregation function)
Calculates the sum of expr in records for which predicate evaluates to true.
You can also use the sum() function, which sums rows without predicate expression.
Syntax
sumif(expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The expression used to filter rows. If the predicate evaluates to true , the row will be included in the result. |
Returns
Returns the sum of expr for which predicate evaluates to true.
Example showing the sum of damages for storms with no casualties
This example shows the sum total damage for storms without casualties.
StormEvents
| summarize DamageNoCasualties=sumif((DamageCrops+DamageProperty),(DeathsDirect+DeathsIndirect)==0) by State
Output
The results table shown includes only the first 10 rows.
State | DamageNoCasualties |
---|---|
TEXAS | 242638700 |
KANSAS | 407360000 |
IOWA | 135353700 |
ILLINOIS | 120394500 |
MISSOURI | 1096077450 |
GEORGIA | 1077448750 |
MINNESOTA | 230407300 |
WISCONSIN | 241550000 |
NEBRASKA | 70356050 |
NEW YORK | 58054000 |
… | … |
Example showing the sum of birth dates
This example shows the sum of the birth dates for all names that have more than 4 letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize sumif(day_of_birth, strlen(name) > 4)
Output
sumif_day_of_birth |
---|
32 |
1.37 - take_any() (aggregation function)
Arbitrarily chooses one record for each group in a summarize operator, and returns the value of one or more expressions over each such record.
Syntax
take_any(expr_1 [, expr_2 …])
take_any(*)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr_N | string | ✔️ | The expression used for selecting a record. If the wildcard value (* ) is given in place of an expression, all records will be selected. |
Returns
The take_any aggregation function returns the values of the expressions calculated for each of the records selected nondeterministically from each group of the summarize operator.
If the * argument is provided, the function behaves as if the expressions are all columns of the input to the summarize operator barring the group-by columns, if any.
Remarks
This function is useful when you want to get a sample value of one or more columns per value of the compound group key.
When the function is provided with a single column reference, it attempts to return a non-null/non-empty value, if such a value is present.
Because this function is nondeterministic, using it multiple times in a single application of the summarize operator isn’t equivalent to using it a single time with multiple expressions. The former may have each application select a different record, while the latter guarantees that all values are calculated over a single record (per distinct group). A minimal sketch of this difference follows the examples below.
Examples
Show indeterministic State:
StormEvents
| summarize take_any(State)
Output
State |
---|
ATLANTIC SOUTH |
Show all the details for a random record:
StormEvents
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*)
Output
StartTime | EpisodeId | State | EventType |
---|---|---|---|
2007-09-29 08:11:00.0000000 | 11091 | ATLANTIC SOUTH | Waterspout |
Show all the details of a random record for each State starting with ‘A’:
StormEvents
| where State startswith "A"
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*) by State
Output
State | StartTime | EpisodeId | EventType |
---|---|---|---|
ALASKA | 2007-02-01 00:00:00.0000000 | 1733 | Flood |
ATLANTIC SOUTH | 2007-09-29 08:11:00.0000000 | 11091 | Waterspout |
ATLANTIC NORTH | 2007-11-27 00:00:00.0000000 | 11523 | Marine Thunderstorm Wind |
ARIZONA | 2007-12-01 10:40:00.0000000 | 11955 | Flash Flood |
AMERICAN SAMOA | 2007-12-07 14:00:00.0000000 | 13183 | Flash Flood |
ARKANSAS | 2007-12-09 16:00:00.0000000 | 11319 | Lightning |
ALABAMA | 2007-12-15 18:00:00.0000000 | 12580 | Heavy Rain |
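As noted in the remarks, two separate take_any() calls may each pick a different record, while one call with multiple expressions reads a single record per group. A minimal sketch (two separate queries, run individually):
StormEvents
| summarize take_any(State), take_any(EventType)
// State and EventType may come from different records.
In contrast, a single call with multiple expressions returns both values from one record per group:
StormEvents
| summarize take_any(State, EventType)
// State and EventType come from the same record.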
1.38 - take_anyif() (aggregation function)
Arbitrarily selects one record for each group in a summarize operator in records for which the predicate is ’true’. The function returns the value of an expression over each such record.
This function is useful when you want to get a sample value of one column per value of the compound group key, subject to some predicate that is true. If such a value is present, the function attempts to return a non-null/non-empty value.
Syntax
take_anyif(expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for selecting a record. |
predicate | string | ✔️ | Indicates which records may be considered for evaluation. |
Returns
The take_anyif aggregation function returns the value of the expression calculated for each of the records randomly selected from each group of the summarize operator. Only records for which predicate returns ’true’ may be selected. If the predicate doesn’t return ’true’, a null value is produced.
Examples
Pick a random EventType from Storm events, where event description has a key phrase.
StormEvents
| summarize take_anyif(EventType, EventNarrative has 'strong wind')
Output
EventType |
---|
Strong Wind |
1.39 - tdigest_merge() (aggregation functions)
Merges tdigest results across the group.
For more information about the underlying algorithm (T-Digest) and the estimated error, see estimation error in percentiles.
Syntax
tdigest_merge(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
Returns
Returns the merged tdigest values of expr across the group.
Example
StormEvents
| summarize PreAggDamageProperty=tdigest(DamageProperty) by State
| summarize tdigest_merge(PreAggDamageProperty)
Output
merge_tdigests_PreAggDamageProperty |
---|
[[7],[91,30,73667,966,110000000,24428,2500,20000,16500000,6292,40000,123208,1000000,133091,90583,20000000,977000,20007,547000,19000000,1221,9600000,300000,70072,55940,75000,417500,1410000,20400000,331500,15000000,62000000,50222,121690000,160400,6200000,252500,450,11000000,2200000,5700000,11566,12000000,263,50000,200000,3700000,13286,171000,100000000,28200000,65000000,17709,30693,16000000,7938,5200,2875,1500000,3480000,151100000,9800000,18200000,21600000,199,2570000,30000000,38000000,72000,891250,500000000,26385,80092,27000000,35000000,754500,11500000,3262500,113945,5000,62429,175294,9071,6500000,3321,15159,21850000,300000000,22683,3000,10000000,60055,600000,52000000,496000,15000,50000000,10140000,11900000,2100000,62600000,77125,310667,70000000,101000000,2088,1608571,19182,400000,179833,775000,612000,150000000,13500000,2600000,1250000,65400,45000000,297000,2500000,40000000,24846,30000,59067,1893,15762,142571,220666,195000,2000000,355000,2275000,6000000,46000000,38264,50857,4002,97333,27750,1000,1111429,7043,272500,455200,503,37500000,10000,1489,0,1200000,110538,60000000,250000,10730,1901429,291000,698750,649000,2716667,137000000,6400000,29286,41051,6850000,102000,4602,80000000,250000000,371667,8000000,729,8120000,5000000,20830,152400,803300,349667,202000,207000,81150000,48000000,750000,26000000,8900000,239143,75000000,248000,14342,74857,5992,500000,150000,938000,10533333,45248,105000000,7000000,35030,4000000,2000,7692500,3000000,25000000,4500000,87222,12054,100000,25000,9771,4840000,28000000,1307143,32024],[19,1,3,32,1,14,45,572,1,51,126,41,101,11,12,8,2,14,4,1,27,1,58,42,20,177,6,4,1,12,10,2,9,1,5,1,2,28,3,6,1,23,4,30,610,145,1,21,4,2,1,1,24,13,1,153,5,4,26,5,1,6,1,1,28,1,5,1,11,4,1,13,44,2,4,2,1,4,9,1672,7,17,47,2,39,17,2,1,17,666,16,71,21,3,1,530,10,1,1,2,1,4,6,4,1,20,7,11,40,6,2,1,1,2,1,3,5,2,1,21,2,13,271,3,14,23,7,15,2,41,1,2,7,1,27,7,205,3,4,1403,7,69,4,10,215,1,1472,127,45756,10,13,1,198,17,7,1,12,7,6,1,1,14,7,2,2,17,1,2,3,2,48,5,21,10,5,10,21,4,5,1,2,39,2,2,7,1,1,22,7,60,175,119,3,3,40,1,8,101,15,1135,4,22,3,3,9,76,430,611,12,1,2,7,8]] |
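The merged intermediate result is typically passed to percentile_tdigest() to extract a percentile estimate. A minimal sketch (the typeof(long) argument is an assumption about the desired output type):
StormEvents
| summarize PreAggDamageProperty=tdigest(DamageProperty) by State
| summarize MergedDamageProperty=tdigest_merge(PreAggDamageProperty)
| project P95=percentile_tdigest(MergedDamageProperty, 95, typeof(long))
// Returns an estimated 95th percentile of DamageProperty computed from the merged tdigests.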
1.40 - tdigest() (aggregation function)
Calculates the intermediate results of percentiles() across the group.
For more information, see the underlying algorithm (T-Digest) and the estimated error.
Syntax
tdigest(expr [, weight])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
weight | string | | The weights of the values for the aggregation calculation. |
Returns
Returns the intermediate results of weighted percentiles of expr across the group.
Examples
Results per state
This example shows the results of the tdigest percentiles sorted by state.
StormEvents
| summarize tdigest(DamageProperty) by State
The results table shown includes only the first 10 rows.
State | tdigest_DamageProperty |
---|---|
NEBRASKA | [[7],[800,250,300000,5000,240000,1500000,20000,550000,0,75000,100000,1000,10000,30000,13000,2000000,1000000,650000,125000,35000,7000,2500000,4000000,450000,85000,460000,500000,6000,150000,350000,4000,72500,1200000,180000,400000,25000,50000,2000,45000,8000,120000,200000,40000,1200,15000,55000,3000,250000],[5,1,3,72,1,1,44,1,1351,12,24,17,46,13,6,1,2,1,2,6,8,1,1,1,2,1,4,2,6,1,2,2,1,1,2,26,18,12,2,2,1,7,6,4,28,4,6,6]] |
MINNESOTA | [[7],[700,500,2000000,2500,1200000,12000000,16000,7000000,0,300000,425000,750,6000,30000,10000,22000000,10000000,9600000,600000,50000,4000,27000000,35000000,4000000,400000,5000000,6000000,3000,750000,2500000,2000,250000,11000000,38000000,3000000,20000,120000,1000,100000,5000,500000,1000000,60000,800,15000,200000,1500,1500000,900000],[1,3,1,3,1,2,1,1,1793,1,1,2,2,2,3,1,1,1,2,2,1,1,1,1,2,1,2,1,1,1,6,1,1,1,3,5,1,5,2,5,2,2,1,2,2,2,2,1,1]] |
KANSAS | [[7],[667,200,6000000,3400,80000,300000,18875,210000,0,45857,750000,37500000,10000,81150000,15000000,6400000,2570000,225000,59400,25000,5000,400000,7000000,4500000,2500000,6500000,200000,4500,70000,122500,2785,12000000,1900000,18200000,150000,1150000,27000000,2000,30000,2000000,250000000,75000,26000,1500,1500000,1000000,2500,100000,21600000,50000,335000,600000,175000,500000,160000,51000,40000,20000,15000,252500,7520,350000,250000,3400000,1000,338000,16000000,106000,4840000,305000,540000,337500,9800000,45000,12500,700000,4000000,71000,30000000,35000,3700000,22000,56000],[12,2,2,5,2,3,8,1,2751,7,2,1,37,1,1,1,1,2,5,12,33,8,1,1,1,2,10,1,5,2,7,1,4,1,5,1,1,9,11,4,1,5,2,6,4,8,2,23,1,44,2,3,2,3,1,1,1,18,5,2,5,1,7,1,25,1,1,3,1,1,1,2,6,1,1,2,1,1,1,3,1,1,1]] |
NEW MEXICO | [[7],[600,500,2500000,7000,1500,28000,40000,10000,0,500000,20000,1000,21000,70000,25000,3500000,200000,16500000,50000,100000,15000,4000,5000,2000],[1,3,1,1,1,1,1,7,466,1,7,4,1,1,2,1,1,1,1,2,1,4,10,8]] |
KENTUCKY | [[7],[600,200,700000,5000,400000,12000,15000,100000,0,60000,80000,1000,9000,20000,10000,50000,30000,300000,120000,25000,7000,3000,500000,11500000,75000,35000,8000,6000,150000,1500000,4000,56000,1911,250000,2500000,18000,45000,2000],[6,2,1,42,1,3,9,8,999,2,1,52,1,21,37,25,7,2,3,14,11,35,1,1,6,10,9,10,4,1,13,1,9,3,1,2,1,37]] |
VIRGINIA | [[7],[536,500,125000,3000,100000,7250,8000,60000,0,40000,50000,956,6000,11500,7000,25000,15000,98000,70000,12000,4000,2000,120000,1000000,45000,16000,5000,3500,75000,175000,2500,30000,1000,80000,300000,10000,20000,1500],[7,11,1,48,2,2,2,1,1025,2,6,9,2,2,1,5,16,1,3,5,12,122,1,1,1,1,64,2,2,1,1,7,209,3,2,42,19,6]] |
OREGON | [[7],[5000,1000,60000,434000,20000,50000,100000,500000,0,1500000,20400000,6000,62600000],[8,2,1,1,1,1,3,1,401,1,1,1,1]] |
ALASKA | [[7],[5000,1000,25000,700000,12060,15000,100000,1600000,0,10000],[5,1,1,1,1,2,1,2,242,1]] |
CONNECTICUT | [[7],[5000,1000,2000000,0,50000,750000,6000],[1,1,1,142,1,1,1]] |
NEVADA | [[7],[5000,1000,200000,1000000,30000,40000,297000,5000000,0,10000],[4,2,1,1,1,1,1,1,148,3]] |
Convert pre-existing centroids
The following example shows how to convert pre-existing T-Digest centroids for long-term storage. The V column holds the value of each centroid, and the W column is its weight (relative count). The tdigest() aggregate function is then applied to convert the data in table DT into the internal representation, and percentile_tdigest() is used to demonstrate how to find the 50th percentile value.
let DT=datatable(V:real, W:long) [
1.0, 1,
2.0, 2
];
DT
| summarize TD=tdigest(V, W)
| project P50=percentile_tdigest(TD, 50)
P50 |
---|
2 |
1.41 - variance() (aggregation function)
Calculates the variance of expr across the group, considering the group as a sample.
The following formula is used:
$$\mathrm{variance}(x) = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}$$
Syntax
variance(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | real | ✔️ | The expression used for the variance calculation. |
Returns
Returns the variance value of expr across the group.
Example
range x from 1 to 5 step 1
| summarize make_list(x), variance(x)
Output
list_x | variance_x |
---|---|
[ 1, 2, 3, 4, 5] | 2.5 |
1.42 - varianceif() (aggregation function)
Calculates the variance of expr in records for which predicate evaluates to true.
Syntax
varianceif(expr, predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for the variance calculation. |
predicate | string | ✔️ | If predicate evaluates to true , the expr calculated value will be added to the variance. |
Returns
Returns the variance value of expr in records for which predicate evaluates to true.
Example
range x from 1 to 100 step 1
| summarize varianceif(x, x%2 == 0)
Output
varianceif_x |
---|
850 |
1.43 - variancep() (aggregation function)
Calculates the variance of expr across the group, considering the group as a population.
The following formula is used:
$$\mathrm{variancep}(x) = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n}$$
Syntax
variancep(expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for the variance calculation. |
Returns
Returns the variance value of expr across the group.
Example
range x from 1 to 5 step 1
| summarize make_list(x), variancep(x)
Output
list_x | variancep_x |
---|---|
[ 1, 2, 3, 4, 5] | 2 |
2 - Best practices for KQL queries
2.1 - Best practices for Kusto Query Language queries
Here are several best practices to follow to make your query run faster.
In short
Action | Use | Don’t use | Notes |
---|---|---|---|
Reduce the amount of data being queried | Use mechanisms such as the where operator to reduce the amount of data being processed. | | For more information on efficient ways to reduce the amount of data being processed, see Reduce the amount of data being processed. |
Avoid using redundant qualified references | When referencing local entities, use the unqualified name. | | For more information, see Avoid using redundant qualified references. |
datetime columns | Use the datetime data type. | Don’t use the long data type. | In queries, don’t use Unix time conversion functions, such as unixtime_milliseconds_todatetime() . Instead, use update policies to convert Unix time to the datetime data type during ingestion. |
String operators | Use the has operator. | Don’t use contains | When looking for full tokens, has works better, since it doesn’t look for substrings. |
Case-sensitive operators | Use == . | Don’t use =~ . | Use case-sensitive operators when possible. |
 | Use in . | Don’t use in~ . | |
 | Use contains_cs . | Don’t use contains . | Using has /has_cs is preferred to contains /contains_cs . |
Searching text | Look in a specific column. | Don’t use * . | * does a full text search across all columns. |
Extract fields from dynamic objects across millions of rows | Materialize your column at ingestion time if most of your queries extract fields from dynamic objects across millions of rows. | | With this method you only pay once for column extraction. |
Lookup for rare keys/values in dynamic objects | Use `MyTable \| where DynamicColumn has "Rare value" \| where DynamicColumn.SomeKey == "Rare value"`. | | |
let statement with a value that you use more than once | Use the materialize() function. | | For more information on how to use materialize() , see materialize(). For more information, see Optimize queries that use named expressions. |
Apply type conversions on more than one billion records | Reshape your query to reduce the amount of data fed into the conversion. | Don’t convert large amounts of data if it can be avoided. | |
New queries | Use limit [small number] or count at the end. | | Running unbound queries over unknown datasets can yield a return of gigabytes of results, resulting in a slow response and a busy environment. |
Case-insensitive comparisons | Use Col =~ "lowercasestring" . | Don’t use tolower(Col) == "lowercasestring" . | |
Compare data already in lowercase (or uppercase) | Col == "lowercasestring" (or Col == "UPPERCASESTRING" ). | Avoid using case insensitive comparisons. | |
Filtering on columns | Filter on a table column. | Don’t filter on a calculated column. | |
 | Use `T \| where predicate(Expression)` | Don’t use `T \| extend _value = Expression \| where predicate(_value)` | |
summarize operator | Use the hint.shufflekey=<key> when the group by keys of the summarize operator have high cardinality. | | High cardinality is ideally more than one million. |
join operator | Select the table with the fewest rows as the first one (left-most in query). | | |
 | Use in instead of left semi join for filtering by a single column. | | |
Join across clusters | Run the query on the “right” side of the join across remote environments, such as clusters or Eventhouses, where most of the data is located. | | |
Join when left side is small and right side is large | Use hint.strategy=broadcast. | | Small refers to up to 100 megabytes (MB) of data. |
Join when right side is small and left side is large | Use the lookup operator instead of the join operator | | If the right side of the lookup is larger than several tens of MB, the query fails. |
Join when both sides are too large | Use hint.shufflekey=<key>. | | Use when the join key has high cardinality. |
Extract values on column with strings sharing the same format or pattern | Use the parse operator. | Don’t use several extract() statements. | For example, values like "Time = <time>, ResourceId = <resourceId>, Duration = <duration>, ...." . |
extract() function | Use when parsed strings don’t all follow the same format or pattern. | | Extract the required values by using a REGEX. |
materialize() function | Push all possible operators that reduce the materialized dataset and still keep the semantics of the query. | | For example, filters, or project only required columns. For more information, see Optimize queries that use named expressions. |
Use materialized views | Use materialized views for storing commonly used aggregations. Prefer using the materialized_view() function to query materialized part only. | | materialized_view('MV') |
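For example, a minimal sketch of the string-operator guidance applied to the StormEvents table used elsewhere in this document:
// Preferred: 'has' matches whole terms and can use the term index.
StormEvents
| where EventNarrative has "tornado"
| count
// Avoid: 'contains' searches for substrings and can't take the same advantage of the index.
// StormEvents | where EventNarrative contains "tornado" | count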
Reduce the amount of data being processed
A query’s performance depends directly on the amount of data it needs to process. The less data is processed, the quicker the query (and the fewer resources it consumes). Therefore, the most important best-practice is to structure the query in such a way that reduces the amount of data being processed.
In order of importance:
- Only reference tables whose data is needed by the query. For example, when using the union operator with wildcard table references, it’s better from a performance point-of-view to only reference a handful of tables, instead of using a wildcard (*) to reference all tables and then filter data out using a predicate on the source table name.
- Take advantage of a table’s data scope if the query is relevant only for a specific scope. The table() function provides an efficient way to eliminate data by scoping it according to the caching policy (the DataScope parameter).
- Apply the where query operator immediately following table references.
- When using the where query operator, the order in which you place the predicates, whether you use a single where operator, or multiple consecutive where operators, can have a significant effect on the query performance (a minimal sketch follows this list):
  - Apply predicates that act upon datetime table columns first. Kusto includes an efficient index on such columns, often completely eliminating whole data shards without needing to access those shards.
  - Then apply predicates that act upon string and dynamic columns, especially such predicates that apply at the term-level. Order the predicates by the selectivity. For example, searching for a user ID when there are millions of users is highly selective and usually involves a term search, for which the index is very efficient.
  - Then apply predicates that are selective and are based on numeric columns.
  - Last, for queries that scan a table column’s data (for example, for predicates such as contains "@!@!", that have no terms and don’t benefit from indexing), order the predicates such that the ones that scan columns with less data are first. Doing so reduces the need to decompress and scan large columns.
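A minimal sketch of that ordering over StormEvents: a datetime filter first, then a term-level string predicate, and a column-scanning predicate last.
StormEvents
| where StartTime >= datetime(2007-11-01) and StartTime < datetime(2007-12-01)  // datetime column first: indexed, can eliminate whole shards
| where State has "CAROLINA"                                                    // term-level string predicate next
| where EventNarrative contains "surf"                                          // full column scan, applied last
| summarize count() by EventType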
Avoid using redundant qualified references
Reference entities such as tables and materialized views by name.
For example, the table T can be referenced as simply T (the unqualified name), by using a database qualifier (for example, database("DB").T when the table is in a database called DB), or by using a fully qualified name (for example, cluster("<serviceURL>").database("DB").T).
It’s a best practice to avoid using name qualifications when they’re redundant, for the following reasons:
- Unqualified names are easier to identify (for a human reader) as belonging to the database-in-scope.
- Referencing database-in-scope entities is always at least as fast, and in some cases much faster, than entities that belong to other databases. This is especially true when those databases are in a different cluster or Eventhouse. Avoiding qualified names helps the reader to do the right thing.
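A minimal sketch, assuming a table T in the database in scope:
// Preferred: unqualified reference to an entity in the database in scope.
T
| count
// Redundant qualification of the same local table (avoid):
// database("DB").T | count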
2.2 - Named expressions
This article discusses how to optimize repeat use of named expressions in a query.
In Kusto Query Language, you can bind names to complex expressions in several different ways:
- In a let statement
- In the as operator
- In the formal parameters list of user-defined functions
When you reference these named expressions in a query, the following steps occur:
- The calculation within the named expression is evaluated. This calculation produces either a scalar or tabular value.
- The named expression is replaced with the calculated value.
If the same bound name is used multiple times, then the underlying calculation will be repeated multiple times. When is this a concern?
- When the calculations consume many resources and are used many times.
- When the calculation is non-deterministic, but the query assumes all invocations to return the same value.
Mitigation
To mitigate these concerns, you can materialize the calculation results in memory during the query. Depending on the way the named calculation is defined, you’ll use different materialization strategies:
Tabular functions
Use the following strategies for tabular functions:
- let statements and function parameters: Use the materialize() function.
- as operator: Set the hint.materialized hint value to true.
For example, the following query uses the non-deterministic tabular sample operator:
Behavior without using the materialize function
range x from 1 to 100 step 1
| sample 1
| as T
| union T
Output
x |
---|
63 |
92 |
Behavior using the materialize function
range x from 1 to 100 step 1
| sample 1
| as hint.materialized=true T
| union T
Output
x |
---|
95 |
95 |
Scalar functions
Non-deterministic scalar functions can be forced to calculate exactly once by using toscalar().
For example, the following query uses the non-deterministic function, rand():
let x = () {rand(1000)};
let y = () {toscalar(rand(1000))};
print x, x, y, y
Output
print_0 | print_1 | print_2 | print_3 |
---|---|---|---|
166 | 137 | 70 | 70 |
3 - Data types
3.1 - Null values
All scalar data types in Kusto have a special value that represents a missing value. This value is called the null value, or null.
Null literals
The null value of a scalar type T is represented in the query language by the null literal T(null).
The following query returns a single row full of null values:
print bool(null), datetime(null), dynamic(null), guid(null), int(null), long(null), real(null), double(null), timespan(null)
Predicates on null values
The scalar function isnull() can be used to determine if a scalar value is the null value. The corresponding function isnotnull() can be used to determine if a scalar value isn’t the null value.
Equality and inequality of null values
- Equality (==): Applying the equality operator to two null values yields bool(null). Applying the equality operator to a null value and a non-null value yields bool(false).
- Inequality (!=): Applying the inequality operator to two null values yields bool(null). Applying the inequality operator to a null value and a non-null value yields bool(true).
For example:
datatable(val:int)[5, int(null)]
| extend IsBiggerThan3 = val > 3
| extend IsBiggerThan3OrNull = val > 3 or isnull(val)
| extend IsEqualToNull = val == int(null)
| extend IsNotEqualToNull = val != int(null)
Output
val | IsBiggerThan3 | IsBiggerThan3OrNull | IsEqualToNull | IsNotEqualToNull |
---|---|---|---|---|
5 | true | true | false | true |
null | null | true | null | null |
Null values and aggregation functions
When applying the following operators to entities that include null values, the null values are ignored and don’t factor into the calculation:
- dcount()
- dcountif()
- make_bag()
- make_bag_if()
- make_list()
- make_list_if()
- make_set()
- make_set_if()
- stdev()
- stdevif()
- sum()
- sumif()
- variance()
- varianceif()
Null values and the where operator
The where operator uses Boolean expressions to determine whether to emit each input record to the output. This operator treats null values as if they’re bool(false). Records for which the predicate returns the null value are dropped and don’t appear in the output.
For example:
datatable(ival:int, sval:string)[5, "a", int(null), "b"]
| where ival != 5
Output
ival | sval |
---|---|
null | b |
Null values and binary operators
Binary operators are scalar operators that accept two scalar values and produce a third value. For example, greater-than (>) and Boolean AND (&&) are binary operators.
For all binary operators, except as noted in Exceptions to this rule, the rule is as follows:
If one or both of the values input to the binary operator are null values, then the output of the binary operator is also the null value. In other words, the null value is “sticky”.
Exceptions to this rule
- For the equality (==) and inequality (!=) operators, if one of the values is null and the other value isn’t null, then the result is either bool(false) or bool(true), respectively.
- For the logical AND (&&) operator, if one of the values is bool(false), the result is also bool(false).
- For the logical OR (||) operator, if one of the values is bool(true), the result is also bool(true).
For example:
datatable(val:int)[5, int(null)]
| extend Add = val + 10
| extend Multiply = val * 10
Output
val | Add | Multiply |
---|---|---|
5 | 15 | 50 |
null | null | null |
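The logical-operator exceptions can be seen directly with print; a minimal sketch:
// AND with a false operand yields false, OR with a true operand yields true;
// any other combination with null stays null ("sticky").
print and_false = bool(null) and false, or_true = bool(null) or true, and_true = bool(null) and true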
Null values and the logical NOT (!) operator
The logical NOT operator not() yields the value bool(null) if the argument is the null value.
Null values and the in operator
- The in operator behaves like a logical OR of equality comparisons.
- The !in operator behaves like a logical AND of inequality comparisons.
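Combining these rules with the where operator, a row whose value is null should be dropped by an in filter, because the comparison yields the null value and where treats null as false. A minimal sketch:
datatable(val:int)[5, 7, int(null)]
| where val in (5, 10)
// Only the row with val == 5 is expected in the output.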
Null values and data ingestion
For most data types, a missing value in the data source produces a null value in the corresponding table cell. However, columns of type string in CSV (or CSV-like) data formats are an exception to this rule: a missing value produces an empty string.
For example:
.create table T(a:string, b:int)
.ingest inline into table T
[,]
[ , ]
[a,1]
T
| project a, b, isnull_a=isnull(a), isempty_a=isempty(a), strlen_a=strlen(a), isnull_b=isnull(b)
Output
a | b | isnull_a | isempty_a | strlen_a | isnull_b |
---|---|---|---|---|---|
 | | false | true | 0 | true |
 | | false | false | 1 | true |
a | 1 | false | false | 1 | false |
3.2 - Scalar data types
Every data value, like the value of an expression or a function parameter, has a data type which is either a scalar data type or a user-defined record. A scalar data type is one of the built-in predefined types in Supported data types. A user-defined record is an ordered sequence of name and scalar-data-type pairs, like the data type of a row in a table.
As in most languages, the data type determines what calculations and manipulations can be run against a value. For example, if you have a value that is of type string, you won’t be able to perform arithmetic calculations against it.
Supported data types
In Kusto Query Language, most of the data types follow standard conventions and have names you’ve probably seen before. The following table shows the full list:
Type | Description |
---|---|
bool (boolean ) | true (1 ) or false (0 ). |
datetime (date ) | An instant in time, typically expressed as a date and time of day. |
decimal | A 128-bit wide, decimal number. |
dynamic | An array, a property bag, or a value of any of the other scalar data types. |
guid (uuid , uniqueid ) | A 128-bit globally unique value. |
int | A signed, 32-bit wide, integer. |
long | A signed, 64-bit wide, integer. |
real (double ) | A 64-bit wide, double-precision, floating-point number. |
string | A sequence of zero or more Unicode characters. |
timespan (time ) | A time interval. |
While most of the data types are standard, you might be less familiar with types like dynamic or timespan, and guid.
Dynamic has a structure similar to JSON, but with one key difference: It can store Kusto Query Language-specific data types that traditional JSON can’t, such as a nested dynamic value, or timespan.
Timespan is a data type that refers to a measure of time such as hours, days, or seconds. Don’t confuse timespan with datetime, which evaluates to an actual date and time, not a measure of time. Timespan values are written with a suffix such as d (days), h (hours), m (minutes), s (seconds), ms (milliseconds), microsecond, or tick (100 nanoseconds); for example, 2d or 30m.
GUID is a datatype representing a 128-bit, globally unique identifier, which follows the standard format of [8]-[4]-[4]-[4]-[12], where each [number] represents the number of characters and each character can range from 0-9 or a-f.
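A minimal sketch that prints values of several of these types and inspects two of them with the gettype() function:
print b=true, i=int(17), r=3.14, s="hello", ts=90m, dt=datetime(2024-06-01)
| extend ts_type=gettype(ts), dt_type=gettype(dt)
// ts_type should report a timespan value and dt_type a datetime value.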
Null values
All nonstring data types can be null. When a value is null, it indicates an absence or mismatch of data. For example, if you try to input the string abc
into an integer column, it results in the null value. To check if an expression is null, use the isnull() function.
For more information, see Null values.
3.3 - The bool data type
The bool data type can be: true (1), false (0), or null.
bool literals
To specify a bool literal, use one of the following syntax options:
Syntax | Description |
---|---|
true or bool(true) | Represents trueness. |
false or bool(false) | Represents falsehood. |
bool(null) | Represents the null value. |
Boolean operators
The bool data type supports all of the logical operators: equality (==), inequality (!=), logical-and (and), and logical-or (or).
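A minimal sketch of the literals and operators:
print t=true, f=bool(false), isNullBool=isnull(bool(null)), combined=(1 == 1) and (2 != 3)
// t is true, f is false, isNullBool is true, and combined is true.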
3.4 - The datetime data type
The datetime data type represents an instant in time, typically expressed as a date and time of day.
Values range from 00:00:00 (midnight), January 1, 0001 Anno Domini (Common Era) through 11:59:59 P.M., December 31, 9999 A.D. (C.E.) in the Gregorian calendar.
Time values are measured in 100-nanosecond units called ticks, and a particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.) in the GregorianCalendar calendar (excluding ticks that would be added by leap seconds). For example, a ticks value of 31241376000000000 represents the date, Friday, January 01, 0100 12:00:00 midnight. This is sometimes called “a moment in linear time”.
datetime literals
To specify a datetime literal, use one of the following syntax options:
Syntax | Description | Example |
---|---|---|
datetime(year-month-day hour:minute:second.milliseconds) | A date and time in UTC format. | datetime(2015-12-31 23:59:59.9) |
datetime(year-month-day) | A date in UTC format. | datetime(2015-12-31) |
datetime() | Returns the current time. | |
datetime(null) | Represents the null value. |
The now() and ago() special functions
Kusto provides two special functions, now() and ago(), to allow queries to reference the time at which the query starts execution.
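A minimal sketch:
print query_start=now(), one_hour_ago=ago(1h), one_day_ago=now(-1d)
// ago(1h) and now(-1h) are equivalent ways to express "one hour before query start".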
Supported formats
There are several formats for datetime that are supported as datetime() literals and by the todatetime() function.
ISO 8601
Format | Example |
---|---|
%Y-%m-%dT%H:%M:%s%z | 2014-05-25T08:20:03.123456Z |
%Y-%m-%dT%H:%M:%s | 2014-05-25T08:20:03.123456 |
%Y-%m-%dT%H:%M | 2014-05-25T08:20 |
%Y-%m-%d %H:%M:%s%z | 2014-11-08 15:55:55.123456Z |
%Y-%m-%d %H:%M:%s | 2014-11-08 15:55:55 |
%Y-%m-%d %H:%M | 2014-11-08 15:55 |
%Y-%m-%d | 2014-11-08 |
RFC 822
Format | Example |
---|---|
%w, %e %b %r %H:%M:%s %Z | Sat, 8 Nov 14 15:05:02 GMT |
%w, %e %b %r %H:%M:%s | Sat, 8 Nov 14 15:05:02 |
%w, %e %b %r %H:%M | Sat, 8 Nov 14 15:05 |
%w, %e %b %r %H:%M %Z | Sat, 8 Nov 14 15:05 GMT |
%e %b %r %H:%M:%s %Z | 8 Nov 14 15:05:02 GMT |
%e %b %r %H:%M:%s | 8 Nov 14 15:05:02 |
%e %b %r %H:%M | 8 Nov 14 15:05 |
%e %b %r %H:%M %Z | 8 Nov 14 15:05 GMT |
RFC 850
Format | Example |
---|---|
%w, %e-%b-%r %H:%M:%s %Z | Saturday, 08-Nov-14 15:05:02 GMT |
%w, %e-%b-%r %H:%M:%s | Saturday, 08-Nov-14 15:05:02 |
%w, %e-%b-%r %H:%M %Z | Saturday, 08-Nov-14 15:05 GMT |
%w, %e-%b-%r %H:%M | Saturday, 08-Nov-14 15:05 |
%e-%b-%r %H:%M:%s %Z | 08-Nov-14 15:05:02 GMT |
%e-%b-%r %H:%M:%s | 08-Nov-14 15:05:02 |
%e-%b-%r %H:%M %Z | 08-Nov-14 15:05 GMT |
%e-%b-%r %H:%M | 08-Nov-14 15:05 |
Sortable
Format | Example |
---|---|
%Y-%n-%e %H:%M:%s | 2014-11-08 15:05:25 |
%Y-%n-%e %H:%M:%s %Z | 2014-11-08 15:05:25 GMT |
%Y-%n-%e %H:%M | 2014-11-08 15:05 |
%Y-%n-%e %H:%M %Z | 2014-11-08 15:05 GMT |
%Y-%n-%eT%H:%M:%s | 2014-11-08T15:05:25 |
%Y-%n-%eT%H:%M:%s %Z | 2014-11-08T15:05:25 GMT |
%Y-%n-%eT%H:%M | 2014-11-08T15:05 |
%Y-%n-%eT%H:%M %Z | 2014-11-08T15:05 GMT |
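For example, the following query parses strings in two of the supported formats with todatetime():
print iso8601 = todatetime("2014-05-25T08:20:03.123456Z"), sortable = todatetime("2014-11-08 15:05:25")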
3.5 - The decimal data type
The decimal data type represents a 128-bit wide, decimal number.
decimal literals
To specify a decimal literal, use one of the following syntax options:
|Syntax|Description|Example|
|--|--|--|
|decimal(number)|A decimal number represented by one or more digits, followed by a decimal point, and then one or more digits.|decimal(1.0)|
|decimal(number e exponent)|A decimal number represented by scientific notation.|decimal(1e5) is equivalent to 100,000|
|decimal(null)|Represents the null value.||
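For example, the following query prints each literal form:
print d1 = decimal(1.0), d2 = decimal(1e5), d3 = decimal(null)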
3.6 - The dynamic data type
The dynamic scalar data type can be any of the following values:
- An array of dynamic values, holding zero or more values with zero-based indexing.
- A property bag that maps unique string values to dynamic values. The property bag has zero or more such mappings (called "slots"), indexed by the unique string values. The slots are unordered.
- A value of any of the primitive scalar data types: bool, datetime, guid, int, long, real, string, and timespan.
- Null. For more information, see Null values.
Dynamic literals
To specify a dynamic literal, use one of the following syntax options:
Syntax | Description | Example |
---|---|---|
dynamic([ value [, …]]) | An array of dynamic or other scalar literals. | dynamic([1, 2, "hello"]) |
dynamic({ key = value [, …]}) | A property bag, or object. The value for a key can be a nested property bag. | dynamic({"a":1, "b":{"a":2}}) |
dynamic( value) | A dynamic value holding the value of the inner scalar data type. | dynamic(4) |
dynamic(null) | Represents the null value. |
Dynamic object accessors
To subscript a dictionary, use either the dot notation (dict.key) or the bracket notation (dict["key"]). When the subscript is a string constant, both options are equivalent.
In the examples below, dict and arr are columns of dynamic type:
Expression | Accessor expression type | Meaning | Comments |
---|---|---|---|
dict[col] | Entity name (column) | Subscripts a dictionary using the values of the column col as the key | Column must be of type string |
arr[index] | Entity index (column) | Subscripts an array using the values of the column index as the index | Column must be of type integer or boolean |
arr[-index] | Entity index (column) | Retrieves the ‘index’-th value from the end of the array | Column must be of type integer or boolean |
arr[(-1)] | Entity index | Retrieves the last value in the array | |
arr[toint(indexAsString)] | Function call | Casts the values of column indexAsString to int and uses them to subscript an array | |
dict[['where']] | Keyword used as entity name (column) | Subscripts a dictionary using the values of column where as the key | Entity names that are identical to some query language keywords must be quoted |
dict.['where'] or dict['where'] | Constant | Subscripts a dictionary using the where string as the key | |
Accessing a sub-object of a dynamic value yields another dynamic value, even if the sub-object has a different underlying type. Use the gettype function to discover the actual underlying type of the value, and any of the cast functions listed below to cast it to the actual type.
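For example, the following query shows how gettype() reveals the underlying type of values accessed from a dynamic object:
print o = dynamic({"a": 123, "b": "hello"})
| extend typeOfA = gettype(o.a), typeOfB = gettype(o.b)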
Casting dynamic objects
After subscripting a dynamic object, you must cast the value to a simple type.
Expression | Value | Type |
---|---|---|
X | parse_json('[100,101,102]') | array |
X[0] | parse_json('100') | dynamic |
toint(X[1]) | 101 | int |
Y | parse_json('{"a1":100, "a b c":"2015-01-01"}') | dictionary |
Y.a1 | parse_json('100') | dynamic |
Y["a b c"] | parse_json("2015-01-01") | dynamic |
todate(Y["a b c"]) | datetime(2015-01-01) | datetime |
Cast functions are:
tolong()
todouble()
todatetime()
totimespan()
tostring()
toguid()
parse_json()
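For example, the following query subscripts a dynamic array and casts the results to concrete types:
print arr = dynamic([100, 101, 102])
| extend second = toint(arr[1]), firstAsText = tostring(arr[0])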
Building dynamic objects
Several functions enable you to create new dynamic objects:
- bag_pack() creates a property bag from name/value pairs.
- pack_array() creates an array from a list of values (the values can be columns; for each row, an array is created from the specified columns).
- range() creates an array with an arithmetic series of numbers.
- zip() pairs “parallel” values from two arrays into a single array.
- repeat() creates an array with a repeated value.
Additionally, there are several aggregate functions which create dynamic arrays to hold aggregated values (see the example after this list):
- buildschema() returns the aggregate schema of multiple dynamic values.
- make_bag() returns a property bag of dynamic values within the group.
- make_bag_if() returns a property bag of dynamic values within the group (with a predicate).
- make_list() returns an array holding all values, in sequence.
- make_list_if() returns an array holding all values, in sequence (with a predicate).
- make_list_with_nulls() returns an array holding all values, in sequence, including null values.
- make_set() returns an array holding all unique values.
- make_set_if() returns an array holding all unique values (with a predicate).
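The following query sketches how two of these aggregates behave over a small inline table (the column names are arbitrary):
datatable(Group:string, Value:long) [
    "A", 1,
    "A", 2,
    "A", 2,
    "B", 3
]
| summarize Values = make_list(Value), DistinctValues = make_set(Value) by Group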
Operators and functions over dynamic types
For a complete list of scalar dynamic/array functions, see dynamic/array functions.
Operator or function | Usage with dynamic data types |
---|---|
value in array | True if there's an element of array that == value. Example: where City in ('London', 'Paris', 'Rome') |
value !in array | True if there’s no element of array that == value |
array_length( array) | Null if it isn’t an array |
bag_has_key( bag, key) | Checks whether a dynamic bag column contains a given key. |
bag_keys( bag) | Enumerates all the root keys in a dynamic property-bag object. |
bag_merge( bag1,…,bagN) | Merges dynamic property-bags into a dynamic property-bag with all properties merged. |
bag_set_key( bag,key,value) | Sets a given key to a given value in a dynamic property-bag. |
extract_json(path,object), extractjson(path,object) | Use path to navigate into object. |
parse_json( source) | Turns a JSON string into a dynamic object. |
range( from,to,step) | An array of values. |
mv-expand listColumn | Replicates a row for each value in a list in a specified cell. |
summarize buildschema( column) | Infers the type schema from column content. |
summarize make_bag( column) | Merges the property bag (dictionary) values in the column into one property bag, without key duplication. |
summarize make_bag_if( column,predicate) | Merges the property bag (dictionary) values in the column into one property bag, without key duplication (with predicate). |
summarize make_list( column) | Flattens groups of rows and puts the values of the column in an array. |
summarize make_list_if( column,predicate) | Flattens groups of rows and puts the values of the column in an array (with predicate). |
summarize make_list_with_nulls( column) | Flattens groups of rows and puts the values of the column in an array, including null values. |
summarize make_set( column) | Flattens groups of rows and puts the values of the column in an array, without duplication. |
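For example, the following query applies two of these functions to a small property bag:
print bag = dynamic({"a": 1, "b": 2})
| extend hasA = bag_has_key(bag, "a"), rootKeys = bag_keys(bag)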
Indexing for dynamic data
Every field is indexed during data ingestion. The scope of the index is a single data shard.
To index dynamic columns, the ingestion process enumerates all “atomic” elements within the dynamic value (property names, values, array elements) and forwards them to the index builder. Otherwise, dynamic fields have the same inverted term index as string fields.
Examples
Dynamic property bag
The following query creates a dynamic property bag.
print o=dynamic({"a":123, "b":"hello", "c":[1,2,3], "d":{}})
| extend a=o.a, b=o.b, c=o.c, d=o.d
For convenience, dynamic literals that appear in the query text itself may also include other Kusto literals with types: datetime, timespan, real, long, guid, bool, and dynamic.
This extension over JSON isn't available when parsing strings (such as when using the parse_json function or when ingesting data), but it enables you to do the following:
print d=dynamic({"a": datetime(1970-05-11)})
To parse a string value that follows the JSON encoding rules into a dynamic value, use the parse_json function. For example:
- parse_json('[43, 21, 65]') - an array of numbers
- parse_json('{"name":"Alan", "age":21, "address":{"street":432,"postcode":"JLK32P"}}') - a dictionary
- parse_json('21') - a single value of dynamic type containing a number
- parse_json('"21"') - a single value of dynamic type containing a string
- parse_json('{"a":123, "b":"hello", "c":[1,2,3], "d":{}}') - gives the same value as o in the example above.
Ingest data into dynamic columns
The following example shows how you can define a table that holds a dynamic column (as well as a datetime column) and then ingest a single record into it. It also demonstrates how you can encode JSON strings in CSV files.
// dynamic is just like any other type:
.create table Logs (Timestamp:datetime, Trace:dynamic)
// Everything between the "[" and "]" is parsed as a CSV line would be:
// 1. Since the JSON string includes double-quotes and commas (two characters
// that have a special meaning in CSV), we must CSV-quote the entire second field.
// 2. CSV-quoting means adding double-quotes (") at the immediate beginning and end
// of the field (no spaces allowed before the first double-quote or after the second
// double-quote!)
// 3. CSV-quoting also means doubling-up every instance of a double-quotes within
// the contents.
.ingest inline into table Logs
[2015-01-01,"{""EventType"":""Demo"", ""EventValue"":""Double-quote love!""}"]
Output
Timestamp | Trace |
---|---|
2015-01-01 00:00:00.0000000 | {"EventType":"Demo","EventValue":"Double-quote love!"} |
Related content
- For an example on how to query using dynamic objects and object accessors, see Map values from one set to another.
3.7 - The guid data type
The guid data type represents a 128-bit globally unique value.
guid literals
To specify a guid literal, use one of the following syntax options:
Syntax | Description | Example |
---|---|---|
guid( id) | A guid ID string. | guid(74be27de-1e4e-49d9-b579-fe0b331d3642) |
guid(null) | Represents the null value. |
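For example, the following query prints both literal forms:
print g = guid(74be27de-1e4e-49d9-b579-fe0b331d3642), missing = guid(null)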
3.8 - The int data type
The int data type represents a signed, 32-bit wide, integer.
int literals
To specify an int literal, use one of the following syntax options:
|Syntax|Description|Example|
|--|--|--|
|int(number)|A positive integer.|int(2)|
|int(-number)|A negative integer.|int(-2)|
|int(null)|Represents the null value.||
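For example, the following query prints each literal form:
print i1 = int(2), i2 = int(-2), i3 = int(null)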
3.9 - The long data type
The long data type represents a signed, 64-bit wide, integer.
By default, integers and integers represented with hexadecimal syntax are of type long.
long literals
To specify a long literal, use one of the following syntax options:
|Syntax|Description|Example|
|--|--|--|
|number|An integer. You don't need to wrap the integer with long() because integers are by default of type long.|12|
|0xhex|An integer represented with hexadecimal syntax.|0xf is equivalent to 15|
|long(-number)|A negative integer.|long(-1)|
|long(null)|Represents the null value.||
Related content
- tolong()
- To convert the long type into a hex string, see tohex() function.
3.10 - The real data type
The real data type represents a 64-bit wide, double-precision, floating-point number.
By default, decimal numbers and numbers represented with scientific notation are of type real.
real literals
To specify a real literal, use one of the following syntax options:
Syntax | Description | Example |
---|---|---|
number | A real number represented by one or more digits, followed by a decimal point, and then one or more digits. | 1.0 |
numbere exponent | A real number represented by scientific notation. | 1e5 |
real(null) | Represents the null value. | |
real(nan) | Not-a-number (NaN), such as the result of dividing 0.0 by 0.0. | |
real(+inf) | Positive infinity, such as when dividing 1.0 by 0.0 . | |
real(-inf) | Negative infinity, such as when dividing -1.0 by 0.0 . |
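For example, the following query prints a few of these literal forms:
print r1 = 1.0, r2 = 1e5, notANumber = real(nan), positiveInfinity = real(+inf)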
3.11 - The string data type
The string data type represents a sequence of zero or more Unicode characters.
For information on string query operators, see String operators.
string literals
A string literal is a string enclosed in quotes. You can use double quotes or single quotes to encode string literals in query text. With double quotes, you must escape nested double quote characters with a backslash (\). With single quotes, you must escape nested single quote characters, and you don't need to escape double quotes.
Use the backslash character to escape the enclosing quote characters, tab characters (\t), newline characters (\n), and the backslash itself (\\).
Verbatim string literals
Verbatim string literals are string literals prepended with the @ character, which serves as a verbatim identifier. In this form, the backslash character (\) stands for itself and isn't an escape character. In verbatim string literals, double quotes are escaped with double quotes and single quotes are escaped with single quotes.
For an example, see Verbatim string.
Multi-line string literals
Indicate a multi-line string literal with a "triple-backtick chord" (```) at the beginning and end of the literal.
For an example, see Multi-line string literal.
Concatenation of separated string literals
In a Kusto query, when two or more adjacent string literals have no separation between them, they’re automatically combined to form a new string literal. Similarly, if the string literals are separated only by whitespace or comments, they’re also combined to form a new string literal.
For an example, see Concatenated string literals.
Obfuscated string literals
Queries are stored for telemetry and analysis. To safeguard sensitive information like passwords and secrets, you can mark a string as an obfuscated string literal. These marked strings are logged in obfuscated form, replaced with asterisks (*) in the query text.
An obfuscated string literal is created by prepending an h or an H character in front of a standard or verbatim string literal.
For an example, see Obfuscated string literal.
Examples
String literal with quotes
The following example demonstrates how to use quotes within string literals encompassed by single quotes and double quotes. For more information, see String literals.
print
s1 = 'string with "double quotes"',
s2 = "string with 'single quotes'"
Output
s1 | s2 |
---|---|
string with "double quotes" | string with 'single quotes' |
String literal with backslash escaping
The following example creates a regular expression pattern using backslashes to escape special characters. For more information, see String literals.
print pattern = '\\n.*(>|\'|=|\")[a-zA-Z0-9/+]{86}=='
Output
pattern |
---|
\n.*(>|'|=|")[a-zA-Z0-9/+]{86}== |
String literal with Unicode
The following example shows that a backslash is needed to include a Unicode character in a string literal.
print space = "Hello\u00A0World"
Output
space |
---|
Hello World |
Verbatim string literal
The following example creates a path in which the backslashes are part of the path instead of escape characters. To do this, the @ sign is prepended to the string, creating a verbatim string literal.
print myPath = @'C:\Folder\filename.txt'
Output
myPath |
---|
C:\Folder\filename.txt |
Multi-line string literal
The following example shows the syntax for a multi-line string literal, which uses newlines and tabs to style a code block. For more information, see Multi-line string literals.
print program = ```
public class Program {
public static void Main() {
System.Console.WriteLine("Hello!");
}
}```
Output
program |
---|
public class Program { public static void Main() { System.Console.WriteLine("Hello!"); } } |
Concatenated string literals
The following expressions all yield a string of length 13. For more information, see Concatenation of separated string literals.
print
none = strlen("Hello"', '@"world!"),
whitespace = strlen("Hello" ', ' @"world!"),
whitespaceAndComment = strlen("Hello"
// Comment
', '@"world!"
);
Output
none | whitespace | whitespaceAndComment |
---|---|---|
13 | 13 | 13 |
Obfuscated string literal
In the following query output, the h string is visible in your results. However, in tracing or telemetry, the h string is stored in an obfuscated form and substituted with asterisks in the log. For more information, see Obfuscated string literals.
print blob="https://contoso.blob.core.windows.net/container/blob.txt?"
h'sv=2012-02-12&se=2013-04-13T0...'
Output
blob |
---|
https://contoso.blob.core.windows.net/container/blob.txt?sv=2012-02-12&se=2013-04-13T0… |
3.12 - The timespan data type
The timespan data type represents a time interval.
timespan literals
To specify a timespan literal, use one of the following syntax options:
Syntax | Description | Example | Length of time |
---|---|---|---|
nd | A time interval represented by one or more digits followed by d for days. | 2d | 2 days |
nh | A time interval represented by one or more digits followed by h for hours. | 1.5h | 1.5 hours |
nm | A time interval represented by one or more digits followed by m for minutes. | 30m | 30 minutes |
ns | A time interval represented by one or more digits followed by s for seconds. | 10s | 10 seconds |
nms | A time interval represented by one or more digits followed by ms for milliseconds. | 100ms | 100 milliseconds |
nmicrosecond | A time interval represented by one or more digits followed by microsecond . | 10microsecond | 10 microseconds |
ntick | A time interval represented by one or more digits followed by tick to indicate nanoseconds. | 1tick | 100 ns |
timespan( n seconds) | A time interval in seconds. | timespan(15 seconds) | 15 seconds |
timespan( n) | A time interval in days. | timespan(2) | 2 days |
timespan( days. hours: minutes: seconds. milliseconds) | A time interval in days, hours, minutes, and seconds passed. | timespan(0.12:34:56.7) | 0d+12h+34m+56.7s |
timespan(null) | Represents the null value. |
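For example, the following query prints a few of these literal forms:
print twoDays = 2d, ninetyMinutes = 1.5h, mixed = timespan(0.12:34:56.7), missing = timespan(null)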
timespan operators
Two values of type timespan may be added, subtracted, and divided. The last operation returns a value of type real representing the fractional number of times one value can fit the other.
Examples
The following example calculates how many seconds are in a day in several ways:
print
result1 = 1d / 1s,
result2 = time(1d) / time(1s),
result3 = 24 * 60 * time(00:01:00) / time(1s)
This example converts the number of seconds in a day (represented by an integer value) to a timespan unit:
print
seconds = 86400
| extend t = seconds * 1s
4 - Entities
4.1 - Columns
Columns are named entities that have a scalar data type. Columns are referenced in the query relative to the tabular data stream that is in context of the specific operator referencing them. Every table in Kusto, and every tabular data stream, is a rectangular grid of columns and rows. The columns of a table or a tabular data stream are ordered, so a column also has a specific position in the table's collection of columns.
Reference columns in queries
In queries, columns are generally referenced by name only. They can only appear in expressions, and the query operator under which the expression appears determines the table or tabular data stream. The column’s name doesn’t need to be scoped further.
For example, in the following query we have an unnamed tabular data stream that is defined through the datatable operator and has a single column, c. The tabular data stream is filtered by a predicate on the value of that column, and produces a new unnamed tabular data stream with the same columns but fewer rows. The as operator then names the tabular data stream, and its value is returned as the results of the query. Notice how column c is referenced by name without referencing its container:
datatable (c:int) [int(-1), 0, 1, 2, 3]
| where c*c >= 2
| as Result
4.2 - Databases
Databases are named entities that hold tables and stored functions. Kusto follows a relational model of storing the data where the upper-level entity is a database.
A single cluster or Eventhouse can host several databases, in which each database hosts its own collection of tables, stored functions, and external tables. Each database has its own set of permissions that follow the Role Based Access Control (RBAC) model.
4.3 - Entities
Kusto queries execute in the context of a Kusto database. Data in the database is arranged in tables, which the query may reference, and within the table it is organized as a rectangular grid of columns and rows. Additionally, queries may reference stored functions in the database, which are query fragments made available for reuse.
- Clusters are entities that hold databases. Clusters have no name, but they can be referenced by using the cluster() special function with the cluster's URI. For example, cluster("https://help.kusto.windows.net") is a reference to a cluster that holds the Samples database.
- Databases are named entities that hold tables and stored functions. All Kusto queries run in the context of some database, and the entities of that database may be referenced by the query with no qualifications. Additionally, other databases may be referenced using the database() special function. For example, cluster("https://help.kusto.windows.net").database("Samples") is a universal reference to a specific database.
- Tables are named entities that hold data. A table has an ordered set of columns, and zero or more rows of data, each row holding one data value for each of the columns of the table. Tables may be referenced by name only if they are in the database in context of the query, or by qualifying them with a database reference otherwise. For example, cluster("https://help.kusto.windows.net").database("Samples").StormEvents is a universal reference to a particular table in the Samples database. Tables may also be referenced by using the table() special function.
- Columns are named entities that have a scalar data type. Columns are referenced in the query relative to the tabular data stream that is in context of the specific operator referencing them.
- Stored functions are named entities that allow reuse of Kusto queries or query parts.
- Views are virtual tables based on functions (stored or defined in an ad-hoc fashion).
- External tables are entities that reference data stored outside the Kusto database. External tables are used for exporting data from Kusto to external storage as well as for querying external data without ingesting it into Kusto.
4.4 - Entity names
Kusto entities are referenced in a query by name. Entities that can be referenced by their name include databases, tables, columns, and stored functions, but not clusters. The name you assign an entity is called an identifier. In addition to entities, you can also assign an identifier to query parameters, or variables set through a let statement.
An entity’s name is unique to the entity type in the context of its container. For example, two tables in the same database can’t have the same name, but a database and a table can have the same name because they’re different entity types. Similarly, a table and a stored function may have the same name.
Pretty names
In addition to the entity’s name, some entities may have a pretty name. Similar to the use of entity names, pretty names can be used to reference an entity in queries. But unlike entity names, pretty names aren’t necessarily unique in the context of their container. When a container has multiple entities with the same pretty name, the pretty name can’t be used to reference the entity.
Pretty names allow middle-tier applications to map automatically created entity names (such as UUIDs) to names that are human-readable for display and referencing purposes.
For an example on how to assign a pretty name, see .alter database prettyname command.
Identifier naming rules
An identifier is the name you assign to entities, query parameters, or variables set through a let statement. Valid identifiers must follow these rules:
- Identifiers are case-sensitive. Database names are case-insensitive, and therefore an exception to this rule.
- Identifiers must be between 1 and 1024 characters long.
- Identifiers may contain letters, digits, and underscores (_).
- Identifiers may contain certain special characters: spaces, dots (.), and dashes (-). For information on how to reference identifiers with special characters, see Reference identifiers in queries.
Avoid naming identifiers as language keywords or literals
In KQL, there are keywords and literals that have similar naming rules as identifiers. You can have identifiers with the same name as keywords or literals. However, we recommend that you avoid doing so as referencing them in queries requires special quoting.
To avoid using an identifier that might also be a language keyword or literal, such as where, summarize, and 1day, you can choose your entity name according to the following conventions, which aren't applicable to language keywords:
- Use a name that starts with a capital letter (A to Z).
- Use a name that starts or ends with a single underscore (_).
[!NOTE] KQL reserves all identifiers that start or end with a sequence of two underscore characters (__); users can't define such names for their own use.
For information on how to reference these identifiers, see Reference identifiers in queries.
Reference identifiers in queries
The following table provides an explanation on how to reference identifiers in queries.
Identifier type | Identifier | Reference | Explanation |
---|---|---|---|
Normal | entity | entity | Identifiers (entity ) that don’t include special characters or map to some language keyword don’t need to be enclosed in quotation marks. |
Special character | entity-name | ['entity-name'] | Identifier names that include special characters (such as -) must be enclosed using [' and '] or using [" and "]. |
Language keyword | where | ["where"] | Identifier names that are language keywords must be enclosed using [' and '] or [" and "]. |
Literal | 1day | ["1day"] | Identifier names that are literals must be enclosed using [' and '] or [" and "]. |
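For example, the following query (a minimal sketch with arbitrary values) creates and then references columns whose names are a language keyword and a literal:
print ['where'] = "a value", ['1day'] = 1
| project ['where'], ["1day"]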
4.5 - Entity references
Kusto entities are referenced in a query by name. Entities that can be referenced by their name include databases, tables, columns, and stored functions, but not clusters.
If the entity's container is unambiguous in the current context, use the entity name without additional qualifications. For example, when running a query against a database called DB, you may reference a table called T in that database by its name, T.
If the entity’s container isn’t available from the context, or you want to reference an entity from a container different than the container in context, use the entity’s qualified name.
The name is the concatenation of the entity name to the container's, and potentially its container's, and so on. In this way, a query running against database DB may refer to a table T1 in a different database DB1, by using database("DB1").T1.
If the query wants to reference a table from another cluster it can do so, for example, by using cluster("https://C2.kusto.windows.net/").database("DB2").T2.
Entity references can also use the entity pretty name, as long as it’s unique in the context of the entity’s container. For more information, see entity pretty names.
Wildcard matching for entity names
In some contexts, you may use a wildcard (*) to match all or part of an entity name. For example, the following query references all tables in the current database, and all tables in database DB1 whose name starts with a T:
union *, database("DB1").T*
Wildcard matching can't match entity names that start with a dollar sign ($); such names are system-reserved.
4.6 - External tables
An external table is a schema entity that references data stored external to a Kusto database.
Similar to tables, an external table has a well-defined schema (an ordered list of column name and data type pairs). Unlike tables where data is ingested into your cluster, external tables operate on data stored and managed outside your cluster.
Supported external data stores are:
- Files stored in Azure Blob Storage or in Azure Data Lake. Most commonly the data is stored in some standard format such as CSV, JSON, Parquet, AVRO, etc. For the list of supported formats, refer to supported formats.
- SQL table (SQL Server, MySql, PostgreSql, and Cosmos DB).
See the following ways of creating external tables:
- Create or alter Azure Blob Storage/ADLS external tables
- Create or alter delta external tables
- Create and alter SQL external tables
- Create external table using Azure Data Explorer web UI Wizard
An external table can be referenced by its name using the external_table() function.
Use the following commands to manage external tables:
For more information about how to query external tables, and ingested and uningested data, see Query data in Azure Data Lake using Azure Data Explorer.
To accelerate queries over external delta tables, see Query acceleration policy.
4.7 - Fact and dimension tables
When designing the schema for a database, think of tables as broadly belonging to one of two categories.
Fact tables
Fact tables are tables whose records are immutable “facts”, such as service logs and measurement information. Records are progressively appended into the table in a streaming fashion or in large chunks. The records stay there until they’re removed because of cost or because they’ve lost their value. Records are otherwise never updated.
Entity data is sometimes held in fact tables, where the entity data changes slowly. For example, data about some physical entity, such as a piece of office equipment that infrequently changes location. Since data in Kusto is immutable, the common practice is to have each table hold two columns:
- An identity (string) column that identifies the entity
- A last-modified (datetime) timestamp column
Only the last record for each entity identity is then retrieved.
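For example, assuming a hypothetical fact table Equipment with an identity column EquipmentId and a last-modified column LastModified, a query along these lines retrieves the latest record per entity:
// Hypothetical table and column names; adjust to your schema
Equipment
| summarize arg_max(LastModified, *) by EquipmentId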
Dimension tables
Dimension tables:
- Hold reference data, such as lookup tables from an entity identifier to its properties
- Hold snapshot-like data in tables whose entire contents change in a single transaction
Dimension tables aren’t regularly ingested with new data. Instead, the entire data content is updated at once, using operations such as .set-or-replace, .move extents, or .rename tables.
Sometimes, dimension tables might be derived from fact tables. This process can be done via a materialized view on the fact table, with a query on the table that takes the last record for each entity.
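As a sketch, a materialized view over the hypothetical Equipment fact table from the previous example could look like the following:
// Hypothetical names; the view keeps only the latest record per EquipmentId
.create materialized-view LatestEquipment on table Equipment
{
    Equipment
    | summarize arg_max(LastModified, *) by EquipmentId
}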
Differentiate fact and dimension tables
There are processes in Kusto that differentiate between fact tables and dimension tables. One of them is continuous export.
These mechanisms are guaranteed to process data in fact tables precisely once. They rely on the database cursor mechanism.
For example, every execution of a continuous export job, exports all records that were ingested since the last update of the database cursor. Continuous export jobs must differentiate between fact tables and dimension tables. Fact tables only process newly ingested data, and dimension tables are used as lookups. As such, the entire table must be taken into account.
There’s no way to “mark” a table as being a “fact table” or a “dimension table”. The way data is ingested into the table, and how the table is used, is what identifies its type.
4.8 - Stored functions
Functions are reusable queries or query parts. Functions can be stored as database entities, similar to tables, called stored functions. Alternatively, functions can be created in an ad-hoc fashion with a let statement, called query-defined functions. For more information, see user-defined functions.
To create and manage stored functions, see the Stored functions management overview.
For more information on working with functions in Log Analytics, see Functions in Azure Monitor log queries.
4.9 - Tables
Tables are named entities that hold data. A table has an ordered set of columns, and zero or more rows of data. Each row holds one data value for each of the columns of the table. The order of rows in the table is unknown, and doesn’t in general affect queries, except for some tabular operators (such as the top operator) that are inherently undetermined. For information on how to create and manage tables, see managing tables.
Tables occupy the same namespace as stored functions. If a stored function and a table both have the same name, the stored function will be chosen.
Reference tables in queries
The simplest way to reference a table is by using its name. This reference can be done for all tables that are in the database in context. For example, the following query counts the records of the current database's StormEvents table:
StormEvents
| count
An equivalent way to write the query above is by escaping the table name:
["StormEvents"]
| count
Tables may also be referenced by explicitly noting the database they are in. Then you can author queries that combine data from multiple databases. For example, the following query will work with any database in context, as long as the caller has access to the target database:
cluster("https://help.kusto.windows.net").database("Samples").StormEvents
| count
It’s also possible to reference a table by using the table() special function, as long as the argument to that function evaluates to a constant. For example:
let counter=(TableName:string) { table(TableName) | count };
counter("StormEvents")
4.10 - Views
A view is a virtual table based on the result-set of a Kusto Query Language (KQL) query.
Like real tables, views organize data with rows and columns, and participate in tasks that involve wildcard table name resolution, such as union * and search * scenarios. However, unlike real tables, views don’t maintain dedicated data storage. Rather, they dynamically represent the result of a query.
How to define a view
Views are defined through user-defined functions, which come in two forms: query-defined functions and stored functions. To qualify as a view, a function must accept no arguments and yield a tabular expression as its output.
To define a query-defined function as a view, specify the view keyword before the function definition. For an example, see Query-defined view.
To define a stored function as a view, set the view property to true when you create the function. For an example, see Stored view. For more information, see the .create function command.
Examples
Query-defined view
The following query defines two functions: T_view and T_notview. The query results demonstrate that only T_view is resolved by the wildcard reference in the union operation.
let T_view = view () { print x=1 };
let T_notview = () { print x=2 };
union T*
Stored view
The following query defines a stored view. This view behaves like any other stored function, yet can partake in wildcard scenarios.
.create function
with (view=true, docstring='Simple demo view', folder='Demo')
MyView() { StormEvents | take 100 }
5 - Functions
5.1 - bartlett_test_fl()
The bartlett_test_fl() function is a user-defined tabular function that performs the Bartlett Test.
Syntax
T | invoke bartlett_test_fl(data1, data2, test_statistic, p_value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let bartlett_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.bartlett(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Bartlett Test")
bartlett_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.bartlett(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let bartlett_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.bartlett(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Example query that uses the function
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke bartlett_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke bartlett_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Output
id | sample1 | sample2 | test_stat | p_val |
---|---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 1.7660796224425723 | 0.183868001738637 |
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1.9211710616896014 | 0.16572762069132516 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 0.0026985713829234454 | 0.958570306268548 |
5.2 - binomial_test_fl()
The function binomial_test_fl() is a UDF (user-defined function) that performs the binomial test.
Syntax
T | invoke binomial_test_fl(successes, trials, p_value [, success_prob [, alt_hypotheis ]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
successes | string | ✔️ | The name of the column containing the number of success results. |
trials | string | ✔️ | The name of the column containing the total number of trials. |
p_value | string | ✔️ | The name of the column to store the results. |
success_prob | real | | The success probability. The default is 0.5. |
alt_hypotheis | string | | The alternate hypothesis can be two-sided, greater, or less. The default is two-sided. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let binomial_test_fl = (tbl:(*), successes:string, trials:string, p_value:string, success_prob:real=0.5, alt_hypotheis:string='two-sided')
{
let kwargs = bag_pack('successes', successes, 'trials', trials, 'p_value', p_value, 'success_prob', success_prob, 'alt_hypotheis', alt_hypotheis);
let code = ```if 1:
from scipy import stats
successes = kargs["successes"]
trials = kargs["trials"]
p_value = kargs["p_value"]
success_prob = kargs["success_prob"]
alt_hypotheis = kargs["alt_hypotheis"]
def func(row, prob, h1):
pv = stats.binom_test(row[successes], row[trials], p=prob, alternative=h1)
return pv
result = df
result[p_value] = df.apply(func, axis=1, args=(success_prob, alt_hypotheis), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Binomial test")
binomial_test_fl(tbl:(*), successes:string, trials:string, p_value:string, success_prob:real=0.5, alt_hypotheis:string='two-sided')
{
let kwargs = bag_pack('successes', successes, 'trials', trials, 'p_value', p_value, 'success_prob', success_prob, 'alt_hypotheis', alt_hypotheis);
let code = ```if 1:
from scipy import stats
successes = kargs["successes"]
trials = kargs["trials"]
p_value = kargs["p_value"]
success_prob = kargs["success_prob"]
alt_hypotheis = kargs["alt_hypotheis"]
def func(row, prob, h1):
pv = stats.binom_test(row[successes], row[trials], p=prob, alternative=h1)
return pv
result = df
result[p_value] = df.apply(func, axis=1, args=(success_prob, alt_hypotheis), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let binomial_test_fl = (tbl:(*), successes:string, trials:string, p_value:string, success_prob:real=0.5, alt_hypotheis:string='two-sided')
{
let kwargs = bag_pack('successes', successes, 'trials', trials, 'p_value', p_value, 'success_prob', success_prob, 'alt_hypotheis', alt_hypotheis);
let code = ```if 1:
from scipy import stats
successes = kargs["successes"]
trials = kargs["trials"]
p_value = kargs["p_value"]
success_prob = kargs["success_prob"]
alt_hypotheis = kargs["alt_hypotheis"]
def func(row, prob, h1):
pv = stats.binom_test(row[successes], row[trials], p=prob, alternative=h1)
return pv
result = df
result[p_value] = df.apply(func, axis=1, args=(success_prob, alt_hypotheis), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, x:int, n:int) [
'Test #1', 3, 5,
'Test #2', 5, 5,
'Test #3', 3, 15
]
| extend p_val=0.0
| invoke binomial_test_fl('x', 'n', 'p_val', success_prob=0.2, alt_hypotheis='greater')
Stored
datatable(id:string, x:int, n:int) [
'Test #1', 3, 5,
'Test #2', 5, 5,
'Test #3', 3, 15
]
| extend p_val=0.0
| invoke binomial_test_fl('x', 'n', 'p_val', success_prob=0.2, alt_hypotheis='greater')
Output
id | x | n | p_val |
---|---|---|---|
Test #1 | 3 | 5 | 0.05792 |
Test #2 | 5 | 5 | 0.00032 |
Test #3 | 3 | 15 | 0.601976790745087 |
5.3 - comb_fl()
Calculate C(n, k)
The function comb_fl() is a user-defined function (UDF) that calculates C(n, k), the number of combinations for selection of k items out of n, without order. It's based on the native gamma() function to calculate factorial. For more information, see factorial_fl(). For a selection of k items with order, use perm_fl().
Syntax
comb_fl(n, k)
Parameters
Name | Type | Required | Description |
---|---|---|---|
n | int | ✔️ | The total number of items. |
k | int | ✔️ | The number of selected items. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let comb_fl=(n:int, k:int)
{
let fact_n = gamma(n+1);
let fact_nk = gamma(n-k+1);
let fact_k = gamma(k+1);
tolong(fact_n/fact_nk/fact_k)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate number of combinations for selection of k items out of n items without order")
comb_fl(n:int, k:int)
{
let fact_n = gamma(n+1);
let fact_nk = gamma(n-k+1);
let fact_k = gamma(k+1);
tolong(fact_n/fact_nk/fact_k)
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let comb_fl=(n:int, k:int)
{
let fact_n = gamma(n+1);
let fact_nk = gamma(n-k+1);
let fact_k = gamma(k+1);
tolong(fact_n/fact_nk/fact_k)
};
range n from 3 to 10 step 3
| extend k = n-2
| extend cnk = comb_fl(n, k)
Stored
range n from 3 to 10 step 3
| extend k = n-2
| extend cnk = comb_fl(n, k)
Output
n | k | cnk |
---|---|---|
3 | 1 | 3 |
6 | 4 | 15 |
9 | 7 | 36 |
5.4 - dbscan_dynamic_fl()
The function dbscan_dynamic_fl() is a UDF (user-defined function) that clusterizes a dataset using the DBSCAN algorithm. This function is similar to dbscan_fl(), except that the features are supplied by a single numerical array column rather than by multiple scalar columns.
Syntax
T | invoke dbscan_dynamic_fl(features_col, cluster_col, epsilon, min_samples, metric, metric_params)
Parameters
Name | Type | Required | Description |
---|---|---|---|
features_col | string | ✔️ | The name of the column containing the numeric array of features to be used for clustering. |
cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record. |
epsilon | real | ✔️ | The maximum distance between two samples to be considered as neighbors. |
min_samples | int | | The number of samples in a neighborhood for a point to be considered as a core point. |
metric | string | | The metric to use when calculating distance between points. |
metric_params | dynamic | | Extra keyword arguments for the metric function. |
- For a detailed description of the parameters, see the DBSCAN documentation.
- For the list of metrics, see distance computations.
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let dbscan_dynamic_fl=(tbl:(*), features_col:string, cluster_col:string, epsilon:double, min_samples:int=10, metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
let kwargs = bag_pack('features_col', features_col, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
'metric', metric, 'metric_params', metric_params);
let code = ```if 1:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
features_col = kargs["features_col"]
cluster_col = kargs["cluster_col"]
epsilon = kargs["epsilon"]
min_samples = kargs["min_samples"]
metric = kargs["metric"]
metric_params = kargs["metric_params"]
df1 = df[features_col].apply(np.array)
mat = np.vstack(df1.values)
# Scale the dataframe
scaler = StandardScaler()
mat = scaler.fit_transform(mat)
# see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
labels = dbscan.fit_predict(mat)
result = df
result[cluster_col] = labels
```;
tbl
| evaluate python(typeof(*),code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\ML", docstring = "DBSCAN clustering of features passed as a single column containing numerical array")
dbscan_dynamic_fl(tbl:(*), features_col:string, cluster_col:string, epsilon:double, min_samples:int=10, metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
let kwargs = bag_pack('features_col', features_col, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
'metric', metric, 'metric_params', metric_params);
let code = ```if 1:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
features_col = kargs["features_col"]
cluster_col = kargs["cluster_col"]
epsilon = kargs["epsilon"]
min_samples = kargs["min_samples"]
metric = kargs["metric"]
metric_params = kargs["metric_params"]
df1 = df[features_col].apply(np.array)
mat = np.vstack(df1.values)
# Scale the dataframe
scaler = StandardScaler()
mat = scaler.fit_transform(mat)
# see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
labels = dbscan.fit_predict(mat)
result = df
result[cluster_col] = labels
```;
tbl
| evaluate python(typeof(*),code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Clustering of artificial dataset with three clusters
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let dbscan_dynamic_fl=(tbl:(*), features_col:string, cluster_col:string, epsilon:double, min_samples:int=10, metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
let kwargs = bag_pack('features_col', features_col, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
'metric', metric, 'metric_params', metric_params);
let code = ```if 1:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
features_col = kargs["features_col"]
cluster_col = kargs["cluster_col"]
epsilon = kargs["epsilon"]
min_samples = kargs["min_samples"]
metric = kargs["metric"]
metric_params = kargs["metric_params"]
df1 = df[features_col].apply(np.array)
mat = np.vstack(df1.values)
# Scale the dataframe
scaler = StandardScaler()
mat = scaler.fit_transform(mat)
# see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
labels = dbscan.fit_predict(mat)
result = df
result[cluster_col] = labels
```;
tbl
| evaluate python(typeof(*),code, kwargs)
};
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke dbscan_dynamic_fl("Features", "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)
Stored
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke dbscan_dynamic_fl("Features", "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)
5.5 - dbscan_fl()
The function dbscan_fl() is a UDF (user-defined function) that clusterizes a dataset using the DBSCAN algorithm.
Syntax
T | invoke dbscan_fl(features, cluster_col, epsilon, min_samples, metric, metric_params)
Parameters
Name | Type | Required | Description |
---|---|---|---|
features | dynamic | ✔️ | An array containing the names of the features columns to use for clustering. |
cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record. |
epsilon | real | ✔️ | The maximum distance between two samples to be considered as neighbors. |
min_samples | int | | The number of samples in a neighborhood for a point to be considered as a core point. |
metric | string | | The metric to use when calculating distance between points. |
metric_params | dynamic | | Extra keyword arguments for the metric function. |
- For a detailed description of the parameters, see the DBSCAN documentation.
- For the list of metrics, see distance computations.
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let dbscan_fl=(tbl:(*), features:dynamic, cluster_col:string, epsilon:double, min_samples:int=10,
metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
let kwargs = bag_pack('features', features, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
'metric', metric, 'metric_params', metric_params);
let code = ```if 1:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
features = kargs["features"]
cluster_col = kargs["cluster_col"]
epsilon = kargs["epsilon"]
min_samples = kargs["min_samples"]
metric = kargs["metric"]
metric_params = kargs["metric_params"]
df1 = df[features]
mat = df1.values
# Scale the dataframe
scaler = StandardScaler()
mat = scaler.fit_transform(mat)
# see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
labels = dbscan.fit_predict(mat)
result = df
result[cluster_col] = labels
```;
tbl
| evaluate python(typeof(*),code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\ML", docstring = "DBSCAN clustering")
dbscan_fl(tbl:(*), features:dynamic, cluster_col:string, epsilon:double, min_samples:int=10,
metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
let kwargs = bag_pack('features', features, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
'metric', metric, 'metric_params', metric_params);
let code = ```if 1:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
features = kargs["features"]
cluster_col = kargs["cluster_col"]
epsilon = kargs["epsilon"]
min_samples = kargs["min_samples"]
metric = kargs["metric"]
metric_params = kargs["metric_params"]
df1 = df[features]
mat = df1.values
# Scale the dataframe
scaler = StandardScaler()
mat = scaler.fit_transform(mat)
# see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
labels = dbscan.fit_predict(mat)
result = df
result[cluster_col] = labels
```;
tbl
| evaluate python(typeof(*),code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Clustering of artificial dataset with three clusters
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let dbscan_fl=(tbl:(*), features:dynamic, cluster_col:string, epsilon:double, min_samples:int=10,
metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
let kwargs = bag_pack('features', features, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
'metric', metric, 'metric_params', metric_params);
let code = ```if 1:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
features = kargs["features"]
cluster_col = kargs["cluster_col"]
epsilon = kargs["epsilon"]
min_samples = kargs["min_samples"]
metric = kargs["metric"]
metric_params = kargs["metric_params"]
df1 = df[features]
mat = df1.values
# Scale the dataframe
scaler = StandardScaler()
mat = scaler.fit_transform(mat)
# see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
labels = dbscan.fit_predict(mat)
result = df
result[cluster_col] = labels
```;
tbl
| evaluate python(typeof(*),code, kwargs)
};
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke dbscan_fl(pack_array("x", "y"), "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| render scatterchart with(series=cluster_id)
Stored
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke dbscan_fl(pack_array("x", "y"), "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| render scatterchart with(series=cluster_id)
5.6 - detect_anomalous_new_entity_fl()
Detect the appearance of anomalous new entities in timestamped data.
The function detect_anomalous_new_entity_fl()
is a UDF (user-defined function) that detects the appearance of anomalous new entities - such as IP addresses or users - in timestamped data, such as traffic logs. In a cybersecurity context, such events might be suspicious and indicate a potential attack or compromise.
The anomaly model is based on a Poisson distribution representing the number of new entities appearing per time bin (such as a day) for each scope. The Poisson distribution parameter is estimated from the rate of appearance of new entities in the training period, with a decay factor reflecting the fact that recent appearances are more important than old ones. From this, the function calculates the probability of encountering a new entity in the defined detection period per scope - such as a subscription or an account. The model output is controlled by several optional parameters, such as the minimal threshold for anomaly, the decay rate parameter, and others.
The model’s direct output is an anomaly score based on the inverse of the estimated probability of encountering a new entity. The score is monotonic in the range [0, 1], with 1 representing something anomalous. In addition to the anomaly score, there’s a binary flag for the detected anomaly (controlled by a minimal threshold parameter) and other explanatory fields.
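Concretely, up to rounding, the score computed by the function definition below can be written as follows (the notation is introduced here only for illustration):
\lambda = \frac{\sum_i c_i \, \delta^{\,d_i}}{\max_i d_i}, \qquad \text{newEntityProbability} = 1 - e^{-\lambda}, \qquad \text{newEntityAnomalyScore} = 1 - \text{newEntityProbability} = e^{-\lambda}
where c_i is the number of new entities first seen at training timestamp i, d_i is the number of days between that timestamp and startDetection, and \delta is decayParam. The score is the Poisson probability of observing no new entities, P(X = 0) = e^{-\lambda}, with the decay applied when estimating the rate.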
Syntax
detect_anomalous_new_entity_fl(
entityColumnName, scopeColumnName, timeColumnName, startTraining, startDetection, endDetection, [maxEntitiesThresh], [minTrainingDaysThresh], [decayParam], [anomalyScoreThresh])
Parameters
Name | Type | Required | Description |
---|---|---|---|
entityColumnName | string | ✔️ | The name of the input table column containing the names or IDs of the entities for which anomaly model is calculated. |
scopeColumnName | string | ✔️ | The name of the input table column containing the partition or scope, so that a different anomaly model is built for each scope. |
timeColumnName | string | ✔️ | The name of the input table column containing the timestamps, that are used to define the training and detection periods. |
startTraining | datetime | ✔️ | The beginning of the training period for the anomaly model. Its end is defined by the beginning of detection period. |
startDetection | datetime | ✔️ | The beginning of the detection period for anomaly detection. |
endDetection | datetime | ✔️ | The end of the detection period for anomaly detection. |
maxEntitiesThresh | int | | The maximum number of existing entities in the scope to calculate anomalies for. If the number of entities is above the threshold, the scope is considered too noisy and anomalies aren’t calculated. The default value is 60. |
minTrainingDaysThresh | int | | The minimum number of days in the training period that a scope must exist in order to calculate anomalies. If it’s below the threshold, the scope is considered too new and unknown, so anomalies aren’t calculated. The default value is 14. |
decayParam | real | | The decay rate parameter for the anomaly model, a number in range (0,1]. Lower values mean faster decay, so more importance is given to later appearances in the training period. A value of 1 means no decay, so a simple average is used for the Poisson distribution parameter estimation. The default value is 0.95. |
anomalyScoreThresh | real | | The minimum value of the anomaly score for which an anomaly is detected, a number in range [0, 1]. Higher values mean that only more significant cases are considered anomalous, so fewer anomalies are detected (higher precision, lower recall). The default value is 0.9. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let detect_anomalous_new_entity_fl = (T:(*), entityColumnName:string, scopeColumnName:string
, timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime
, maxEntitiesThresh:int = 60, minTrainingDaysThresh:int = 14, decayParam:real = 0.95, anomalyScoreThresh:real = 0.9)
{
//pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day'; // we assume a reasonable bin for time is day, so the probability model is built per that bin size
let processedData = (
T
| extend scope = column_ifexists(scopeColumnName, '')
| extend entity = column_ifexists(entityColumnName, '')
| extend sliceTime = todatetime(column_ifexists(timeColumnName, ''))
| where isnotempty(scope) and isnotempty(entity) and isnotempty(sliceTime)
| extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
, sliceTime >= startDetection and sliceTime <= endDetection, 'detectSet'
, 'other')
| where dataSet in ('trainSet', 'detectSet')
);
// summarize the data by scope and entity. this will be used to create a distribution of entity appearances based on first seen data
let entityData = (
processedData
| summarize countRowsEntity = count(), firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime), firstSeenSet = arg_min(sliceTime, dataSet)
by scope, entity
| extend firstSeenSet = dataSet
| project-away dataSet
);
// aggregate entity data per scope and get the number of entities appearing over time
let aggregatedCandidateScopeData = (
entityData
| summarize countRowsScope = sum(countRowsEntity), countEntitiesScope = dcount(entity), countEntitiesScopeInTrain = dcountif(entity, firstSeenSet == 'trainSet')
, firstSeenScope = min(firstSeenEntity), lastSeenScope = max(lastSeenEntity), hasNewEntities = iff(dcountif(entity,firstSeenSet == 'detectSet') > 0, 1, 0)
by scope
| extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
| where countEntitiesScopeInTrain <= maxEntitiesThresh and slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection and hasNewEntities == 1
);
let modelData = (
entityData
| join kind = inner (aggregatedCandidateScopeData) on scope
| where firstSeenSet == 'trainSet'
| summarize countAddedEntities = dcount(entity), firstSeenScope = min(firstSeenScope), slicesInTrainingScope = max(slicesInTrainingScope), countEntitiesScope = max(countEntitiesScope)
by scope, firstSeenSet, firstSeenEntity
| extend diffInDays = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
// adding exponentially decaying weights to counts
| extend decayingWeight = pow(base = decayParam, exponent = diffInDays)
| extend decayingValue = countAddedEntities * decayingWeight
| summarize newEntityProbability = round(1 - exp(-1.0 * sum(decayingValue)/max(diffInDays)), 4)
, countKnownEntities = sum(countAddedEntities), lastNewEntityTimestamp = max(firstSeenEntity), slicesOnScope = max(slicesInTrainingScope)///for explainability
by scope, firstSeenSet
// anomaly score is based on probability to get no new entities, calculated using Poisson distribution (P(X=0) = exp(-avg)) with added decay on average
| extend newEntityAnomalyScore = round(1 - newEntityProbability, 4)
| extend isAnomalousNewEntity = iff(newEntityAnomalyScore >= anomalyScoreThresh, 1, 0)
);
let resultsData = (
processedData
| where dataSet == 'detectSet'
| join kind = inner (modelData) on scope
| project-away scope1
| where isAnomalousNewEntity == 1
| summarize arg_min(sliceTime, *) by scope, entity
| extend anomalyType = strcat('newEntity_', entityColumnName), anomalyExplainability = strcat('The ', entityColumnName, ' ', entity, ' wasn\'t seen on ', scopeColumnName, ' ', scope, ' during the last ', slicesOnScope, ' ', timePeriodBinSize, 's. Previously, ', countKnownEntities
, ' entities were seen, the last one of them appearing at ', format_datetime(lastNewEntityTimestamp, 'yyyy-MM-dd HH:mm'), '.')
| join kind = leftouter (entityData | where firstSeenSet == 'trainSet' | extend entityFirstSeens = strcat(entity, ' : ', format_datetime(firstSeenEntity, 'yyyy-MM-dd HH:mm')) | sort by scope, firstSeenEntity asc | summarize anomalyState = make_list(entityFirstSeens) by scope) on scope
| project-away scope1
);
resultsData
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create-or-alter function with (docstring = "Detect new and anomalous entity (such as username or IP) per scope (such as subscription or account)", skipvalidation = "true", folder = 'KCL')
detect_anomalous_new_entity_fl(T:(*), entityColumnName:string, scopeColumnName:string
, timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime
, maxEntitiesThresh:int = 60, minTrainingDaysThresh:int = 14, decayParam:real = 0.95, anomalyScoreThresh:real = 0.9)
{
//pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day'; // we assume a reasonable bin for time is day, so the probability model is built per that bin size
let processedData = (
T
| extend scope = column_ifexists(scopeColumnName, '')
| extend entity = column_ifexists(entityColumnName, '')
| extend sliceTime = todatetime(column_ifexists(timeColumnName, ''))
| where isnotempty(scope) and isnotempty(entity) and isnotempty(sliceTime)
| extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
, sliceTime >= startDetection and sliceTime <= endDetection, 'detectSet'
, 'other')
| where dataSet in ('trainSet', 'detectSet')
);
// summarize the data by scope and entity. this will be used to create a distribution of entity appearances based on first seen data
let entityData = (
processedData
| summarize countRowsEntity = count(), firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime), firstSeenSet = arg_min(sliceTime, dataSet)
by scope, entity
| extend firstSeenSet = dataSet
| project-away dataSet
);
// aggregate entity data per scope and get the number of entities appearing over time
let aggregatedCandidateScopeData = (
entityData
| summarize countRowsScope = sum(countRowsEntity), countEntitiesScope = dcount(entity), countEntitiesScopeInTrain = dcountif(entity, firstSeenSet == 'trainSet')
, firstSeenScope = min(firstSeenEntity), lastSeenScope = max(lastSeenEntity), hasNewEntities = iff(dcountif(entity,firstSeenSet == 'detectSet') > 0, 1, 0)
by scope
| extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
| where countEntitiesScopeInTrain <= maxEntitiesThresh and slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection and hasNewEntities == 1
);
let modelData = (
entityData
| join kind = inner (aggregatedCandidateScopeData) on scope
| where firstSeenSet == 'trainSet'
| summarize countAddedEntities = dcount(entity), firstSeenScope = min(firstSeenScope), slicesInTrainingScope = max(slicesInTrainingScope), countEntitiesScope = max(countEntitiesScope)
by scope, firstSeenSet, firstSeenEntity
| extend diffInDays = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
// adding exponentially decaying weights to counts
| extend decayingWeight = pow(base = decayParam, exponent = diffInDays)
| extend decayingValue = countAddedEntities * decayingWeight
| summarize newEntityProbability = round(1 - exp(-1.0 * sum(decayingValue)/max(diffInDays)), 4)
, countKnownEntities = sum(countAddedEntities), lastNewEntityTimestamp = max(firstSeenEntity), slicesOnScope = max(slicesInTrainingScope)///for explainability
by scope, firstSeenSet
// anomaly score is based on probability to get no new entities, calculated using Poisson distribution (P(X=0) = exp(-avg)) with added decay on average
| extend newEntityAnomalyScore = round(1 - newEntityProbability, 4)
| extend isAnomalousNewEntity = iff(newEntityAnomalyScore >= anomalyScoreThresh, 1, 0)
);
let resultsData = (
processedData
| where dataSet == 'detectSet'
| join kind = inner (modelData) on scope
| project-away scope1
| where isAnomalousNewEntity == 1
| summarize arg_min(sliceTime, *) by scope, entity
| extend anomalyType = strcat('newEntity_', entityColumnName), anomalyExplainability = strcat('The ', entityColumnName, ' ', entity, ' wasn\'t seen on ', scopeColumnName, ' ', scope, ' during the last ', slicesOnScope, ' ', timePeriodBinSize, 's. Previously, ', countKnownEntities
, ' entities were seen, the last one of them appearing at ', format_datetime(lastNewEntityTimestamp, 'yyyy-MM-dd HH:mm'), '.')
| join kind = leftouter (entityData | where firstSeenSet == 'trainSet' | extend entityFirstSeens = strcat(entity, ' : ', format_datetime(firstSeenEntity, 'yyyy-MM-dd HH:mm')) | sort by scope, firstSeenEntity asc | summarize anomalyState = make_list(entityFirstSeens) by scope) on scope
| project-away scope1
);
resultsData
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let detect_anomalous_new_entity_fl = (T:(*), entityColumnName:string, scopeColumnName:string
, timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime
, maxEntitiesThresh:int = 60, minTrainingDaysThresh:int = 14, decayParam:real = 0.95, anomalyScoreThresh:real = 0.9)
{
//pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day'; // we assume a reasonable bin for time is day, so the probability model is built per that bin size
let processedData = (
T
| extend scope = column_ifexists(scopeColumnName, '')
| extend entity = column_ifexists(entityColumnName, '')
| extend sliceTime = todatetime(column_ifexists(timeColumnName, ''))
| where isnotempty(scope) and isnotempty(entity) and isnotempty(sliceTime)
| extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
, sliceTime >= startDetection and sliceTime <= endDetection, 'detectSet'
, 'other')
| where dataSet in ('trainSet', 'detectSet')
);
// summarize the data by scope and entity. this will be used to create a distribution of entity appearances based on first seen data
let entityData = (
processedData
| summarize countRowsEntity = count(), firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime), firstSeenSet = arg_min(sliceTime, dataSet)
by scope, entity
| extend firstSeenSet = dataSet
| project-away dataSet
);
// aggregate entity data per scope and get the number of entities appearing over time
let aggregatedCandidateScopeData = (
entityData
| summarize countRowsScope = sum(countRowsEntity), countEntitiesScope = dcount(entity), countEntitiesScopeInTrain = dcountif(entity, firstSeenSet == 'trainSet')
, firstSeenScope = min(firstSeenEntity), lastSeenScope = max(lastSeenEntity), hasNewEntities = iff(dcountif(entity,firstSeenSet == 'detectSet') > 0, 1, 0)
by scope
| extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
| where countEntitiesScopeInTrain <= maxEntitiesThresh and slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection and hasNewEntities == 1
);
let modelData = (
entityData
| join kind = inner (aggregatedCandidateScopeData) on scope
| where firstSeenSet == 'trainSet'
| summarize countAddedEntities = dcount(entity), firstSeenScope = min(firstSeenScope), slicesInTrainingScope = max(slicesInTrainingScope), countEntitiesScope = max(countEntitiesScope)
by scope, firstSeenSet, firstSeenEntity
| extend diffInDays = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
// adding exponentially decaying weights to counts
| extend decayingWeight = pow(base = decayParam, exponent = diffInDays)
| extend decayingValue = countAddedEntities * decayingWeight
| summarize newEntityProbability = round(1 - exp(-1.0 * sum(decayingValue)/max(diffInDays)), 4)
, countKnownEntities = sum(countAddedEntities), lastNewEntityTimestamp = max(firstSeenEntity), slicesOnScope = max(slicesInTrainingScope)///for explainability
by scope, firstSeenSet
// anomaly score is based on probability to get no new entities, calculated using Poisson distribution (P(X=0) = exp(-avg)) with added decay on average
| extend newEntityAnomalyScore = round(1 - newEntityProbability, 4)
| extend isAnomalousNewEntity = iff(newEntityAnomalyScore >= anomalyScoreThresh, 1, 0)
);
let resultsData = (
processedData
| where dataSet == 'detectSet'
| join kind = inner (modelData) on scope
| project-away scope1
| where isAnomalousNewEntity == 1
| summarize arg_min(sliceTime, *) by scope, entity
| extend anomalyType = strcat('newEntity_', entityColumnName), anomalyExplainability = strcat('The ', entityColumnName, ' ', entity, ' wasn\'t seen on ', scopeColumnName, ' ', scope, ' during the last ', slicesOnScope, ' ', timePeriodBinSize, 's. Previously, ', countKnownEntities
, ' entities were seen, the last one of them appearing at ', format_datetime(lastNewEntityTimestamp, 'yyyy-MM-dd HH:mm'), '.')
| join kind = leftouter (entityData | where firstSeenSet == 'trainSet' | extend entityFirstSeens = strcat(entity, ' : ', format_datetime(firstSeenEntity, 'yyyy-MM-dd HH:mm')) | sort by scope, firstSeenEntity asc | summarize anomalyState = make_list(entityFirstSeens) by scope) on scope
| project-away scope1
);
resultsData
};
// synthetic data generation
let detectPeriodStart = datetime(2022-04-30 05:00:00.0000000);
let trainPeriodStart = datetime(2022-03-01 05:00);
let names = pack_array("Admin", "Dev1", "Dev2", "IT-support");
let countNames = array_length(names);
let testData = range t from 1 to 24*60 step 1
| extend timeSlice = trainPeriodStart + 1h * t
| extend countEvents = round(2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)), 2) * 100 // generate a series with weekly seasonality
| extend userName = tostring(names[toint(rand(countNames))])
| extend deviceId = hash_md5(rand())
| extend accountName = iff(((rand() < 0.2) and (timeSlice < detectPeriodStart)), 'testEnvironment', 'prodEnvironment')
| extend userName = iff(timeSlice == detectPeriodStart, 'H4ck3r', userName)
| extend deviceId = iff(timeSlice == detectPeriodStart, 'abcdefghijklmnoprtuvwxyz012345678', deviceId)
| sort by timeSlice desc
;
testData
| invoke detect_anomalous_new_entity_fl(entityColumnName = 'userName' //principalName for positive, deviceId for negative
, scopeColumnName = 'accountName'
, timeColumnName = 'timeSlice'
, startTraining = trainPeriodStart
, startDetection = detectPeriodStart
, endDetection = detectPeriodStart
)
Stored
let detectPeriodStart = datetime(2022-04-30 05:00:00.0000000);
let trainPeriodStart = datetime(2022-03-01 05:00);
let names = pack_array("Admin", "Dev1", "Dev2", "IT-support");
let countNames = array_length(names);
let testData = range t from 1 to 24*60 step 1
| extend timeSlice = trainPeriodStart + 1h * t
| extend countEvents = round(2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)), 2) * 100 // generate a series with weekly seasonality
| extend userName = tostring(names[toint(rand(countNames))])
| extend deviceId = hash_md5(rand())
| extend accountName = iff(((rand() < 0.2) and (timeSlice < detectPeriodStart)), 'testEnvironment', 'prodEnvironment')
| extend userName = iff(timeSlice == detectPeriodStart, 'H4ck3r', userName)
| extend deviceId = iff(timeSlice == detectPeriodStart, 'abcdefghijklmnoprtuvwxyz012345678', deviceId)
| sort by timeSlice desc
;
testData
| invoke detect_anomalous_new_entity_fl(entityColumnName = 'userName'
, scopeColumnName = 'accountName'
, timeColumnName = 'timeSlice'
, startTraining = trainPeriodStart
, startDetection = detectPeriodStart
, endDetection = detectPeriodStart
)
Output
scope | entity | sliceTime | t | timeSlice | countEvents | userName | deviceId | accountName | dataSet | firstSeenSet | newEntityProbability | countKnownEntities | lastNewEntityTimestamp | slicesOnScope | newEntityAnomalyScore | isAnomalousNewEntity | anomalyType | anomalyExplainability | anomalyState |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prodEnvironment | H4ck3r | 2022-04-30 05:00:00.0000000 | 1440 | 2022-04-30 05:00:00.0000000 | 1687 | H4ck3r | abcdefghijklmnoprtuvwxyz012345678 | prodEnvironment | detectSet | trainSet | 0.0031 | 4 | 2022-03-01 09:00:00.0000000 | 60 | 0.9969 | 1 | newEntity_userName | The userName H4ck3r wasn’t seen on accountName prodEnvironment during the last 60 days. Previously, four entities were seen, the last one of them appearing at 2022-03-01 09:00. | [“IT-support : 2022-03-01 07:00”, “Admin : 2022-03-01 08:00”, “Dev2 : 2022-03-01 09:00”, “Dev1 : 2022-03-01 14:00”] |
The output of running the function is the first-seen row in the test dataset for each entity per scope, filtered for new entities (meaning they didn’t appear during the training period) that were tagged as anomalous (meaning that the entity anomaly score was above anomalyScoreThresh). Some other fields are added for clarity:
- dataSet: current dataset (always detectSet).
- firstSeenSet: dataset in which the scope was first seen (should be 'trainSet').
- newEntityProbability: probability to see any new entity based on the Poisson model estimation.
- countKnownEntities: number of existing entities on the scope.
- lastNewEntityTimestamp: last time a new entity was seen before the anomalous one.
- slicesOnScope: count of slices per scope.
- newEntityAnomalyScore: anomaly score of the new entity in range [0, 1]; higher values mean more anomalous.
- isAnomalousNewEntity: binary flag for anomalous new entities.
- anomalyType: shows the type of anomaly (helpful when running several anomaly detection logics together).
- anomalyExplainability: textual wrapper for the generated anomaly and its explanation.
- anomalyState: bag of existing entities on the scope with their first-seen times.
Running this function on userName per accountName with default parameters flags a previously unseen and anomalous user ('H4ck3r') with a high anomaly score of 0.9969, meaning that its appearance is unexpected (due to the small number of existing users in the training period).
When the function is run with default parameters on deviceId as the entity, no anomaly is detected, because the large number of existing devices makes a new one expected. However, if we lower the parameter anomalyScoreThresh to 0.0001 and raise the parameter maxEntitiesThresh to 10000, we effectively decrease precision in favor of recall, and detect an anomaly (with a low anomaly score) on the device 'abcdefghijklmnoprtuvwxyz012345678'.
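As a sketch of that second run (the parameter values are taken from the description above, and testData, trainPeriodStart, and detectPeriodStart are the ones defined in the example), the invocation might look like this:
testData
| invoke detect_anomalous_new_entity_fl(entityColumnName = 'deviceId'
    , scopeColumnName = 'accountName'
    , timeColumnName = 'timeSlice'
    , startTraining = trainPeriodStart
    , startDetection = detectPeriodStart
    , endDetection = detectPeriodStart
    , maxEntitiesThresh = 10000
    , anomalyScoreThresh = 0.0001
    )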
The output shows the anomalous entities together with explanation fields in standardized format. These fields are useful for investigating the anomaly and for running anomalous entity detection on several entities or running other algorithms together.
The suggested usage in a cybersecurity context is to run the function on meaningful entities - such as usernames or IP addresses - per meaningful scopes - such as subscriptions or accounts. A detected anomalous new entity means that its appearance isn’t expected on the scope and might be suspicious.
5.7 - factorial_fl()
Calculate factorial.
The function factorial_fl()
is a UDF (user-defined function) that calculates the factorial of positive integers (n!). It’s a simple wrapper of the native gamma() function.
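This works because, for a positive integer n, the factorial and the gamma function are related by the identity
n! = \Gamma(n + 1)
so the function simply evaluates the native gamma() function on n+1, as shown in the definitions below.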
Syntax
factorial_fl(
n)
Parameters
Name | Type | Required | Description |
---|---|---|---|
n | int | ✔️ | The input integer for which to calculate the factorial. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let factorial_fl=(n:int)
{
gamma(n+1)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate factorial")
factorial_fl(n:int)
{
gamma(n+1)
}
Example
Query-defined
let factorial_fl=(n:int)
{
gamma(n+1)
};
range x from 1 to 10 step 3
| extend fx = factorial_fl(x)
Stored
range x from 1 to 10 step 3
| extend fx = factorial_fl(x)
Output
x | fx |
---|---|
1 | 1 |
4 | 24 |
7 | 5040 |
10 | 3628799 |
5.8 - Functions
Functions are reusable queries or query parts. Kusto supports two kinds of functions:
Built-in functions are hard-coded functions defined by Kusto that can’t be modified by users.
User-defined functions, which are divided into two types:
Stored functions: user-defined functions that are stored and managed as database schema entities, similar to tables. For more information, see Stored functions. To create a stored function, use the .create function command.
Query-defined functions: user-defined functions that are defined and used within the scope of a single query. The definition of such functions is done through a let statement. For more information on how to create query-defined functions, see Create a user defined function.
For more information on user-defined functions, see User-defined functions.
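For instance, a minimal query-defined function declared with a let statement (the names here are illustrative only) can be defined and used within a single query:
let add_one = (x:long) { x + 1 };
range n from 1 to 3 step 1
| extend m = add_one(n)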
5.9 - Functions library
The following article contains a categorized list of UDFs (user-defined functions).
The code for each user-defined function is given in its article. It can be used within a let statement embedded in a query, or it can be persisted in a database using the .create function command.
Cybersecurity functions
Function Name | Description |
---|---|
detect_anomalous_new_entity_fl() | Detect the appearance of anomalous new entities in timestamped data. |
detect_anomalous_spike_fl() | Detect the appearance of anomalous spikes in numeric variables in timestamped data. |
graph_blast_radius_fl() | Calculate the Blast Radius (list and score) of source nodes over path or edge data. |
graph_exposure_perimeter_fl() | Calculate the Exposure Perimeter (list and score) of target nodes over path or edge data. |
graph_path_discovery_fl() | Discover valid paths between relevant endpoints (sources and targets) over graph data (edge and nodes). |
General functions
Function Name | Description |
---|---|
geoip_fl() | Retrieves geographic information of an IP address. |
get_packages_version_fl() | Returns version information of the Python engine and the specified packages. |
Machine learning functions
Function Name | Description |
---|---|
dbscan_fl() | Clusterize using the DBSCAN algorithm, features are in separate columns. |
dbscan_dynamic_fl() | Clusterize using the DBSCAN algorithm, features are in a single dynamic column. |
kmeans_fl() | Clusterize using the K-Means algorithm, features are in separate columns. |
kmeans_dynamic_fl() | Clusterize using the K-Means algorithm, features are in a single dynamic column. |
predict_fl() | Predict using an existing trained machine learning model. |
predict_onnx_fl() | Predict using an existing trained machine learning model in ONNX format. |
Plotly functions
The following section contains functions for rendering interactive Plotly charts.
Function Name | Description |
---|---|
plotly_anomaly_fl() | Render anomaly chart using a Plotly template. |
plotly_gauge_fl() | Render gauge chart using a Plotly template. |
plotly_scatter3d_fl() | Render 3D scatter chart using a Plotly template. |
PromQL functions
The following section contains common PromQL functions. These functions can be used for analysis of metrics ingested to your database by the Prometheus monitoring system. All functions assume that metrics in your database are structured using the Prometheus data model.
Function Name | Description |
---|---|
series_metric_fl() | Select and retrieve time series stored with the Prometheus data model. |
series_rate_fl() | Calculate the average rate of counter metric increase per second. |
Series processing functions
Function Name | Description |
---|---|
quantize_fl() | Quantize metric columns. |
series_clean_anomalies_fl() | Replace anomalies in a series by interpolated value. |
series_cosine_similarity_fl() | Calculate the cosine similarity of two numerical vectors. |
series_dbl_exp_smoothing_fl() | Apply a double exponential smoothing filter on series. |
series_dot_product_fl() | Calculate the dot product of two numerical vectors. |
series_downsample_fl() | Downsample time series by an integer factor. |
series_exp_smoothing_fl() | Apply a basic exponential smoothing filter on series. |
series_fit_lowess_fl() | Fit a local polynomial to series using LOWESS method. |
series_fit_poly_fl() | Fit a polynomial to series using regression analysis. |
series_fbprophet_forecast_fl() | Forecast time series values using the Prophet algorithm. |
series_lag_fl() | Apply a lag filter on series. |
series_monthly_decompose_anomalies_fl() | Detect anomalies in a series with monthly seasonality. |
series_moving_avg_fl() | Apply a moving average filter on series. |
series_moving_var_fl() | Apply a moving variance filter on series. |
series_mv_ee_anomalies_fl() | Multivariate Anomaly Detection for series using elliptical envelope model. |
series_mv_if_anomalies_fl() | Multivariate Anomaly Detection for series using isolation forest model. |
series_mv_oc_anomalies_fl() | Multivariate Anomaly Detection for series using one class SVM model. |
series_rolling_fl() | Apply a rolling aggregation function on series. |
series_shapes_fl() | Detects positive/negative trend or jump in series. |
series_uv_anomalies_fl() | Detect anomalies in time series using the Univariate Anomaly Detection Cognitive Service API. |
series_uv_change_points_fl() | Detect change points in time series using the Univariate Anomaly Detection Cognitive Service API. |
time_weighted_avg_fl() | Calculates the time weighted average of a metric using fill forward interpolation. |
time_weighted_avg2_fl() | Calculates the time weighted average of a metric using linear interpolation. |
time_weighted_val_fl() | Calculates the time weighted value of a metric using linear interpolation. |
time_window_rolling_avg_fl() | Calculates the rolling average of a metric over a constant duration time window. |
Statistical and probability functions
Function Name | Description |
---|---|
bartlett_test_fl() | Perform the Bartlett test. |
binomial_test_fl() | Perform the binomial test. |
comb_fl() | Calculate C(n, k), the number of combinations for selection of k items out of n. |
factorial_fl() | Calculate n!, the factorial of n. |
ks_test_fl() | Perform a Kolmogorov Smirnov test. |
levene_test_fl() | Perform a Levene test. |
normality_test_fl() | Performs the Normality Test. |
mann_whitney_u_test_fl() | Perform a Mann-Whitney U Test. |
pair_probabilities_fl() | Calculate various probabilities and related metrics for a pair of categorical variables. |
pairwise_dist_fl() | Calculate pairwise distances between entities based on multiple nominal and numerical variables. |
percentiles_linear_fl() | Calculate percentiles using linear interpolation between closest ranks |
perm_fl() | Calculate P(n, k), the number of permutations for selection of k items out of n. |
two_sample_t_test_fl() | Perform the two sample t-test. |
wilcoxon_test_fl() | Perform the Wilcoxon Test. |
Text analytics
Function Name | Description |
---|---|
log_reduce_fl() | Find common patterns in textual logs and output a summary table. |
log_reduce_full_fl() | Find common patterns in textual logs and output a full table. |
log_reduce_predict_fl() | Apply a trained model to find common patterns in textual logs and output a summary table. |
log_reduce_predict_full_fl() | Apply a trained model to find common patterns in textual logs and output a full table. |
log_reduce_train_fl() | Find common patterns in textual logs and output a model. |
5.10 - geoip_fl()
geoip_fl()
is a user-defined function that retrieves geographic information of an IP address.
Syntax
T | invoke geoip_fl(
ip_col,
country_col,
state_col,
city_col,
longitude_col,
latitude_col)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip_col | string | ✔️ | The name of the column containing the IP addresses to resolve. |
country_col | string | ✔️ | The name of the column to store the retrieved country. |
state_col | string | ✔️ | The name of the column to store the retrieved state. |
city_col | string | ✔️ | The name of the column to store the retrieved city. |
longitude_col | string | ✔️ | The name of the column to store the retrieved longitude. |
latitude_col | string | ✔️ | The name of the column to store the retrieved latitude. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let geoip_fl=(tbl:(*), ip_col:string, country_col:string, state_col:string, city_col:string, longitude_col:string, latitude_col:string)
{
let kwargs = bag_pack('ip_col', ip_col, 'country_col', country_col, 'state_col', state_col, 'city_col', city_col, 'longitude_col', longitude_col, 'latitude_col', latitude_col);
let code= ```if 1:
from sandbox_utils import Zipackage
Zipackage.install('geoip2.zip')
import geoip2.database
ip_col = kargs['ip_col']
country_col = kargs['country_col']
state_col = kargs['state_col']
city_col = kargs['city_col']
longitude_col = kargs['longitude_col']
latitude_col = kargs['latitude_col']
result=df
reader = geoip2.database.Reader(r'C:\\Temp\\GeoLite2-City.mmdb')
def geodata(ip):
try:
gd = reader.city(ip)
geo = pd.Series((gd.country.name, gd.subdivisions.most_specific.name, gd.city.name, gd.location.longitude, gd.location.latitude))
except:
geo = pd.Series((None, None, None, None, None))
return geo
result[[country_col, state_col, city_col, longitude_col, latitude_col]] = result[ip_col].apply(geodata)
```;
tbl
| evaluate python(typeof(*), code, kwargs,
external_artifacts =
pack('geoip2.zip', 'https://artifactswestus.blob.core.windows.net/public/geoip2-4.6.0.zip',
'GeoLite2-City.mmdb', 'https://artifactswestus.blob.core.windows.net/public/GeoLite2-City-20230221.mmdb')
)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create-or-alter function with (folder = 'Packages\\Utils', docstring = 'Retrieve geographics of ip address')
geoip_fl(tbl:(*), ip_col:string, country_col:string, state_col:string, city_col:string, longitude_col:string, latitude_col:string)
{
let kwargs = bag_pack('ip_col', ip_col, 'country_col', country_col, 'state_col', state_col, 'city_col', city_col, 'longitude_col', longitude_col, 'latitude_col', latitude_col);
let code= ```if 1:
from sandbox_utils import Zipackage
Zipackage.install('geoip2.zip')
import geoip2.database
ip_col = kargs['ip_col']
country_col = kargs['country_col']
state_col = kargs['state_col']
city_col = kargs['city_col']
longitude_col = kargs['longitude_col']
latitude_col = kargs['latitude_col']
result=df
reader = geoip2.database.Reader(r'C:\\Temp\\GeoLite2-City.mmdb')
def geodata(ip):
try:
gd = reader.city(ip)
geo = pd.Series((gd.country.name, gd.subdivisions.most_specific.name, gd.city.name, gd.location.longitude, gd.location.latitude))
except:
geo = pd.Series((None, None, None, None, None))
return geo
result[[country_col, state_col, city_col, longitude_col, latitude_col]] = result[ip_col].apply(geodata)
```;
tbl
| evaluate python(typeof(*), code, kwargs,
external_artifacts =
pack('geoip2.zip', 'https://artifactswestus.blob.core.windows.net/public/geoip2-4.6.0.zip',
'GeoLite2-City.mmdb', 'https://artifactswestus.blob.core.windows.net/public/GeoLite2-City-20230221.mmdb')
)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let geoip_fl=(tbl:(*), ip_col:string, country_col:string, state_col:string, city_col:string, longitude_col:string, latitude_col:string)
{
let kwargs = bag_pack('ip_col', ip_col, 'country_col', country_col, 'state_col', state_col, 'city_col', city_col, 'longitude_col', longitude_col, 'latitude_col', latitude_col);
let code= ```if 1:
from sandbox_utils import Zipackage
Zipackage.install('geoip2.zip')
import geoip2.database
ip_col = kargs['ip_col']
country_col = kargs['country_col']
state_col = kargs['state_col']
city_col = kargs['city_col']
longitude_col = kargs['longitude_col']
latitude_col = kargs['latitude_col']
result=df
reader = geoip2.database.Reader(r'C:\\Temp\\GeoLite2-City.mmdb')
def geodata(ip):
try:
gd = reader.city(ip)
geo = pd.Series((gd.country.name, gd.subdivisions.most_specific.name, gd.city.name, gd.location.longitude, gd.location.latitude))
except:
geo = pd.Series((None, None, None, None, None))
return geo
result[[country_col, state_col, city_col, longitude_col, latitude_col]] = result[ip_col].apply(geodata)
```;
tbl
| evaluate python(typeof(*), code, kwargs,
external_artifacts =
pack('geoip2.zip', 'https://artifactswestus.blob.core.windows.net/public/geoip2-4.6.0.zip',
'GeoLite2-City.mmdb', 'https://artifactswestus.blob.core.windows.net/public/GeoLite2-City-20230221.mmdb')
)
};
datatable(ip:string) [
'8.8.8.8',
'20.53.203.50',
'20.81.111.85',
'20.103.85.33',
'20.84.181.62',
'205.251.242.103',
]
| extend country='', state='', city='', longitude=real(null), latitude=real(null)
| invoke geoip_fl('ip','country', 'state', 'city', 'longitude', 'latitude')
Stored
datatable(ip:string) [
'8.8.8.8',
'20.53.203.50',
'20.81.111.85',
'20.103.85.33',
'20.84.181.62',
'205.251.242.103',
]
| extend country='', state='', city='', longitude=real(null), latitude=real(null)
| invoke geoip_fl('ip','country', 'state', 'city', 'longitude', 'latitude')
Output
ip | country | state | city | longitude | latitude |
---|---|---|---|---|---|
20.103.85.33 | Netherlands | North Holland | Amsterdam | 4.8883 | 52.3716 |
20.53.203.50 | Australia | New South Wales | Sydney | 151.2006 | -33.8715 |
20.81.111.85 | United States | Virginia | Tappahannock | -76.8545 | 37.9273 |
20.84.181.62 | United States | Iowa | Des Moines | -93.6124 | 41.6021 |
205.251.242.103 | United States | Virginia | Ashburn | -77.4903 | 39.0469 |
8.8.8.8 | United States | California | Los Angeles | -118.2441 | 34.0544 |
5.11 - get_packages_version_fl()
get_packages_version_fl()
is a user-defined function that retrieves the versions of the Python engine and packages of the inline python() plugin.
The function accepts a dynamic array containing the names of the packages to check, and returns their respective versions and the Python engine version.
Syntax
T | invoke get_packages_version_fl(
packages)
Parameters
Name | Type | Required | Description |
---|---|---|---|
packages | dynamic | | A dynamic array containing the names of the packages to check. The default is an empty list, which retrieves only the Python engine version. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let get_packages_version_fl = (packages:dynamic=dynamic([]))
{
let kwargs = pack('packages', packages);
let code =
```if 1:
import importlib
import sys
packages = kargs["packages"]
result = pd.DataFrame(columns=["name", "ver"])
for i in range(len(packages)):
result.loc[i, "name"] = packages[i]
try:
m = importlib.import_module(packages[i])
result.loc[i, "ver"] = m.__version__ if hasattr(m, "__version__") else "missing __version__ attribute"
except Exception as ex:
result.loc[i, "ver"] = "ERROR: " + (ex.msg if hasattr(ex, "msg") else "exception, no msg")
id = result.shape[0]
result.loc[id, "name"] = "Python"
result.loc[id, "ver"] = sys.version
```;
print 1
| evaluate python(typeof(name:string , ver:string), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Utils", docstring = "Returns version information of the Python engine and the specified packages")
get_packages_version_fl(packages:dynamic=dynamic([]))
{
let kwargs = pack('packages', packages);
let code =
```if 1:
import importlib
import sys
packages = kargs["packages"]
result = pd.DataFrame(columns=["name", "ver"])
for i in range(len(packages)):
result.loc[i, "name"] = packages[i]
try:
m = importlib.import_module(packages[i])
result.loc[i, "ver"] = m.__version__ if hasattr(m, "__version__") else "missing __version__ attribute"
except Exception as ex:
result.loc[i, "ver"] = "ERROR: " + (ex.msg if hasattr(ex, "msg") else "exception, no msg")
id = result.shape[0]
result.loc[id, "name"] = "Python"
result.loc[id, "ver"] = sys.version
```;
print 1
| evaluate python(typeof(name:string , ver:string), code, kwargs)
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let get_packages_version_fl = (packages:dynamic=dynamic([]))
{
let kwargs = pack('packages', packages);
let code =
```if 1:
import importlib
import sys
packages = kargs["packages"]
result = pd.DataFrame(columns=["name", "ver"])
for i in range(len(packages)):
result.loc[i, "name"] = packages[i]
try:
m = importlib.import_module(packages[i])
result.loc[i, "ver"] = m.__version__ if hasattr(m, "__version__") else "missing __version__ attribute"
except Exception as ex:
result.loc[i, "ver"] = "ERROR: " + (ex.msg if hasattr(ex, "msg") else "exception, no msg")
id = result.shape[0]
result.loc[id, "name"] = "Python"
result.loc[id, "ver"] = sys.version
```;
print 1
| evaluate python(typeof(name:string , ver:string), code, kwargs)
};
get_packages_version_fl(pack_array('numpy', 'scipy', 'pandas', 'statsmodels', 'sklearn', 'onnxruntime', 'plotly'))
Stored
get_packages_version_fl(pack_array('numpy', 'scipy', 'pandas', 'statsmodels', 'sklearn', 'onnxruntime', 'plotly'))
Output
name | ver |
---|---|
numpy | 1.23.4 |
onnxruntime | 1.13.1 |
pandas | 1.5.1 |
plotly | 5.11.0 |
Python | 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)] |
scipy | 1.9.3 |
sklearn | 1.1.3 |
statsmodels | 0.13.2 |
5.12 - kmeans_dynamic_fl()
The function kmeans_dynamic_fl()
is a UDF (user-defined function) that clusterizes a dataset using the k-means algorithm. This function is similar to kmeans_fl(), except that the features are supplied by a single numeric array column rather than by multiple scalar columns.
Syntax
T | invoke kmeans_dynamic_fl(
k,
features_col,
cluster_col)
Parameters
Name | Type | Required | Description |
---|---|---|---|
k | int | ✔️ | The number of clusters. |
features_col | string | ✔️ | The name of the column containing the numeric array of features to be used for clustering. |
cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let kmeans_dynamic_fl=(tbl:(*),k:int, features_col:string, cluster_col:string)
{
let kwargs = bag_pack('k', k, 'features_col', features_col, 'cluster_col', cluster_col);
let code = ```if 1:
from sklearn.cluster import KMeans
k = kargs["k"]
features_col = kargs["features_col"]
cluster_col = kargs["cluster_col"]
df1 = df[features_col].apply(np.array)
matrix = np.vstack(df1.values)
kmeans = KMeans(n_clusters=k, random_state=0)
kmeans.fit(matrix)
result = df
result[cluster_col] = kmeans.labels_
```;
tbl
| evaluate python(typeof(*),code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\ML", docstring = "K-Means clustering of features passed as a single column containing numerical array")
kmeans_dynamic_fl(tbl:(*),k:int, features_col:string, cluster_col:string)
{
let kwargs = bag_pack('k', k, 'features_col', features_col, 'cluster_col', cluster_col);
let code = ```if 1:
from sklearn.cluster import KMeans
k = kargs["k"]
features_col = kargs["features_col"]
cluster_col = kargs["cluster_col"]
df1 = df[features_col].apply(np.array)
matrix = np.vstack(df1.values)
kmeans = KMeans(n_clusters=k, random_state=0)
kmeans.fit(matrix)
result = df
result[cluster_col] = kmeans.labels_
```;
tbl
| evaluate python(typeof(*),code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Clustering of artificial dataset with three clusters
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let kmeans_dynamic_fl=(tbl:(*),k:int, features_col:string, cluster_col:string)
{
let kwargs = bag_pack('k', k, 'features_col', features_col, 'cluster_col', cluster_col);
let code = ```if 1:
from sklearn.cluster import KMeans
k = kargs["k"]
features_col = kargs["features_col"]
cluster_col = kargs["cluster_col"]
df1 = df[features_col].apply(np.array)
matrix = np.vstack(df1.values)
kmeans = KMeans(n_clusters=k, random_state=0)
kmeans.fit(matrix)
result = df
result[cluster_col] = kmeans.labels_
```;
tbl
| evaluate python(typeof(*),code, kwargs)
};
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke kmeans_dynamic_fl(3, "Features", "cluster_id")
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)
Stored
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke kmeans_dynamic_fl(3, "Features", "cluster_id")
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)
5.13 - kmeans_fl()
The function kmeans_fl()
is a UDF (user-defined function) that clusterizes a dataset using the k-means algorithm.
Syntax
T | invoke kmeans_fl(
k,
features,
cluster_col)
Parameters
Name | Type | Required | Description |
---|---|---|---|
k | int | ✔️ | The number of clusters. |
features | dynamic | ✔️ | An array containing the names of the features columns to use for clustering. |
cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let kmeans_fl=(tbl:(*), k:int, features:dynamic, cluster_col:string)
{
let kwargs = bag_pack('k', k, 'features', features, 'cluster_col', cluster_col);
let code = ```if 1:
from sklearn.cluster import KMeans
k = kargs["k"]
features = kargs["features"]
cluster_col = kargs["cluster_col"]
km = KMeans(n_clusters=k)
df1 = df[features]
km.fit(df1)
result = df
result[cluster_col] = km.labels_
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create function with (folder = "Packages\\ML", docstring = "K-Means clustering")
kmeans_fl(tbl:(*), k:int, features:dynamic, cluster_col:string)
{
let kwargs = bag_pack('k', k, 'features', features, 'cluster_col', cluster_col);
let code = ```if 1:
from sklearn.cluster import KMeans
k = kargs["k"]
features = kargs["features"]
cluster_col = kargs["cluster_col"]
km = KMeans(n_clusters=k)
df1 = df[features]
km.fit(df1)
result = df
result[cluster_col] = km.labels_
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Clusterize artificial dataset with three clusters
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let kmeans_fl=(tbl:(*), k:int, features:dynamic, cluster_col:string)
{
let kwargs = bag_pack('k', k, 'features', features, 'cluster_col', cluster_col);
let code = ```if 1:
from sklearn.cluster import KMeans
k = kargs["k"]
features = kargs["features"]
cluster_col = kargs["cluster_col"]
km = KMeans(n_clusters=k)
df1 = df[features]
km.fit(df1)
result = df
result[cluster_col] = km.labels_
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke kmeans_fl(3, pack_array("x", "y"), "cluster_id")
| render scatterchart with(series=cluster_id)
Stored
union
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke kmeans_fl(3, pack_array("x", "y"), "cluster_id")
| render scatterchart with(series=cluster_id)
5.14 - ks_test_fl()
The function ks_test_fl()
is a UDF (user-defined function) that performs the Kolmogorov Smirnov Test.
Syntax
T | invoke ks_test_fl(
data1,
data2,
test_statistic,
p_value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let ks_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.ks_2samp(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function command. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Kolmogorov Smirnov Test")
ks_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.ks_2samp(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let ks_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.ks_2samp(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke ks_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke ks_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Output
id | sample1 | sample2 | test_stat | p_val |
---|---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 0.66666666666666674 | 0.3197243332709643 |
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1 | 0.03262165165202116 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 1 | 0.01106563701580386 |
5.15 - levene_test_fl()
The function levene_test_fl()
is a UDF (user-defined function) that performs the Levene Test.
Syntax
T | invoke levene_test_fl(
data1,
data2,
test_statistic,
p_value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let levene_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.levene(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Levene Test")
levene_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.levene(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let levene_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.levene(row[data1], row[data2])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke levene_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke levene_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Output
id | sample1 | sample2 | test_stat | p_val |
---|---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 1.5587395987367387 | 0.27993504690044563 |
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1.6402495788130482 | 0.26950872948841353 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 0.0032989690721642395 | 0.95606240301049072 |
5.16 - log_reduce_fl()
The function log_reduce_fl()
finds common patterns in semi-structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. It outputs a summary table containing the found patterns, sorted in descending order of frequency.
Syntax
T | invoke log_reduce_fl(reduce_col [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]])
Parameters
The following parameter descriptions are a summary. For more information, see the More about the algorithm section.
Name | Type | Required | Description |
---|---|---|---|
reduce_col | string | ✔️ | The name of the string column the function is applied to. |
use_logram | bool | | Enable or disable the Logram algorithm. Default value is true. |
use_drain | bool | | Enable or disable the Drain algorithm. Default value is true. |
custom_regexes | dynamic | | A dynamic array containing pairs of regular expressions and replacement symbols to be searched in each input row and replaced with their respective matching symbol. Default value is dynamic([]). The default regex table replaces numbers, IP addresses, and GUIDs. |
custom_regexes_policy | string | | Either 'prepend', 'append' or 'replace'. Controls whether custom_regexes are prepended to, appended to, or replace the default ones. Default value is 'prepend'. |
delimiters | dynamic | | A dynamic array containing delimiter strings. Default value is dynamic([" "]), defining space as the only single-character delimiter. |
similarity_th | real | | Similarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined clusters. Default value is 0.5. If Drain is disabled, this parameter has no effect. |
tree_depth | int | | Increasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, this parameter has no effect. |
trigram_th | int | | Decreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, this parameter has no effect. |
bigram_th | int | | Decreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram is disabled, this parameter has no effect. |
More about the algorithm
The function runs multiple passes over the rows to be reduced to common patterns. The following list explains the passes:
Regular expression replacements: In this pass, each line is independently matched against a set of regular expressions, and each matched expression is replaced by a replacement symbol. The default regular expressions replace IP addresses, numbers, and GUIDs with <IP>, <NUM>, and <GUID>. You can prepend or append more regular expressions to these, or replace them with new ones or with an empty list, by modifying custom_regexes and custom_regexes_policy. For example, to replace whole numbers with <WNUM> set custom_regexes=pack_array('/^\d+$/', '<WNUM>'); to cancel regular expression replacement set custom_regexes_policy='replace'. For each line, the function keeps a list of the original expressions (before replacement) to be output as parameters of the generic replacement tokens. A sketch that passes custom regexes and delimiters follows this list.
Tokenization: Similar to the previous step, each line is processed independently and broken into tokens based on a set of delimiters. For example, to break into tokens by comma, period, or semicolon set delimiters=pack_array(',', '.', ';').
Apply Logram algorithm: This pass is optional and runs only if use_logram is true. We recommend using Logram when large scale is required and when parameters can appear in the first tokens of the log entry. Conversely, disable it when the log entries are short, as in such cases the algorithm tends to replace tokens with wildcards too often. The Logram algorithm considers 3-tuples and 2-tuples of tokens. If a 3-tuple of tokens is common in the log lines (it appears more than trigram_th times), then it's likely that all three tokens are part of the pattern. If the 3-tuple is rare, then it's likely that it contains a variable that should be replaced by a wildcard. For rare 3-tuples, we consider the frequency with which the 2-tuples contained in the 3-tuple appear. If a 2-tuple is common (it appears more than bigram_th times), then the remaining token is likely to be a parameter, and not part of the pattern.
The Logram algorithm is easy to parallelize. It requires two passes on the log corpus: the first to count the frequency of each 3-tuple and 2-tuple, and the second to apply the previously described logic to each entry. To parallelize the algorithm, we only need to partition the log entries and unify the frequency counts of the different workers.
Apply Drain algorithm: This pass is optional and runs only if use_drain is true. Drain is a log parsing algorithm based on a truncated-depth prefix tree. Log messages are split according to their length, and for each length the first tree_depth tokens of the log message are used to build a prefix tree. If no match for the prefix tokens was found, a new branch is created. If a match for the prefix was found, we search for the most similar pattern among the patterns contained in the tree leaf. Pattern similarity is measured by the ratio of matched nonwildcard tokens out of all tokens. If the similarity of the most similar pattern is above the similarity threshold (the parameter similarity_th), then the log entry is matched to the pattern. For that pattern, the function replaces all nonmatching tokens by wildcards. If the similarity of the most similar pattern is below the similarity threshold, a new pattern containing the log entry is created.
We set the default tree_depth to 4 based on testing various logs. Increasing this depth can improve runtime but might degrade pattern accuracy; decreasing it is more accurate but slower, as each node performs many more similarity tests.
Usually, Drain efficiently generalizes and reduces patterns (though it's hard to parallelize). However, because it relies on a prefix tree, it might not be optimal for log entries containing parameters in the first tokens. This can be resolved in most cases by applying Logram first.
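To make the regular-expression and tokenization passes concrete, the following sketch invokes the function with a prepended whole-number regex (in the spirit of the <WNUM> example above) and an extra ':' delimiter. The specific regex string, its escaping, and the HDFS_log table name are assumptions taken from this section's examples and may need adjustment for your data.
// Sketch: prepend a custom regex that masks whole numbers with <WNUM>, and split tokens on space and colon
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data",
    custom_regexes=pack_array('^\\d+$', '<WNUM>'),
    custom_regexes_policy='prepend',
    delimiters=dynamic([" ", ":"]))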
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let log_reduce_fl=(tbl:(*), reduce_col:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '',
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'summary');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| extend LogReduce=''
| evaluate python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a summary table')
log_reduce_fl(tbl:(*), reduce_col:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '',
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'summary');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| extend LogReduce=''
| evaluate python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
}
Example
The following example uses the invoke operator to run the function. This example uses Apache Hadoop distributed file system logs.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let log_reduce_fl=(tbl:(*), reduce_col:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '',
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'summary');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| extend LogReduce=''
| evaluate python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data")
Stored
//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data")
Output
Count | LogReduce | Example |
---|---|---|
55356 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.delete: blk_<NUM> is added to invalidSet of <IP> | 081110 220623 26 INFO dfs.FSNamesystem: BLOCK* NameSystem.delete: blk_1239016582509138045 is added to invalidSet of 10.251.123.195:50010 |
10278 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: <IP> is added to blk_<NUM> size <NUM> | 081110 215858 27 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.250.11.85:50010 is added to blk_5080254298708411681 size 67108864 |
10256 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating | 081110 215858 15496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating |
10256 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | 081110 215858 15485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21 |
9140 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | 081110 215858 15494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010 |
3047 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand3/temporary/task<NUM><NUM>m<NUM>_<NUM>/part-<NUM>. <> | 081110 215858 26 INFO dfs.FSNamesystem: BLOCK NameSystem.allocateBlock: /user/root/rand3/_temporary/task_200811101024_0005_m_001805_0/part-01805. blk-7037346755429293022 |
1402 | 081110 <NUM> <NUM> INFO <>: <> block blk_<NUM> <> <> | 081110 215957 15556 INFO dfs.DataNode$DataTransfer: 10.250.15.198:50010:Transmitted block blk_-3782569120714539446 to /10.251.203.129:50010 |
177 | 081110 <NUM> <NUM> INFO <>: <> <> <> <*> | 081110 215859 13 INFO dfs.DataBlockScanner: Verification succeeded for blk_-7244926816084627474 |
36 | 081110 <NUM> <NUM> INFO <>: <> <> <> for block <*> | 081110 215924 15636 INFO dfs.DataNode$BlockReceiver: Receiving empty packet for block blk_3991288654265301939 |
12 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* <> <> <> <> <> <> <> <> | 081110 215953 19 INFO dfs.FSNamesystem: BLOCK* ask 10.250.15.198:50010 to replicate blk_-3782569120714539446 to datanode(s) 10.251.203.129:50010 |
12 | 081110 <NUM> <NUM> INFO <>: <> <> <> <> <> block blk_<NUM> <> <> | 081110 215955 18 INFO dfs.DataNode: 10.250.15.198:50010 Starting thread to transfer block blk_-3782569120714539446 to 10.251.203.129:50010 |
12 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Received block blk_<NUM> src: <IP> dest: <IP> of size <NUM> | 081110 215957 15226 INFO dfs.DataNode$DataXceiver: Received block blk_-3782569120714539446 src: /10.250.15.198:51013 dest: /10.250.15.198:50010 of size 14474705 |
6 | 081110 <NUM> <NUM> <> dfs.FSNamesystem: BLOCK NameSystem.addStoredBlock: <> <> <> <> <> <> <> <> size <NUM> | 081110 215924 27 WARN dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_2522553781740514003 on 10.251.202.134:50010 size 67108864 |
6 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: <> <> <> <> <>: <> <> <> <> <> | 081110 215936 15714 INFO dfs.DataNode$DataXceiver: writeBlock blk_720939897861061328 received exception java.io.IOException: Couldn't read from stream |
3 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: <> <> <> <> <> <> <> size <NUM> <> <> <> <> <> <> <> <>. | 081110 220635 28 INFO dfs.FSNamesystem: BLOCK NameSystem.addStoredBlock: addStoredBlock request received for blk_-81196479666306310 on 10.250.17.177:50010 size 53457811 But it doesn't belong to any file. |
1 | 081110 <NUM> <NUM> <> <>: <> <> <> <> <> <> <>. <> <> <> <> <>. | 081110 220631 19 WARN dfs.FSDataset: Unexpected error trying to delete block blk_-2012154052725261337. BlockInfo not found in volumeMap. |
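Since Count is the number of input lines matched by each pattern, a short follow-up can rank the patterns and compute the share of the sampled corpus each one covers. A minimal sketch, assuming the query-defined or stored function and the HDFS_log example above; materialize() is used only so the sampled result is evaluated once.
let reduced = materialize(
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data"));
reduced
// Share of the sampled lines covered by each pattern
| extend Percentage = round(100.0 * Count / toscalar(reduced | summarize sum(Count)), 2)
| top 5 by Count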
5.17 - log_reduce_full_fl()
The function log_reduce_full_fl()
finds common patterns in semi-structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. The function's algorithm and most of the parameters are identical to log_reduce_fl(). However, log_reduce_fl() outputs a patterns summary table, whereas this function outputs a full table containing the pattern and parameters for each line.
Syntax
T | invoke log_reduce_full_fl(reduce_col, pattern_col, parameters_col [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]])
Parameters
The following parameter descriptions are a summary. For more information, see the More about the algorithm section.
Name | Type | Required | Description |
---|---|---|---|
reduce_col | string | ✔️ | The name of the string column the function is applied to. |
pattern_col | string | ✔️ | The name of the string column to populate the pattern. |
parameters_col | string | ✔️ | The name of the string column to populate the pattern’s parameters. |
use_logram | bool | | Enable or disable the Logram algorithm. Default value is true. |
use_drain | bool | | Enable or disable the Drain algorithm. Default value is true. |
custom_regexes | dynamic | | A dynamic array containing pairs of regular expressions and replacement symbols to be searched in each input row and replaced with their respective matching symbol. Default value is dynamic([]). The default regex table replaces numbers, IPs, and GUIDs. |
custom_regexes_policy | string | | Either 'prepend', 'append' or 'replace'. Controls whether custom_regexes are prepended to, appended to, or replace the default ones. Default value is 'prepend'. |
delimiters | dynamic | | A dynamic array containing delimiter strings. Default value is dynamic([" "]), defining space as the only single-character delimiter. |
similarity_th | real | | Similarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined clusters. Default value is 0.5. If Drain is disabled, this parameter has no effect. |
tree_depth | int | | Increasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, this parameter has no effect. |
trigram_th | int | | Decreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, this parameter has no effect. |
bigram_th | int | | Decreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram is disabled, this parameter has no effect. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let log_reduce_full_fl=(tbl:(*), reduce_col:string, pattern_col:string, parameters_col:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', pattern_col, 'parameters_column', parameters_col,
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'full');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a full table')
log_reduce_full_fl(tbl:(*), reduce_col:string, pattern_col:string, parameters_col:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', pattern_col, 'parameters_column', parameters_col,
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'full');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let log_reduce_full_fl=(tbl:(*), reduce_col:string, pattern_col:string, parameters_col:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', pattern_col, 'parameters_column', parameters_col,
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'full');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
| take 10
Stored
//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
| take 10
Output
data | Patterns | Parameters |
---|---|---|
081110 215858 15485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | {"parameter_0": "215858", "parameter_1": "15485", "parameter_2": "5080254298708411681", "parameter_3": "67108864", "parameter_4": "/10.251.43.21"} |
081110 215858 15494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15494", "parameter_2": "-7037346755429293022", "parameter_3": "/10.251.43.21:45933", "parameter_4": "/10.251.43.21:50010"} |
081110 215858 15496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating | {"parameter_0": "215858", "parameter_1": "15496", "parameter_2": "2", "parameter_3": "-7746692545918257727"} |
081110 215858 15496 INFO dfs.DataNode$PacketResponder: Received block blk_-7746692545918257727 of size 67108864 from /10.251.107.227 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | {"parameter_0": "215858", "parameter_1": "15496", "parameter_2": "-7746692545918257727", "parameter_3": "67108864", "parameter_4": "/10.251.107.227"} |
081110 215858 15511 INFO dfs.DataNode$DataXceiver: Receiving block blk_-8578644687709935034 src: /10.251.107.227:39600 dest: /10.251.107.227:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15511", "parameter_2": "-8578644687709935034", "parameter_3": "/10.251.107.227:39600", "parameter_4": "/10.251.107.227:50010"} |
081110 215858 15514 INFO dfs.DataNode$DataXceiver: Receiving block blk_722881101738646364 src: /10.251.75.79:58213 dest: /10.251.75.79:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15514", "parameter_2": "722881101738646364", "parameter_3": "/10.251.75.79:58213", "parameter_4": "/10.251.75.79:50010"} |
081110 215858 15517 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7110736255599716271 terminating | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating | {"parameter_0": "215858", "parameter_1": "15517", "parameter_2": "2", "parameter_3": "-7110736255599716271"} |
081110 215858 15517 INFO dfs.DataNode$PacketResponder: Received block blk_-7110736255599716271 of size 67108864 from /10.251.42.246 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | {"parameter_0": "215858", "parameter_1": "15517", "parameter_2": "-7110736255599716271", "parameter_3": "67108864", "parameter_4": "/10.251.42.246"} |
081110 215858 15533 INFO dfs.DataNode$DataXceiver: Receiving block blk_7257432994295824826 src: /10.251.26.8:41803 dest: /10.251.26.8:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15533", "parameter_2": "7257432994295824826", "parameter_3": "/10.251.26.8:41803", "parameter_4": "/10.251.26.8:50010"} |
081110 215858 15533 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7771332301119265281 src: /10.251.43.210:34258 dest: /10.251.43.210:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15533", "parameter_2": "-7771332301119265281", "parameter_3": "/10.251.43.210:34258", "parameter_4": "/10.251.43.210:50010"} |
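Because the full output keeps one row per input line, it can be aggregated back into a per-pattern summary similar to the one produced by log_reduce_fl(). A minimal sketch, assuming the stored function and the HDFS_log example above:
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
// Collapse the per-line rows into one row per pattern, keeping an arbitrary example line
| summarize Count=count(), example=take_any(data) by Patterns
| order by Count desc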
5.18 - log_reduce_predict_fl()
The function log_reduce_predict_fl()
parses semi-structured textual columns, such as log lines, and for each line it matches the respective pattern from a pretrained model, or reports an anomaly if no matching pattern was found. The function's output is similar to that of log_reduce_fl(), though the patterns are retrieved from a pretrained model that was generated by log_reduce_train_fl().
Syntax
T | invoke log_reduce_predict_fl(models_tbl, model_name, reduce_col [, anomaly_str ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
models_tbl | table | ✔️ | A table containing models generated by log_reduce_train_fl(). The table’s schema should be (name:string, timestamp: datetime, model:string). |
model_name | string | ✔️ | The name of the model that will be retrieved from models_tbl. If the table contains several models matching the model name, the latest one is used. |
reduce_col | string | ✔️ | The name of the string column the function is applied to. |
anomaly_str | string | | This string is output for lines that have no matched pattern in the model. Default value is "ANOMALY". |
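The models_tbl argument is expected to follow the (name, timestamp, model) schema written by log_reduce_train_fl(). If you keep models in a dedicated table, it can be created once with a command along the following lines; the ML_Models table name is an assumption carried over from the examples below.
// One-time setup of a models table matching the expected schema (assumed name)
.create table ML_Models (name:string, timestamp:datetime, model:string)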
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let log_reduce_predict_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string),
model_name:string, reduce_col:string, anomaly_str: string = 'ANOMALY')
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', 'LogReduce','output_parameters_col', '',
'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'summary');
let code = ```if 1:
from log_cluster import log_reduce_predict
result = log_reduce_predict.log_reduce_predict(df, kargs)
```;
tbl
| evaluate hint.distribution=per_node python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Apply a trained model to find common patterns in textual logs, output a summary table')
log_reduce_predict_fl(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string),
model_name:string, reduce_col:string, anomaly_str: string = 'ANOMALY')
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', 'LogReduce','output_parameters_col', '',
'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'summary');
let code = ```if 1:
from log_cluster import log_reduce_predict
result = log_reduce_predict.log_reduce_predict(df, kargs)
```;
tbl
| evaluate hint.distribution=per_node python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let log_reduce_predict_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string),
model_name:string, reduce_col:string, anomaly_str: string = 'ANOMALY')
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', 'LogReduce','output_parameters_col', '',
'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'summary');
let code = ```if 1:
from log_cluster import log_reduce_predict
result = log_reduce_predict.log_reduce_predict(df, kargs)
```;
tbl
| evaluate hint.distribution=per_node python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
HDFS_log_100k
| take 1000
| invoke log_reduce_predict_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data")
Stored
HDFS_log_100k
| take 1000
| invoke log_reduce_predict_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data")
Output
Count | LogReduce | example |
---|---|---|
239 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | 081110 215858 15494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010 |
231 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | 081110 215858 15485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21 |
230 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating | 081110 215858 15496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating |
218 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: <IP> is added to blk_<NUM> size <NUM> | 081110 215858 27 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.250.11.85:50010 is added to blk_5080254298708411681 size 67108864 |
79 | 081110 <NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: <>. <> | 081110 215858 26 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand3/_temporary/task_200811101024_0005_m_001805_0/part-01805. blk-7037346755429293022 |
3 | 081110 <NUM> <NUM> INFO dfs.DataBlockScanner: Verification succeeded for <*> | 081110 215859 13 INFO dfs.DataBlockScanner: Verification succeeded for blk_-7244926816084627474 |
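Lines that match no pattern in the model are reported under the anomaly_str value, so they can be isolated with a simple filter. A minimal sketch, assuming the stored function, the default 'ANOMALY' string, and the example tables above:
HDFS_log_100k
| take 1000
| invoke log_reduce_predict_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data")
// Keep only the groups the pretrained model could not match (assumes the default anomaly string)
| where LogReduce startswith "ANOMALY"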
5.19 - log_reduce_predict_full_fl()
The function log_reduce_predict_full_fl()
parses semi-structured textual columns, such as log lines, and for each line it matches the respective pattern from a pretrained model, or reports an anomaly if no matching pattern was found. The patterns are retrieved from a pretrained model generated by log_reduce_train_fl(). Unlike log_reduce_predict_fl(), which outputs a patterns summary table, this function outputs a full table containing the pattern and parameters for each line.
Syntax
T | invoke log_reduce_predict_full_fl(models_tbl, model_name, reduce_col, pattern_col, parameters_col [, anomaly_str ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
models_tbl | table | ✔️ | A table containing models generated by log_reduce_train_fl(). The table’s schema should be (name:string, timestamp: datetime, model:string). |
model_name | string | ✔️ | The name of the model that will be retrieved from models_tbl. If the table contains several models matching the model name, the latest one is used. |
reduce_col | string | ✔️ | The name of the string column the function is applied to. |
pattern_col | string | ✔️ | The name of the string column to populate the pattern. |
parameters_col | string | ✔️ | The name of the string column to populate the pattern’s parameters. |
anomaly_str | string | | This string is output for lines that have no matched pattern in the model. Default value is "ANOMALY". |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let log_reduce_predict_full_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string),
model_name:string, reduce_col:string, pattern_col:string, parameters_col:string,
anomaly_str: string = 'ANOMALY')
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', pattern_col,'output_parameters_col',
parameters_col, 'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'full');
let code = ```if 1:
from log_cluster import log_reduce_predict
result = log_reduce_predict.log_reduce_predict(df, kargs)
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Apply a trained model to find common patterns in textual logs, output a full table')
log_reduce_predict_full_fl(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string),
model_name:string, reduce_col:string, pattern_col:string, parameters_col:string,
anomaly_str: string = 'ANOMALY')
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', pattern_col,'output_parameters_col',
parameters_col, 'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'full');
let code = ```if 1:
from log_cluster import log_reduce_predict
result = log_reduce_predict.log_reduce_predict(df, kargs)
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let log_reduce_predict_full_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string),
model_name:string, reduce_col:string, pattern_col:string, parameters_col:string,
anomaly_str: string = 'ANOMALY')
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', pattern_col,'output_parameters_col',
parameters_col, 'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'full');
let code = ```if 1:
from log_cluster import log_reduce_predict
result = log_reduce_predict.log_reduce_predict(df, kargs)
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
HDFS_log_100k
| extend Patterns='', Parameters=''
| take 10
| invoke log_reduce_predict_full_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
Stored
HDFS_log_100k
| extend Patterns='', Parameters=''
| take 10
| invoke log_reduce_predict_full_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
Output
data | Patterns | Parameters |
---|---|---|
081110 215858 15485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | {"parameter_0": "215858", "parameter_1": "15485", "parameter_2": "5080254298708411681", "parameter_3": "67108864", "parameter_4": "/10.251.43.21"} |
081110 215858 15494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15494", "parameter_2": "-7037346755429293022", "parameter_3": "/10.251.43.21:45933", "parameter_4": "/10.251.43.21:50010"} |
081110 215858 15496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating | {"parameter_0": "215858", "parameter_1": "15496", "parameter_2": "2", "parameter_3": "-7746692545918257727"} |
081110 215858 15496 INFO dfs.DataNode$PacketResponder: Received block blk_-7746692545918257727 of size 67108864 from /10.251.107.227 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | {"parameter_0": "215858", "parameter_1": "15496", "parameter_2": "-7746692545918257727", "parameter_3": "67108864", "parameter_4": "/10.251.107.227"} |
081110 215858 15511 INFO dfs.DataNode$DataXceiver: Receiving block blk_-8578644687709935034 src: /10.251.107.227:39600 dest: /10.251.107.227:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15511", "parameter_2": "-8578644687709935034", "parameter_3": "/10.251.107.227:39600", "parameter_4": "/10.251.107.227:50010"} |
081110 215858 15514 INFO dfs.DataNode$DataXceiver: Receiving block blk_722881101738646364 src: /10.251.75.79:58213 dest: /10.251.75.79:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15514", "parameter_2": "722881101738646364", "parameter_3": "/10.251.75.79:58213", "parameter_4": "/10.251.75.79:50010"} |
081110 215858 15517 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7110736255599716271 terminating | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating | {"parameter_0": "215858", "parameter_1": "15517", "parameter_2": "2", "parameter_3": "-7110736255599716271"} |
081110 215858 15517 INFO dfs.DataNode$PacketResponder: Received block blk_-7110736255599716271 of size 67108864 from /10.251.42.246 | 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> | {"parameter_0": "215858", "parameter_1": "15517", "parameter_2": "-7110736255599716271", "parameter_3": "67108864", "parameter_4": "/10.251.42.246"} |
081110 215858 15533 INFO dfs.DataNode$DataXceiver: Receiving block blk_7257432994295824826 src: /10.251.26.8:41803 dest: /10.251.26.8:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15533", "parameter_2": "7257432994295824826", "parameter_3": "/10.251.26.8:41803", "parameter_4": "/10.251.26.8:50010"} |
081110 215858 15533 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7771332301119265281 src: /10.251.43.210:34258 dest: /10.251.43.210:50010 | 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> | {"parameter_0": "215858", "parameter_1": "15533", "parameter_2": "-7771332301119265281", "parameter_3": "/10.251.43.210:34258", "parameter_4": "/10.251.43.210:50010"} |
5.20 - log_reduce_train_fl()
The function log_reduce_train_fl()
finds common patterns in semi-structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. The function's algorithm and most of the parameters are identical to log_reduce_fl(), but unlike log_reduce_fl(), which outputs a patterns summary table, this function outputs the serialized model. The model can be used by log_reduce_predict_fl() or log_reduce_predict_full_fl() to predict the matched pattern for new log lines.
Syntax
T | invoke log_reduce_train_fl(reduce_col, model_name [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]])
Parameters
The following parameter descriptions are a summary. For more information, see the More about the algorithm section.
Name | Type | Required | Description |
---|---|---|---|
reduce_col | string | ✔️ | The name of the string column the function is applied to. |
model_name | string | ✔️ | The name of the output model. |
use_logram | bool | | Enable or disable the Logram algorithm. Default value is true. |
use_drain | bool | | Enable or disable the Drain algorithm. Default value is true. |
custom_regexes | dynamic | | A dynamic array containing pairs of regular expressions and replacement symbols to be searched in each input row and replaced with their respective matching symbol. Default value is dynamic([]). The default regex table replaces numbers, IPs, and GUIDs. |
custom_regexes_policy | string | | Either 'prepend', 'append' or 'replace'. Controls whether custom_regexes are prepended to, appended to, or replace the default ones. Default value is 'prepend'. |
delimiters | dynamic | | A dynamic array containing delimiter strings. Default value is dynamic([" "]), defining space as the only single-character delimiter. |
similarity_th | real | | Similarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined clusters. Default value is 0.5. If Drain is disabled, this parameter has no effect. |
tree_depth | int | | Increasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, this parameter has no effect. |
trigram_th | int | | Decreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, this parameter has no effect. |
bigram_th | int | | Decreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram is disabled, this parameter has no effect. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let log_reduce_train_fl=(tbl:(*), reduce_col:string, model_name:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '',
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'model');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| extend LogReduce=''
| evaluate python(typeof(model:string), code, kwargs)
| project name=model_name, timestamp=now(), model
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a model')
log_reduce_train_fl(tbl:(*), reduce_col:string, model_name:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '',
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'model');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| extend LogReduce=''
| evaluate python(typeof(model:string), code, kwargs)
| project name=model_name, timestamp=now(), model
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
//
// Finding common patterns in HDFS logs, export and store the trained model in ML_Models table
//
.set-or-append ML_Models <|
//
let log_reduce_train_fl=(tbl:(*), reduce_col:string, model_name:string,
use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>',
'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>',
'(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '',
'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table,
'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th,
'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn',
'output_type', 'model');
let code = ```if 1:
from log_cluster import log_reduce
result = log_reduce.log_reduce(df, kargs)
```;
tbl
| extend LogReduce=''
| evaluate python(typeof(model:string), code, kwargs)
| project name=model_name, timestamp=now(), model
};
HDFS_log_100k
| take 100000
| invoke log_reduce_train_fl(reduce_col="data", model_name="HDFS_100K")
Stored
//
// Finding common patterns in HDFS logs, export and store the trained model in ML_Models table
//
.set-or-append ML_Models <|
//
HDFS_log_100k
| take 100000
| invoke log_reduce_train_fl(reduce_col="data", model_name="HDFS_100K")
Output
ExtentId | OriginalSize | ExtentSize | CompressedSize | IndexSize | RowCount |
---|---|---|---|---|---|
3734a525-cc08-44b9-a992-72de97b32414 | 10383 | 11546 | 10834 | 712 | 1 |
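After the .set-or-append command completes, the serialized model is a regular row in ML_Models and can be picked up by log_reduce_predict_fl() and log_reduce_predict_full_fl(). A minimal verification sketch, assuming the table and model name from the example above:
// Confirm the newly trained model is stored and see how large the serialized model is
ML_Models
| where name == "HDFS_100K"
| top 1 by timestamp desc
| project name, timestamp, model_size=strlen(model)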
5.21 - mann_whitney_u_test_fl()
The function mann_whitney_u_test_fl()
is a UDF (user-defined function) that performs the Mann-Whitney U Test.
Syntax
T | invoke mann_whitney_u_test_fl(data1, data2, test_statistic, p_value [, use_continuity ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
use_continuity | bool | | Determines if a continuity correction (1/2) is applied. Default is true. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let mann_whitney_u_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, use_continuity:bool=true)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'use_continuity', use_continuity);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
use_continuity = kargs["use_continuity"]
def func(row):
statistics = stats.mannwhitneyu(row[data1], row[data2], use_continuity=use_continuity)
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Mann-Whitney U Test")
mann_whitney_u_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, use_continuity:bool=true)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'use_continuity', use_continuity);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
use_continuity = kargs["use_continuity"]
def func(row):
statistics = stats.mannwhitneyu(row[data1], row[data2], use_continuity=use_continuity)
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let mann_whitney_u_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, use_continuity:bool=true)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'use_continuity', use_continuity);
let code = ```if 1:
from scipy import stats
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
use_continuity = kargs["use_continuity"]
def func(row):
statistics = stats.mannwhitneyu(row[data1], row[data2], use_continuity=use_continuity)
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke mann_whitney_u_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke mann_whitney_u_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Output
id | sample1 | sample2 | test_stat | p_val |
---|---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 1 | 0.095215131912761986 |
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 0 | 0.04042779918502612 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 0 | 0.015191410988288745 |
5.22 - normality_test_fl()
The function normality_test_fl()
is a UDF (user-defined function) that performs the Normality Test.
Syntax
T | invoke normality_test_fl(
data,
test_statistic,
p_value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
data | string | ✔️ | The name of the column containing the data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let normality_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data = kargs["data"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.normaltest(row[data])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Normality Test")
normality_test_fl(tbl:(*), data:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data = kargs["data"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.normaltest(row[data])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let normality_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data = kargs["data"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.normaltest(row[data])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42, 27.1, 22.12, 33.56, 23.64, 20.57]),
'Test #2', dynamic([20.85, 21.89, 23.41, 35.09, 30.02, 26.52, 20.85, 21.89]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02, 32.2, 32.79, 33.9, 34.22, 20.13, 20.5])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke normality_test_fl('sample1', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42, 27.1, 22.12, 33.56, 23.64, 20.57]),
'Test #2', dynamic([20.85, 21.89, 23.41, 35.09, 30.02, 26.52, 20.85, 21.89]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02, 32.2, 32.79, 33.9, 34.22, 20.13, 20.5])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke normality_test_fl('sample1', 'test_stat', 'p_val')
Output
id | sample1 | test_stat | p_val |
---|---|---|---|
Test #1 | [23.64, 20.57, 20.42, 27.1, 22.12, 33.56, 23.64, 20.57] | 7.4881873153941036 | 0.023657060728893706 |
Test #2 | [20.85, 21.89, 23.41, 35.09, 30.02, 26.52, 20.85, 21.89] | 3.29982750330276 | 0.19206647332255408 |
Test #3 | [20.13, 20.5, 21.7, 22.02, 32.2, 32.79, 33.9, 34.22, 20.13, 20.5] | 6.9868433851364324 | 0.030396685911910585 |
5.23 - pair_probabilities_fl()
Calculate various probabilities and related metrics for a pair of categorical variables.
The function pair_probabilities_fl()
is a UDF (user-defined function) that calculates the following probabilities and related metrics for a pair of categorical variables, A and B:
- P(A) is the probability of each value A=a
- P(B) is the probability of each value B=b
- P(A|B) is the conditional probability of A=a given B=b
- P(B|A) is the conditional probability of B=b given A=a
- P(A∪B) is the union probability (A=a or B=b)
- P(A∩B) is the intersection probability (A=a and B=b)
- The lift metric is calculated as P(A∩B)/(P(A)*P(B)), as shown in the sketch after this list. For more information, see lift metric.
- A lift near 1 means that the joint probability of the two values is similar to what would be expected if the variables were independent.
- Lift >> 1 means that the values co-occur more often than expected under the independence assumption.
- Lift << 1 means that the values co-occur less often than expected under the independence assumption.
- The Jaccard similarity coefficient is calculated as P(A∩B)/P(A∪B). For more information, see Jaccard similarity coefficient.
- A high Jaccard coefficient, close to 1, means that the values tend to occur together.
- A low Jaccard coefficient, close to 0, means that the values tend to stay apart.
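The following short query is a minimal sketch, using hypothetical probability values rather than the output of the function, that shows how the lift and Jaccard metrics are derived from the basic probabilities:
print P_A = 0.4, P_B = 0.5, P_AB = 0.3
| extend P_AUB = P_A + P_B - P_AB // union probability = 0.6
| extend Lift_AB = P_AB / (P_A * P_B) // 1.5: the values co-occur more often than expected under independence
| extend Jaccard_AB = P_AB / P_AUB // 0.5: a moderate tendency to occur together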
Syntax
pair_probabilities_fl(
A, B, Scope)
Parameters
Name | Type | Required | Description |
---|---|---|---|
A | scalar | ✔️ | The first categorical variable. |
B | scalar | ✔️ | The second categorical variable. |
Scope | scalar | ✔️ | The field that contains the scope, so that the probabilities for A and B are calculated independently for each scope value. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope // probability for each value of A
| join kind = leftouter (probB) on _B, _scope // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB // union probability
, P_AIB = P_AB/P_B // conditional probability of A on B
, P_BIA = P_AB/P_A // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B) // lift metric
, Jaccard_AB = P_AB/P_AUB // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate probabilities and related metrics for a pair of categorical variables")
pair_probabilities_fl (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope // probability for each value of A
| join kind = leftouter (probB) on _B, _scope // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB // union probability
, P_AIB = P_AB/P_B // conditional probability of A on B
, P_BIA = P_AB/P_A // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B) // lift metric
, Jaccard_AB = P_AB/P_AUB // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope // probability for each value of A
| join kind = leftouter (probB) on _B, _scope // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB // union probability
, P_AIB = P_AB/P_B // conditional probability of A on B
, P_BIA = P_AB/P_A // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B) // lift metric
, Jaccard_AB = P_AB/P_AUB // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
};
//
let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
'James', 'Mary', 'Modern',
'James', 'Mary', 'Modern',
'Robert', 'Mary', 'Modern',
'Robert', 'Mary', 'Modern',
'Michael', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'James', 'Patricia', 'Modern',
'Robert', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'James', 'Linda', 'Modern',
'James', 'Linda', 'Modern',
'Robert', 'Linda', 'Modern',
'Robert', 'Linda', 'Modern',
'James', 'Linda', 'Modern',
'Robert', 'Mary', 'Modern',
'Michael', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'James', 'Linda', 'Modern',
'Robert', 'Mary', 'Classic',
'Robert', 'Linda', 'Classic',
'James', 'Mary', 'Classic',
'James', 'Linda', 'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy','girl', 'dance_class')
Stored
let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
'James', 'Mary', 'Modern',
'James', 'Mary', 'Modern',
'Robert', 'Mary', 'Modern',
'Robert', 'Mary', 'Modern',
'Michael', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'James', 'Patricia', 'Modern',
'Robert', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'James', 'Linda', 'Modern',
'James', 'Linda', 'Modern',
'Robert', 'Linda', 'Modern',
'Robert', 'Linda', 'Modern',
'James', 'Linda', 'Modern',
'Robert', 'Mary', 'Modern',
'Michael', 'Patricia', 'Modern',
'Michael', 'Patricia', 'Modern',
'James', 'Linda', 'Modern',
'Robert', 'Mary', 'Classic',
'Robert', 'Linda', 'Classic',
'James', 'Mary', 'Classic',
'James', 'Linda', 'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy','girl', 'dance_class')
Output
Let’s look at a list of pairs of people dancing at two dance classes, supposedly at random, to find out whether anything looks anomalous (that is, not random). We’ll start by looking at each class by itself.
The Michael-Patricia pair has a lift metric of 2.375, which is significantly above 1. This value means that they’re seen together much more often than would be expected if the pairing were random. Their Jaccard coefficient is 0.75, which is close to 1: when either of them dances, they tend to dance together.
A | B | scope | P_A | P_B | P_AB | P_AUB | P_AIB | P_BIA | Lift_AB | Jaccard_AB |
---|---|---|---|---|---|---|---|---|---|---|
Robert | Patricia | Modern | 0.31578 | 0.42105 | 0.05263 | 0.68421 | 0.12499 | 0.16666 | 0.39583 | 0.07692 |
Robert | Mary | Modern | 0.31578 | 0.26315 | 0.15789 | 0.42105 | 0.59999 | 0.49999 | 1.89999 | 0.37499 |
Robert | Linda | Modern | 0.31578 | 0.31578 | 0.10526 | 0.52631 | 0.33333 | 0.33333 | 1.05555 | 0.2 |
Michael | Patricia | Modern | 0.31578 | 0.42105 | 0.31578 | 0.42105 | 0.75 | 0.99999 | 2.375 | 0.75 |
James | Patricia | Modern | 0.36842 | 0.42105 | 0.05263 | 0.73684 | 0.12499 | 0.14285 | 0.33928 | 0.07142 |
James | Mary | Modern | 0.36842 | 0.26315 | 0.10526 | 0.52631 | 0.4 | 0.28571 | 1.08571 | 0.2 |
James | Linda | Modern | 0.36842 | 0.31578 | 0.21052 | 0.47368 | 0.66666 | 0.57142 | 1.80952 | 0.44444 |
Robert | Mary | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
Robert | Linda | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
James | Mary | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
James | Linda | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
5.24 - pairwise_dist_fl()
Calculate pairwise distances between entities based on multiple nominal and numerical variables.
The function pairwise_dist_fl()
is a UDF (user-defined function) that calculates the multivariate distance between data points belonging to the same partition, taking into account nominal and numerical variables.
- All string fields, besides entity and partition names, are considered nominal variables. The distance is equal to 1 if the values are different, and 0 if they’re the same.
- All numerical fields are considered numerical variables. They’re normalized by transforming to z-scores and the distance is calculated as the absolute value of the difference. The total multivariate distance between data points is calculated as the average of the distances between variables.
A distance close to zero means that the entities are similar, and a distance above 1 means they’re different. Similarly, an entity whose average distance from the others is close to or above 1 is different from most other entities in the partition, which suggests that it’s a potential outlier.
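The following query is a minimal sketch of this distance rule, using hypothetical values that aren’t part of the function: a nominal variable contributes 0 or 1, a numerical variable contributes the absolute difference of the z-scores, and the contributions are averaged.
print accessory1 = 'Hat', accessory2 = 'Bag', height_z1 = 0.8, height_z2 = -0.3
| extend nominal_dist = todouble(accessory1 != accessory2) // 1.0 because the values differ
| extend numeric_dist = abs(height_z1 - height_z2) // 1.1, the absolute difference of the z-scores
| extend multivariate_dist = (nominal_dist + numeric_dist) / 2.0 // the average over the two variables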
Syntax
pairwise_dist_fl(
entity, partition)
Parameters
Name | Type | Required | Description |
---|---|---|---|
entity | string | ✔️ | The name of the input table column containing the names or IDs of the entities for which the distances will be calculated. |
partition | string | ✔️ | The name of the input table column containing the partition or scope, so that the distances are calculated for all pairs of entities under the same partition. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let pairwise_dist_fl = (tbl:(*), id_col:string, partition_col:string)
{
let generic_dist = (value1:dynamic, value2:dynamic)
{
// Calculates the distance between two values; treats all strings as nominal values and numbers as numerical,
// can be extended to other data types or tweaked by adding weights or changing formulas.
iff(gettype(value1[0]) == "string", todouble(tostring(value1[0]) != tostring(value2[0])), abs(todouble(value1[0]) - todouble(value2[0])))
};
let T = (tbl | extend _entity = column_ifexists(id_col, ''), _partition = column_ifexists(partition_col, '') | project-reorder _entity, _partition);
let sum_data = (
// Calculates summary statistics to be used for normalization.
T
| project-reorder _entity
| project _partition, p = pack_array(*)
| mv-expand with_itemindex=idx p
| summarize count(), avg(todouble(p)), stdev(todouble(p)) by _partition, idx
| sort by _partition, idx asc
| summarize make_list(avg_p), make_list(stdev_p) by _partition
);
let normalized_data = (
    // Performs normalization on numerical variables by subtracting the mean and scaling by the standard deviation. Other normalization techniques can be used
// by adding metrics to previous function and using here.
T
| project _partition, p = pack_array(*)
| join kind = leftouter (sum_data) on _partition
| mv-apply p, list_avg_p, list_stdev_p on (
extend normalized = iff((not(isnan(todouble(list_avg_p))) and (list_stdev_p > 0)), pack_array((todouble(p) - todouble(list_avg_p))/todouble(list_stdev_p)), p)
| summarize a = make_list(normalized) by _partition
)
| project _partition, a
);
let dist_data = (
// Calculates distances of included variables and sums them up to get a multivariate distance between all entities under the same partition.
normalized_data
| join kind = inner (normalized_data) on _partition
| project entity = tostring(a[0]), entity1 = tostring(a1[0]), a = array_slice(a, 1, -1), a1 = array_slice(a1, 1, -1), _partition
| mv-apply a, a1 on
(
project d = generic_dist(pack_array(a), pack_array(a1))
| summarize d = make_list(d)
)
| extend dist = bin((1.0*array_sum(d)-1.0)/array_length(d), 0.0001) // -1 cancels the artifact distance calculated between entity names appearing in the bag and normalizes by number of features
| project-away d
| where entity != entity1
| sort by _partition asc, entity asc, dist asc
);
dist_data
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate distances between pairs of entities based on multiple nominal and numerical variables")
pairwise_dist_fl (tbl:(*), id_col:string, partition_col:string)
{
let generic_dist = (value1:dynamic, value2:dynamic)
{
// Calculates the distance between two values; treats all strings as nominal values and numbers as numerical,
// can be extended to other data types or tweaked by adding weights or changing formulas.
iff(gettype(value1[0]) == "string", todouble(tostring(value1[0]) != tostring(value2[0])), abs(todouble(value1[0]) - todouble(value2[0])))
};
let T = (tbl | extend _entity = column_ifexists(id_col, ''), _partition = column_ifexists(partition_col, '') | project-reorder _entity, _partition);
let sum_data = (
// Calculates summary statistics to be used for normalization.
T
| project-reorder _entity
| project _partition, p = pack_array(*)
| mv-expand with_itemindex=idx p
| summarize count(), avg(todouble(p)), stdev(todouble(p)) by _partition, idx
| sort by _partition, idx asc
| summarize make_list(avg_p), make_list(stdev_p) by _partition
);
let normalized_data = (
    // Performs normalization on numerical variables by subtracting the mean and scaling by the standard deviation. Other normalization techniques can be used
// by adding metrics to previous function and using here.
T
| project _partition, p = pack_array(*)
| join kind = leftouter (sum_data) on _partition
| mv-apply p, list_avg_p, list_stdev_p on (
extend normalized = iff((not(isnan(todouble(list_avg_p))) and (list_stdev_p > 0)), pack_array((todouble(p) - todouble(list_avg_p))/todouble(list_stdev_p)), p)
| summarize a = make_list(normalized) by _partition
)
| project _partition, a
);
let dist_data = (
// Calculates distances of included variables and sums them up to get a multivariate distance between all entities under the same partition.
normalized_data
| join kind = inner (normalized_data) on _partition
| project entity = tostring(a[0]), entity1 = tostring(a1[0]), a = array_slice(a, 1, -1), a1 = array_slice(a1, 1, -1), _partition
| mv-apply a, a1 on
(
project d = generic_dist(pack_array(a), pack_array(a1))
| summarize d = make_list(d)
)
| extend dist = bin((1.0*array_sum(d)-1.0)/array_length(d), 0.0001) // -1 cancels the artifact distance calculated between entity names appearing in the bag and normalizes by number of features
| project-away d
| where entity != entity1
| sort by _partition asc, entity asc, dist asc
);
dist_data
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let pairwise_dist_fl = (tbl:(*), id_col:string, partition_col:string)
{
let generic_dist = (value1:dynamic, value2:dynamic)
{
// Calculates the distance between two values; treats all strings as nominal values and numbers as numerical,
// can be extended to other data types or tweaked by adding weights or changing formulas.
iff(gettype(value1[0]) == "string", todouble(tostring(value1[0]) != tostring(value2[0])), abs(todouble(value1[0]) - todouble(value2[0])))
};
let T = (tbl | extend _entity = column_ifexists(id_col, ''), _partition = column_ifexists(partition_col, '') | project-reorder _entity, _partition);
let sum_data = (
// Calculates summary statistics to be used for normalization.
T
| project-reorder _entity
| project _partition, p = pack_array(*)
| mv-expand with_itemindex=idx p
| summarize count(), avg(todouble(p)), stdev(todouble(p)) by _partition, idx
| sort by _partition, idx asc
| summarize make_list(avg_p), make_list(stdev_p) by _partition
);
let normalized_data = (
    // Performs normalization on numerical variables by subtracting the mean and scaling by the standard deviation. Other normalization techniques can be used
// by adding metrics to previous function and using here.
T
| project _partition, p = pack_array(*)
| join kind = leftouter (sum_data) on _partition
| mv-apply p, list_avg_p, list_stdev_p on (
extend normalized = iff((not(isnan(todouble(list_avg_p))) and (list_stdev_p > 0)), pack_array((todouble(p) - todouble(list_avg_p))/todouble(list_stdev_p)), p)
| summarize a = make_list(normalized) by _partition
)
| project _partition, a
);
let dist_data = (
// Calculates distances of included variables and sums them up to get a multivariate distance between all entities under the same partition.
normalized_data
| join kind = inner (normalized_data) on _partition
| project entity = tostring(a[0]), entity1 = tostring(a1[0]), a = array_slice(a, 1, -1), a1 = array_slice(a1, 1, -1), _partition
| mv-apply a, a1 on
(
project d = generic_dist(pack_array(a), pack_array(a1))
| summarize d = make_list(d)
)
| extend dist = bin((1.0*array_sum(d)-1.0)/array_length(d), 0.0001) // -1 cancels the artifact distance calculated between entity names appearing in the bag and normalizes by number of features
| project-away d
| where entity != entity1
| sort by _partition asc, entity asc, dist asc
);
dist_data
};
//
let raw_data = datatable(name:string, gender: string, height:int, weight:int, limbs:int, accessory:string, type:string)[
'Andy', 'M', 160, 80, 4, 'Hat', 'Person',
'Betsy', 'F', 170, 70, 4, 'Bag', 'Person',
'Cindy', 'F', 130, 30, 4, 'Hat', 'Person',
'Dan', 'M', 190, 105, 4, 'Hat', 'Person',
'Elmie', 'M', 110, 30, 4, 'Toy', 'Person',
'Franny', 'F', 170, 65, 4, 'Bag', 'Person',
'Godzilla', '?', 260, 210, 5, 'Tail', 'Person',
'Hannie', 'F', 112, 28, 4, 'Toy', 'Person',
'Ivie', 'F', 105, 20, 4, 'Toy', 'Person',
'Johnnie', 'M', 107, 21, 4, 'Toy', 'Person',
'Kyle', 'M', 175, 76, 4, 'Hat', 'Person',
'Laura', 'F', 180, 70, 4, 'Bag', 'Person',
'Mary', 'F', 160, 60, 4, 'Bag', 'Person',
'Noah', 'M', 178, 90, 4, 'Hat', 'Person',
'Odelia', 'F', 186, 76, 4, 'Bag', 'Person',
'Paul', 'M', 158, 69, 4, 'Bag', 'Person',
'Qui', 'F', 168, 62, 4, 'Bag', 'Person',
'Ronnie', 'M', 108, 26, 4, 'Toy', 'Person',
'Sonic', 'F', 52, 20, 6, 'Tail', 'Pet',
'Tweety', 'F', 52, 20, 6, 'Tail', 'Pet' ,
'Ulfie', 'M', 39, 29, 4, 'Wings', 'Pet',
'Vinnie', 'F', 53, 22, 1, 'Tail', 'Pet',
'Waldo', 'F', 51, 21, 4, 'Tail', 'Pet',
'Xander', 'M', 50, 24, 4, 'Tail', 'Pet'
];
raw_data
| invoke pairwise_dist_fl('name', 'type')
| where _partition == 'Person' | sort by entity asc, entity1 asc
| evaluate pivot (entity, max(dist), entity1) | sort by entity1 asc
Stored
let raw_data = datatable(name:string, gender: string, height:int, weight:int, limbs:int, accessory:string, type:string)[
'Andy', 'M', 160, 80, 4, 'Hat', 'Person',
'Betsy', 'F', 170, 70, 4, 'Bag', 'Person',
'Cindy', 'F', 130, 30, 4, 'Hat', 'Person',
'Dan', 'M', 190, 105, 4, 'Hat', 'Person',
'Elmie', 'M', 110, 30, 4, 'Toy', 'Person',
'Franny', 'F', 170, 65, 4, 'Bag', 'Person',
'Godzilla', '?', 260, 210, 5, 'Tail', 'Person',
'Hannie', 'F', 112, 28, 4, 'Toy', 'Person',
'Ivie', 'F', 105, 20, 4, 'Toy', 'Person',
'Johnnie', 'M', 107, 21, 4, 'Toy', 'Person',
'Kyle', 'M', 175, 76, 4, 'Hat', 'Person',
'Laura', 'F', 180, 70, 4, 'Bag', 'Person',
'Mary', 'F', 160, 60, 4, 'Bag', 'Person',
'Noah', 'M', 178, 90, 4, 'Hat', 'Person',
'Odelia', 'F', 186, 76, 4, 'Bag', 'Person',
'Paul', 'M', 158, 69, 4, 'Bag', 'Person',
'Qui', 'F', 168, 62, 4, 'Bag', 'Person',
'Ronnie', 'M', 108, 26, 4, 'Toy', 'Person',
'Sonic', 'F', 52, 20, 6, 'Tail', 'Pet',
'Tweety', 'F', 52, 20, 6, 'Tail', 'Pet' ,
'Ulfie', 'M', 39, 29, 4, 'Wings', 'Pet',
'Vinnie', 'F', 53, 22, 1, 'Tail', 'Pet',
'Waldo', 'F', 51, 21, 4, 'Tail', 'Pet',
'Xander', 'M', 50, 24, 4, 'Tail', 'Pet'
];
raw_data
| invoke pairwise_dist_fl('name', 'type')
| where _partition == 'Person' | sort by entity asc, entity1 asc
| evaluate pivot (entity, max(dist), entity1) | sort by entity1 asc
Output
entity1 | Andy | Betsy | Cindy | Dan | Elmie | Franny | Godzilla | Hannie | … |
---|---|---|---|---|---|---|---|---|---|
Andy | | 0.354 | 0.4125 | 0.1887 | 0.4843 | 0.3702 | 1.2087 | 0.6265 | … |
Betsy | 0.354 | | 0.416 | 0.4708 | 0.6307 | 0.0161 | 1.2051 | 0.4872 | … |
Cindy | 0.4125 | 0.416 | | 0.6012 | 0.3575 | 0.3998 | 1.4783 | 0.214 | … |
Dan | 0.1887 | 0.4708 | 0.6012 | | 0.673 | 0.487 | 1.0199 | 0.8152 | … |
Elmie | 0.4843 | 0.6307 | 0.3575 | 0.673 | | 0.6145 | 1.5502 | 0.1565 | … |
Franny | 0.3702 | 0.0161 | 0.3998 | 0.487 | 0.6145 | | 1.2213 | 0.471 | … |
Godzilla | 1.2087 | 1.2051 | 1.4783 | 1.0199 | 1.5502 | 1.2213 | | 1.5495 | … |
Hannie | 0.6265 | 0.4872 | 0.214 | 0.8152 | 0.1565 | 0.471 | 1.5495 | | … |
… | … | … | … | … | … | … | … | … | … |
Looking at entities of two different types, we want to calculate the distance between entities of the same type, taking into account both nominal variables (such as gender or preferred accessory) and numerical variables (such as the number of limbs, height, and weight). The numerical variables are on different scales and must be centered and scaled, which the function does automatically. The output is the set of entity pairs under the same partition with their calculated multivariate distance. It can be analyzed directly, visualized as a distance matrix or scatterplot, or used as input for an outlier detection algorithm by calculating the mean distance per entity; entities with high values indicate global outliers. For example, when adding an optional visualization using a distance matrix, you get a table as shown in the sample above. From the sample, you can see that:
- Some pairs of entities (Betsy and Franny) have a low distance value (close to 0) indicating they’re similar.
- Some pairs of entities (Godzilla and Elmie) have a high distance value (1 or above) indicating they’re different.
The output can further be used to calculate the average distance per entity. A high average distance might indicate a global outlier. For example, we can see that on average Godzilla has a high distance from the others, indicating that it’s a probable global outlier.
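For example, the following query is a minimal sketch that reuses the raw_data table and the pairwise_dist_fl() function defined in the example above to compute the average distance per entity:
raw_data
| invoke pairwise_dist_fl('name', 'type')
| where _partition == 'Person'
| summarize avg_dist = avg(dist) by entity // mean multivariate distance of each entity from all the others
| sort by avg_dist desc // entities at the top are candidate global outliers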
5.25 - percentiles_linear_fl()
The function percentiles_linear_fl()
is a user-defined function (UDF) that calculates percentiles using linear interpolation between closest ranks, the same method used by Excel’s PERCENTILE.INC function. Kusto’s native percentile functions use the nearest-rank method. For large sets of values the difference between the two methods is insignificant, and we recommend using the native function for best performance. For further details on these and other percentile calculation methods, see the percentile article on Wikipedia.
The function accepts a table containing the column to calculate on, an optional grouping key, and a dynamic array of the required percentiles. It returns a column containing a dynamic array of the percentile values for each group.
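As a minimal sketch of the interpolation step (not the full function), the following query computes the 25th percentile of [5, 7, 7, 10]: the fractional rank is 25/100*(4-1) = 0.75, so the result is 5 + 0.75*(7-5) = 6.5, matching the value for group B in the example output below.
print vals = dynamic([5, 7, 7, 10]), pct = 25.0
| extend n = array_length(vals)
| extend index = pct / 100.0 * (n - 1) // fractional rank = 0.75
| extend low_index = tolong(floor(index, 1)), high_index = tolong(ceiling(index))
| extend pct_val = todouble(vals[low_index]) + (index - low_index) * (todouble(vals[high_index]) - todouble(vals[low_index])) // 6.5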
Syntax
T | invoke percentiles_linear_fl(
val_col,
pct_arr [,
aggr_col ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
val_col | string | ✔️ | The name of the column that contains the values with which to calculate the percentiles. |
pct_arr | dynamic | ✔️ | A numerical array containing the required percentiles. Each percentile should be in the range [0-100]. |
aggr_col | string | | The name of the column that contains the grouping key. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate linear interpolated percentiles (identical to Excel's PERCENTILE.INC)")
percentiles_linear_fl(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
};
datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals
Stored
datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals
Output
name | x | pct_arr | pct_val |
---|---|---|---|
A | [5,7,9] | [0,25,50,75,100] | [5,6,7,8,9] |
B | [5,7,7,10] | [0,25,50,75,100] | [5,6.5,7,7.75,10] |
5.26 - perm_fl()
Calculate P(n, k)
The function perm_fl()
is a user-defined function (UDF) that calculates P(n, k), the number of permutations for selection of k items out of n items, with order. It’s based on the native gamma() function to calculate the factorial (see factorial_fl()). For selection of k items without order, use comb_fl().
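As a minimal sketch of the underlying formula, P(n, k) = n!/(n-k)!, the following query computes it for the hypothetical values n=5 and k=2 using the native gamma() function (gamma(x+1) = x!):
print n = 5, k = 2
| extend pnk = tolong(gamma(n + 1) / gamma(n - k + 1)) // 5!/3! = 120/6 = 20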
Syntax
perm_fl(
n, k)
Parameters
Name | Type | Required | Description |
---|---|---|---|
n | int | ✔️ | The total number of items. |
k | int | ✔️ | The number of selected items. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let perm_fl=(n:int, k:int)
{
let fact_n = gamma(n+1);
let fact_nk = gamma(n-k+1);
tolong(fact_n/fact_nk)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate number of permutations for selection of k items out of n items with order")
perm_fl(n:int, k:int)
{
let fact_n = gamma(n+1);
let fact_nk = gamma(n-k+1);
tolong(fact_n/fact_nk)
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let perm_fl=(n:int, k:int)
{
let fact_n = gamma(n+1);
let fact_nk = gamma(n-k+1);
tolong(fact_n/fact_nk)
}
;
range n from 3 to 10 step 3
| extend k = n-2
| extend pnk = perm_fl(n, k)
Stored
range n from 3 to 10 step 3
| extend k = n-2
| extend pnk = perm_fl(n, k)
Output
n | k | pnk |
---|---|---|
3 | 1 | 3 |
6 | 4 | 360 |
9 | 7 | 181440 |
5.27 - plotly_anomaly_fl()
The function plotly_anomaly_fl()
is a user-defined function (UDF) that allows you to customize a plotly template to create an interactive anomaly chart.
The function accepts a table containing the source and the baseline time series, lists of positive and negative anomalies with their respective sizes, and chart labeling strings. The function returns a single-cell table containing plotly JSON. Optionally, you can render the data in an Azure Data Explorer or Real-Time dashboard tile. For more information, see Plotly (preview).
Prerequisite
Extract the required ‘anomaly’ template from the publicly available PlotlyTemplate
table. Copy this table from the Samples database to your database by running the following KQL command from your target database:
.set PlotlyTemplate <| cluster('help.kusto.windows.net').database('Samples').PlotlyTemplate
Syntax
T | invoke plotly_anomaly_fl(
time_col,
val_col,
baseline_col,
time_high_col,
val_high_col,
size_high_col,
time_low_col,
val_low_col,
size_low_col,
chart_title,
series_name,
val_name)
Parameters
Name | Type | Required | Description |
---|---|---|---|
time_col | string | ✔️ | The name of the column containing the dynamic array of the time points of the original time series |
val_col | string | ✔️ | The name of the column containing the values of the original time series |
baseline_col | string | ✔️ | The name of the column containing the values of the baseline time series. Anomalies are usually detected by large value offset from the expected baseline value. |
time_high_col | string | ✔️ | The name of the column containing the time points of high (above the baseline) anomalies |
val_high_col | string | ✔️ | The name of the column containing the values of the high anomalies |
size_high_col | string | ✔️ | The name of the column containing the marker sizes of the high anomalies |
time_low_col | string | ✔️ | The name of the column containing the time points of low anomalies |
val_low_col | string | ✔️ | The name of the column containing the values of the low anomalies |
size_low_col | string | ✔️ | The name of the column containing the marker sizes of the low anomalies |
chart_title | string | | The chart title. Default is ‘Anomaly chart’. |
series_name | string | | The time series name. Default is ‘Metric’. |
val_name | string | | The value axis name. Default is ‘Value’. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let plotly_anomaly_fl=(tbl:(*), time_col:string, val_col:string, baseline_col:string, time_high_col:string , val_high_col:string, size_high_col:string,
time_low_col:string, val_low_col:string, size_low_col:string,
chart_title:string='Anomaly chart', series_name:string='Metric', val_name:string='Value')
{
let anomaly_chart = toscalar(PlotlyTemplate | where name == "anomaly" | project plotly);
let tbl_ex = tbl | extend _timestamp = column_ifexists(time_col, datetime(null)), _values = column_ifexists(val_col, 0.0), _baseline = column_ifexists(baseline_col, 0.0),
_high_timestamp = column_ifexists(time_high_col, datetime(null)), _high_values = column_ifexists(val_high_col, 0.0), _high_size = column_ifexists(size_high_col, 1),
_low_timestamp = column_ifexists(time_low_col, datetime(null)), _low_values = column_ifexists(val_low_col, 0.0), _low_size = column_ifexists(size_low_col, 1);
tbl_ex
| extend plotly = anomaly_chart
| extend plotly=replace_string(plotly, '$TIME_STAMPS$', tostring(_timestamp))
| extend plotly=replace_string(plotly, '$SERIES_VALS$', tostring(_values))
| extend plotly=replace_string(plotly, '$BASELINE_VALS$', tostring(_baseline))
| extend plotly=replace_string(plotly, '$TIME_STAMPS_HIGH_ANOMALIES$', tostring(_high_timestamp))
| extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_VALS$', tostring(_high_values))
| extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_MARKER_SIZE$', tostring(_high_size))
| extend plotly=replace_string(plotly, '$TIME_STAMPS_LOW_ANOMALIES$', tostring(_low_timestamp))
| extend plotly=replace_string(plotly, '$LOW_ANOMALIES_VALS$', tostring(_low_values))
| extend plotly=replace_string(plotly, '$LOW_ANOMALIES_MARKER_SIZE$', tostring(_low_size))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| extend plotly=replace_string(plotly, '$SERIES_NAME$', series_name)
| extend plotly=replace_string(plotly, '$Y_NAME$', val_name)
| project plotly
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Plotly", docstring = "Render anomaly chart using plotly template")
plotly_anomaly_fl(tbl:(*), time_col:string, val_col:string, baseline_col:string, time_high_col:string , val_high_col:string, size_high_col:string,
time_low_col:string, val_low_col:string, size_low_col:string,
chart_title:string='Anomaly chart', series_name:string='Metric', val_name:string='Value')
{
let anomaly_chart = toscalar(PlotlyTemplate | where name == "anomaly" | project plotly);
let tbl_ex = tbl | extend _timestamp = column_ifexists(time_col, datetime(null)), _values = column_ifexists(val_col, 0.0), _baseline = column_ifexists(baseline_col, 0.0),
_high_timestamp = column_ifexists(time_high_col, datetime(null)), _high_values = column_ifexists(val_high_col, 0.0), _high_size = column_ifexists(size_high_col, 1),
_low_timestamp = column_ifexists(time_low_col, datetime(null)), _low_values = column_ifexists(val_low_col, 0.0), _low_size = column_ifexists(size_low_col, 1);
tbl_ex
| extend plotly = anomaly_chart
| extend plotly=replace_string(plotly, '$TIME_STAMPS$', tostring(_timestamp))
| extend plotly=replace_string(plotly, '$SERIES_VALS$', tostring(_values))
| extend plotly=replace_string(plotly, '$BASELINE_VALS$', tostring(_baseline))
| extend plotly=replace_string(plotly, '$TIME_STAMPS_HIGH_ANOMALIES$', tostring(_high_timestamp))
| extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_VALS$', tostring(_high_values))
| extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_MARKER_SIZE$', tostring(_high_size))
| extend plotly=replace_string(plotly, '$TIME_STAMPS_LOW_ANOMALIES$', tostring(_low_timestamp))
| extend plotly=replace_string(plotly, '$LOW_ANOMALIES_VALS$', tostring(_low_values))
| extend plotly=replace_string(plotly, '$LOW_ANOMALIES_MARKER_SIZE$', tostring(_low_size))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| extend plotly=replace_string(plotly, '$SERIES_NAME$', series_name)
| extend plotly=replace_string(plotly, '$Y_NAME$', val_name)
| project plotly
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let plotly_anomaly_fl=(tbl:(*), time_col:string, val_col:string, baseline_col:string, time_high_col:string , val_high_col:string, size_high_col:string,
time_low_col:string, val_low_col:string, size_low_col:string,
chart_title:string='Anomaly chart', series_name:string='Metric', val_name:string='Value')
{
let anomaly_chart = toscalar(PlotlyTemplate | where name == "anomaly" | project plotly);
let tbl_ex = tbl | extend _timestamp = column_ifexists(time_col, datetime(null)), _values = column_ifexists(val_col, 0.0), _baseline = column_ifexists(baseline_col, 0.0),
_high_timestamp = column_ifexists(time_high_col, datetime(null)), _high_values = column_ifexists(val_high_col, 0.0), _high_size = column_ifexists(size_high_col, 1),
_low_timestamp = column_ifexists(time_low_col, datetime(null)), _low_values = column_ifexists(val_low_col, 0.0), _low_size = column_ifexists(size_low_col, 1);
tbl_ex
| extend plotly = anomaly_chart
| extend plotly=replace_string(plotly, '$TIME_STAMPS$', tostring(_timestamp))
| extend plotly=replace_string(plotly, '$SERIES_VALS$', tostring(_values))
| extend plotly=replace_string(plotly, '$BASELINE_VALS$', tostring(_baseline))
| extend plotly=replace_string(plotly, '$TIME_STAMPS_HIGH_ANOMALIES$', tostring(_high_timestamp))
| extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_VALS$', tostring(_high_values))
| extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_MARKER_SIZE$', tostring(_high_size))
| extend plotly=replace_string(plotly, '$TIME_STAMPS_LOW_ANOMALIES$', tostring(_low_timestamp))
| extend plotly=replace_string(plotly, '$LOW_ANOMALIES_VALS$', tostring(_low_values))
| extend plotly=replace_string(plotly, '$LOW_ANOMALIES_MARKER_SIZE$', tostring(_low_size))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| extend plotly=replace_string(plotly, '$SERIES_NAME$', series_name)
| extend plotly=replace_string(plotly, '$Y_NAME$', val_name)
| project plotly
};
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let marker_scale = 8;
let s_name = 'TS1';
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == s_name
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| mv-apply num1=num to typeof(double), anomalies1=anomalies to typeof(double), score1=score to typeof(double), TimeStamp1=TimeStamp to typeof(datetime) on (
summarize pAnomalies=make_list_if(num1, anomalies1 > 0), pTimeStamp=make_list_if(TimeStamp1, anomalies1 > 0), pSize=make_list_if(toint(score1*marker_scale), anomalies1 > 0),
nAnomalies=make_list_if(num1, anomalies1 < 0), nTimeStamp=make_list_if(TimeStamp1, anomalies1 < 0), nSize=make_list_if(toint(-score1*marker_scale), anomalies1 < 0)
)
| invoke plotly_anomaly_fl('TimeStamp', 'num', 'baseline', 'pTimeStamp', 'pAnomalies', 'pSize', 'nTimeStamp', 'nAnomalies', 'nSize',
chart_title='Anomaly chart using plotly_anomaly_fl()', series_name=s_name, val_name='# of requests')
| render plotly
Stored
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let marker_scale = 8;
let s_name = 'TS1';
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == s_name
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| mv-apply num1=num to typeof(double), anomalies1=anomalies to typeof(double), score1=score to typeof(double), TimeStamp1=TimeStamp to typeof(datetime) on (
summarize pAnomalies=make_list_if(num1, anomalies1 > 0), pTimeStamp=make_list_if(TimeStamp1, anomalies1 > 0), pSize=make_list_if(toint(score1*marker_scale), anomalies1 > 0),
nAnomalies=make_list_if(num1, anomalies1 < 0), nTimeStamp=make_list_if(TimeStamp1, anomalies1 < 0), nSize=make_list_if(toint(-score1*marker_scale), anomalies1 < 0)
)
| invoke plotly_anomaly_fl('TimeStamp', 'num', 'baseline', 'pTimeStamp', 'pAnomalies', 'pSize', 'nTimeStamp', 'nAnomalies', 'nSize',
chart_title='Anomaly chart using plotly_anomaly_fl()', series_name=s_name, val_name='# of requests')
| render plotly
Output
The output is a Plotly JSON string that can be rendered using ‘| render plotly’ or in an Azure Data Explorer or Real-Time dashboard tile. For more information on creating dashboard tiles, see Visualize data with Azure Data Explorer dashboards or Real-Time dashboards.
The following image shows a sample anomaly chart created by the function. You can zoom in and hover over anomalies in the interactive chart.
5.28 - plotly_gauge_fl()
The function plotly_gauge_fl()
is a user-defined function (UDF) that allows you to customize a plotly template to create a gauge chart.
The function accepts a few parameters to customize the gauge chart and returns a single-cell table containing plotly JSON. Optionally, you can render the data in an Azure Data Explorer or Real-Time dashboard tile. For more information, see Plotly (preview).
Prerequisite
Extract the required ‘gauge’ template from the publicly available PlotlyTemplate
table. Copy this table from the Samples database to your database by running the following KQL command from your target database:
.set PlotlyTemplate <| cluster('help.kusto.windows.net').database('Samples').PlotlyTemplate
Syntax
T | invoke plotly_gauge_fl(
value,
max_range,
mode,
chart_title,
font_color,
bar_color,
bar_bg_color,
tick_color,
tick_width)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The number to be displayed. |
max_range | real | | The maximum range of the gauge. |
mode | string | | Specifies how the value is displayed on the graph. Default is ‘gauge+number’. |
chart_title | string | | The chart title. The default is an empty title. |
font_color | string | | The chart’s font color. Default is ‘black’. |
bar_color | string | | The gauge’s filled bar color. Default is ‘green’. |
bar_bg_color | string | | The gauge’s unfilled bar color. Default is ‘lightgreen’. |
tick_color | string | | The gauge’s tick color. Default is ‘darkblue’. |
tick_width | int | | The gauge’s tick width. Default is 1. |
Plotly gauge charts support many parameters, but this function exposes only the ones above. For more information, see the indicator traces reference.
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let plotly_gauge_fl=(value:real, max_range:real=real(null), mode:string='gauge+number', chart_title:string='',font_color:string='black',
bar_color:string='green', bar_bg_color:string='lightgreen', tick_color:string='darkblue', tick_width:int=1)
{
let gauge_chart = toscalar(PlotlyTemplate | where name == "gauge" | project plotly);
print plotly = gauge_chart
| extend plotly=replace_string(plotly, '$VALUE$', tostring(value))
| extend plotly=replace_string(plotly, '$MAX_RANGE$', iff(isnull(max_range), 'null', tostring(max_range)))
| extend plotly=replace_string(plotly, '$MODE$', mode)
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| extend plotly=replace_string(plotly, '$FONT_COLOR$', font_color)
| extend plotly=replace_string(plotly, '$BAR_COLOR$', bar_color)
| extend plotly=replace_string(plotly, '$BAR_BG_COLOR$', bar_bg_color)
| extend plotly=replace_string(plotly, '$TICK_COLOR$', tick_color)
| extend plotly=replace_string(plotly, '$TICK_WIDTH$', tostring(tick_width))
| project plotly
};
// Write your query to use your function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Plotly", docstring = "Render gauge chart using plotly template")
plotly_gauge_fl(value:real, max_range:real=real(null), mode:string='gauge+number', chart_title:string='',font_color:string='black',
bar_color:string='green', bar_bg_color:string='lightgreen', tick_color:string='darkblue', tick_width:int=1)
{
let gauge_chart = toscalar(PlotlyTemplate | where name == "gauge" | project plotly);
print plotly = gauge_chart
| extend plotly=replace_string(plotly, '$VALUE$', tostring(value))
| extend plotly=replace_string(plotly, '$MAX_RANGE$', iff(isnull(max_range), 'null', tostring(max_range)))
| extend plotly=replace_string(plotly, '$MODE$', mode)
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| extend plotly=replace_string(plotly, '$FONT_COLOR$', font_color)
| extend plotly=replace_string(plotly, '$BAR_COLOR$', bar_color)
| extend plotly=replace_string(plotly, '$BAR_BG_COLOR$', bar_bg_color)
| extend plotly=replace_string(plotly, '$TICK_COLOR$', tick_color)
| extend plotly=replace_string(plotly, '$TICK_WIDTH$', tostring(tick_width))
| project plotly
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let plotly_gauge_fl=(value:real, max_range:real=real(null), mode:string='gauge+number', chart_title:string='',font_color:string='black',
bar_color:string='green', bar_bg_color:string='lightgreen', tick_color:string='darkblue', tick_width:int=1)
{
let gauge_chart = toscalar(PlotlyTemplate | where name == "gauge" | project plotly);
print plotly = gauge_chart
| extend plotly=replace_string(plotly, '$VALUE$', tostring(value))
| extend plotly=replace_string(plotly, '$MAX_RANGE$', iff(isnull(max_range), 'null', tostring(max_range)))
| extend plotly=replace_string(plotly, '$MODE$', mode)
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| extend plotly=replace_string(plotly, '$FONT_COLOR$', font_color)
| extend plotly=replace_string(plotly, '$BAR_COLOR$', bar_color)
| extend plotly=replace_string(plotly, '$BAR_BG_COLOR$', bar_bg_color)
| extend plotly=replace_string(plotly, '$TICK_COLOR$', tick_color)
| extend plotly=replace_string(plotly, '$TICK_WIDTH$', tostring(tick_width))
| project plotly
};
plotly_gauge_fl(value=180, chart_title='Speed', font_color='purple', tick_width=5)
| render plotly
Stored
plotly_gauge_fl(value=180, chart_title='Speed', font_color='purple', tick_width=5)
| render plotly
Output
The output is a Plotly JSON string that can be rendered in an Azure Data Explorer or Real-Time dashboard tile. For more information on creating dashboard tiles, see Visualize data with Azure Data Explorer dashboards or Real-Time dashboards.
5.29 - plotly_scatter3d_fl()
The function plotly_scatter3d_fl()
is a user-defined function (UDF) that allows you to customize a plotly template to create an interactive 3D scatter chart.
The function accepts a table containing the records to be rendered, the names of the x, y, z, and aggregation columns, and the chart title string. The function returns a single-cell table containing plotly JSON. Optionally, you can render the data in an Azure Data Explorer or Real-Time dashboard tile. For more information, see Plotly (preview).
Prerequisite
Extract the required ‘scatter3d’ template from the publicly available PlotlyTemplate
table. Copy this table from the Samples database to your database by running the following KQL command from your target database:
.set PlotlyTemplate <| cluster('help.kusto.windows.net').database('Samples').PlotlyTemplate
Syntax
T | invoke plotly_scatter3d_fl(
x_col,
y_col,
z_col,
aggr_col [,
chart_title ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
x_col | string | ✔️ | The name of the column for the X coordinate of the 3D plot. |
y_col | string | ✔️ | The name of the column for the Y coordinate of the 3D plot. |
z_col | string | ✔️ | The name of the column for the Z coordinate of the 3D plot. |
aggr_col | string | ✔️ | The name of the grouping column. Records in the same group are rendered in a distinct color. |
chart_title | string | | The chart title. The default is ‘3D Scatter chart’. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
tbl_ex
| serialize
| summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
| summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
| extend plotly = scatter3d_chart
| extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
| extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
| extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
| extend plotly=replace_string(plotly, '$X_NAME$', x_col)
| extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
| extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
| extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
| extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
| extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| project plotly
};
// Write your query to use your function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Plotly", docstring = "Render 3D scatter chart using plotly template")
plotly_scatter3d_fl(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
tbl_ex
| serialize
| summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
| summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
| extend plotly = scatter3d_chart
| extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
| extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
| extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
| extend plotly=replace_string(plotly, '$X_NAME$', x_col)
| extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
| extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
| extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
| extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
| extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| project plotly
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
tbl_ex
| serialize
| summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
| summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
| extend plotly = scatter3d_chart
| extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
| extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
| extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
| extend plotly=replace_string(plotly, '$X_NAME$', x_col)
| extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
| extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
| extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
| extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
| extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| project plotly
};
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly
Stored
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
Output
The output is a Plotly JSON string that can be rendered in an Azure Data Explorer dashboard tile or a Real-Time dashboard tile. For more information on creating dashboard tiles, see Visualize data with Azure Data Explorer dashboards or Real-Time dashboards.
You can rotate, zoom and hover over specific records:
5.30 - predict_fl()
The function predict_fl()
is a user-defined function (UDF) that predicts using an existing trained machine learning model. This model was built using Scikit-learn, serialized to string, and saved in a standard table.
Syntax
T | invoke predict_fl(
models_tbl,
model_name,
features_cols,
pred_col)
Parameters
Name | Type | Required | Description |
---|---|---|---|
models_tbl | string | ✔️ | The name of the table that contains all serialized models. The table must have the following columns: name (the model name), timestamp (time of model training), and model (string representation of the serialized model). |
model_name | string | ✔️ | The name of the specific model to use. |
features_cols | dynamic | ✔️ | An array containing the names of the features columns that are used by the model for prediction. |
pred_col | string | ✔️ | The name of the column that stores the predictions. |
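The examples below read models from a table named ML_Models. If you don't have such a table yet, the following sketch shows the expected schema (the table name is only the convention used in the examples; any table matching this schema can be passed as models_tbl):
.create table ML_Models (name:string, timestamp:datetime, model:string)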
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let predict_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code = ```if 1:
import pickle
import binascii
smodel = kargs["smodel"]
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
bmodel = binascii.unhexlify(smodel)
clf1 = pickle.loads(bmodel)
df1 = df[features_cols]
predictions = clf1.predict(df1)
result = df
result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
```;
samples
| evaluate python(typeof(*), code, kwargs)
};
// Write your code to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create function with (folder = "Packages\\ML", docstring = "Predict using ML model, built by Scikit-learn")
predict_fl(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code = ```if 1:
import pickle
import binascii
smodel = kargs["smodel"]
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
bmodel = binascii.unhexlify(smodel)
clf1 = pickle.loads(bmodel)
df1 = df[features_cols]
predictions = clf1.predict(df1)
result = df
result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
```;
samples
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let predict_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code = ```if 1:
import pickle
import binascii
smodel = kargs["smodel"]
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
bmodel = binascii.unhexlify(smodel)
clf1 = pickle.loads(bmodel)
df1 = df[features_cols]
predictions = clf1.predict(df1)
result = df
result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
```;
samples
| evaluate python(typeof(*), code, kwargs)
};
//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection
| where Test == 1
| extend pred_Occupancy=false
| invoke predict_fl(ML_Models, 'Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy
Stored
//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection
| where Test == 1
| extend pred_Occupancy=false
| invoke predict_fl(ML_Models, 'Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy
Output
Occupancy | pred_Occupancy | n |
---|---|---|
TRUE | TRUE | 3006 |
FALSE | TRUE | 112 |
TRUE | FALSE | 15 |
FALSE | FALSE | 9284 |
Model asset
Get sample dataset and pre-trained model with Python plugin enabled.
//dataset
.set OccupancyDetection <| cluster('help').database('Samples').OccupancyDetection
//model
.set ML_Models <| datatable(name:string, timestamp:datetime, model:string) [
'Occupancy', datetime(now), '800363736b6c6561726e2e6c696e6561725f6d6f64656c2e6c6f6769737469630a4c6f67697374696352656772657373696f6e0a7100298171017d710228580700000070656e616c7479710358020000006c32710458040000006475616c7105895803000000746f6c7106473f1a36e2eb1c432d5801000000437107473ff0000000000000580d0000006669745f696e746572636570747108885811000000696e746572636570745f7363616c696e6771094b01580c000000636c6173735f776569676874710a4e580c00000072616e646f6d5f7374617465710b4e5806000000736f6c766572710c58090000006c69626c696e656172710d58080000006d61785f69746572710e4b64580b0000006d756c74695f636c617373710f58030000006f767271105807000000766572626f736571114b00580a0000007761726d5f737461727471128958060000006e5f6a6f627371134b015808000000636c61737365735f7114636e756d70792e636f72652e6d756c746961727261790a5f7265636f6e7374727563740a7115636e756d70790a6e6461727261790a71164b00857117430162711887711952711a284b014b0285711b636e756d70790a64747970650a711c58020000006231711d4b004b0187711e52711f284b0358010000007c71204e4e4e4affffffff4affffffff4b007471216289430200017122747123625805000000636f65665f7124681568164b008571256818877126527127284b014b014b05867128681c5802000000663871294b004b0187712a52712b284b0358010000003c712c4e4e4e4affffffff4affffffff4b0074712d628943286a02e0d50687e0bfc6d7c974fa93a63fb3d3b8080e6e943ffceb15defdad713f14c3a76bd73202bf712e74712f62580a000000696e746572636570745f7130681568164b008571316818877132527133284b014b01857134682b894308f1e89f57711290bf71357471366258070000006e5f697465725f7137681568164b00857138681887713952713a284b014b0185713b681c58020000006934713c4b004b0187713d52713e284b03682c4e4e4e4affffffff4affffffff4b0074713f628943040c00000071407471416258100000005f736b6c6561726e5f76657273696f6e71425806000000302e31392e32714375622e'
]
5.31 - predict_onnx_fl()
The function predict_onnx_fl()
is a user-defined function (UDF) that predicts using an existing trained machine learning model. This model has been converted to ONNX format, serialized to string, and saved in a standard table.
Syntax
T | invoke predict_onnx_fl(
models_tbl,
model_name,
features_cols,
pred_col)
Parameters
Name | Type | Required | Description |
---|---|---|---|
models_tbl | string | ✔️ | The name of the table that contains all serialized models. The table must have the following columns: name (the model name), timestamp (time of model training), and model (string representation of the serialized model). |
model_name | string | ✔️ | The name of the specific model to use. |
features_cols | dynamic | ✔️ | An array containing the names of the features columns that are used by the model for prediction. |
pred_col | string | ✔️ | The name of the column that stores the predictions. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let predict_onnx_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code = ```if 1:
import binascii
smodel = kargs["smodel"]
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
bmodel = binascii.unhexlify(smodel)
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
import onnxruntime as rt
sess = rt.InferenceSession(bmodel)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
df1 = df[features_cols]
predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]
result = df
result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
```;
samples | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\ML", docstring = "Predict using ONNX model")
predict_onnx_fl(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code = ```if 1:
import binascii
smodel = kargs["smodel"]
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
bmodel = binascii.unhexlify(smodel)
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
import onnxruntime as rt
sess = rt.InferenceSession(bmodel)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
df1 = df[features_cols]
predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]
result = df
result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
```;
samples | evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let predict_onnx_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code = ```if 1:
import binascii
smodel = kargs["smodel"]
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
bmodel = binascii.unhexlify(smodel)
features_cols = kargs["features_cols"]
pred_col = kargs["pred_col"]
import onnxruntime as rt
sess = rt.InferenceSession(bmodel)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
df1 = df[features_cols]
predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]
result = df
result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
```;
samples | evaluate python(typeof(*), code, kwargs)
};
//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection
| where Test == 1
| extend pred_Occupancy=bool(0)
| invoke predict_onnx_fl(ML_Models, 'ONNX-Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy
Stored
//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection
| where Test == 1
| extend pred_Occupancy=bool(0)
| invoke predict_onnx_fl(ML_Models, 'ONNX-Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy
Output
Occupancy | pred_Occupancy | n |
---|---|---|
TRUE | TRUE | 3006 |
FALSE | TRUE | 112 |
TRUE | FALSE | 15 |
FALSE | FALSE | 9284 |
5.32 - quantize_fl()
The function quantize_fl()
is a user-defined function (UDF) that bins metric columns. It quantizes metric columns to categorical labels, based on the K-Means algorithm.
Syntax
T | invoke quantize_fl(
num_bins,
in_cols,
out_cols [,
labels ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
num_bins | int | ✔️ | The required number of bins. |
in_cols | dynamic | ✔️ | An array containing the names of the columns to quantize. |
out_cols | dynamic | ✔️ | An array containing the names of the respective output columns for the binned values. |
labels | dynamic | | An array containing the label names. If unspecified, bin ranges will be used. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let quantize_fl=(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic=dynamic(null))
{
let kwargs = bag_pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
let code = ```if 1:
from sklearn.preprocessing import KBinsDiscretizer
num_bins = kargs["num_bins"]
in_cols = kargs["in_cols"]
out_cols = kargs["out_cols"]
labels = kargs["labels"]
result = df
binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
df_in = df[in_cols]
bdata = binner.fit_transform(df_in)
if labels is None:
for i in range(len(out_cols)): # loop on each column and convert it to binned labels
ii = np.round(binner.bin_edges_[i], 3)
labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
else:
result[out_cols] = np.take(labels, bdata.astype(int))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create function with (folder = "Packages\\ML", docstring = "Binning metric columns")
quantize_fl(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic)
{
let kwargs = bag_pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
let code = ```if 1:
from sklearn.preprocessing import KBinsDiscretizer
num_bins = kargs["num_bins"]
in_cols = kargs["in_cols"]
out_cols = kargs["out_cols"]
labels = kargs["labels"]
result = df
binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
df_in = df[in_cols]
bdata = binner.fit_transform(df_in)
if labels is None:
for i in range(len(out_cols)): # loop on each column and convert it to binned labels
ii = np.round(binner.bin_edges_[i], 3)
labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
else:
result[out_cols] = np.take(labels, bdata.astype(int))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let quantize_fl=(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic=dynamic(null))
{
let kwargs = bag_pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
let code = ```if 1:
from sklearn.preprocessing import KBinsDiscretizer
num_bins = kargs["num_bins"]
in_cols = kargs["in_cols"]
out_cols = kargs["out_cols"]
labels = kargs["labels"]
result = df
binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
df_in = df[in_cols]
bdata = binner.fit_transform(df_in)
if labels is None:
for i in range(len(out_cols)): # loop on each column and convert it to binned labels
ii = np.round(binner.bin_edges_[i], 3)
labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
else:
result[out_cols] = np.take(labels, bdata.astype(int))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
union
(range x from 1 to 5 step 1),
(range x from 10 to 15 step 1),
(range x from 20 to 25 step 1)
| extend x_label='', x_bin=''
| invoke quantize_fl(3, pack_array('x'), pack_array('x_label'), pack_array('Low', 'Med', 'High'))
| invoke quantize_fl(3, pack_array('x'), pack_array('x_bin'), dynamic(null))
Stored
union
(range x from 1 to 5 step 1),
(range x from 10 to 15 step 1),
(range x from 20 to 25 step 1)
| extend x_label='', x_bin=''
| invoke quantize_fl(3, pack_array('x'), pack_array('x_label'), pack_array('Low', 'Med', 'High'))
| invoke quantize_fl(3, pack_array('x'), pack_array('x_bin'), dynamic(null))
Output
x | x_label | x_bin |
---|---|---|
1 | Low | 1.0-7.75 |
2 | Low | 1.0-7.75 |
3 | Low | 1.0-7.75 |
4 | Low | 1.0-7.75 |
5 | Low | 1.0-7.75 |
20 | High | 17.5-25.0 |
21 | High | 17.5-25.0 |
22 | High | 17.5-25.0 |
23 | High | 17.5-25.0 |
24 | High | 17.5-25.0 |
25 | High | 17.5-25.0 |
10 | Med | 7.75-17.5 |
11 | Med | 7.75-17.5 |
12 | Med | 7.75-17.5 |
13 | Med | 7.75-17.5 |
14 | Med | 7.75-17.5 |
15 | Med | 7.75-17.5 |
5.33 - series_clean_anomalies_fl()
Cleans anomalous points in a series.
The function series_clean_anomalies_fl()
is a user-defined function (UDF) that takes a dynamic numerical array as input, together with a matching numerical array of anomalies, and replaces the anomalies in the input array with the interpolated value of their adjacent points.
Syntax
series_clean_anomalies_fl(
y_series,
anomalies)
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | The input array of numeric values. |
anomalies | dynamic | ✔️ | The anomalies array containing either 0 for normal points or any other value for anomalous points. |
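For intuition, here's a minimal sketch, assuming the query-defined series_clean_anomalies_fl below is in scope. The single anomalous point (50) is replaced by the linear interpolation of its neighbors, 11.5:
print y = dynamic([10, 11, 50, 12, 13]), anomalies = dynamic([0, 0, 1, 0, 0])
| extend y_clean = series_clean_anomalies_fl(y, anomalies)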
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_clean_anomalies_fl = (y_series:dynamic, anomalies:dynamic)
{
let fnum = array_iff(series_not_equals(anomalies, 0), real(null), y_series); // replace anomalies with null values
series_fill_linear(fnum)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Replace anomalies by interpolated value", skipvalidation = "true")
series_clean_anomalies_fl(y_series:dynamic, anomalies:dynamic)
{
let fnum = array_iff(series_not_equals(anomalies, 0), real(null), y_series); // replace anomalies with null values
series_fill_linear(fnum)
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_clean_anomalies_fl = (y_series:dynamic, anomalies:dynamic)
{
let fnum = array_iff(series_not_equals(anomalies, 0), real(null), y_series); // replace anomalies with null values
series_fill_linear(fnum)
}
;
let min_t = datetime(2016-08-29);
let max_t = datetime(2016-08-31);
demo_make_series1
| make-series num=count() on TimeStamp from min_t to max_t step 20m by OsVer
| extend anomalies = series_decompose_anomalies(num, 0.8)
| extend num_c = series_clean_anomalies_fl(num, anomalies)
| render anomalychart with (anomalycolumns=anomalies)
Stored
let min_t = datetime(2016-08-29);
let max_t = datetime(2016-08-31);
demo_make_series1
| make-series num=count() on TimeStamp from min_t to max_t step 20m by OsVer
| extend anomalies = series_decompose_anomalies(num, 0.8)
| extend num_c = series_clean_anomalies_fl(num, anomalies)
| render anomalychart with (anomalycolumns=anomalies)
Output
5.34 - series_cosine_similarity_fl()
Calculates the cosine similarity of two numerical vectors.
The function series_cosine_similarity_fl()
is a user-defined function (UDF) that takes an expression containing two dynamic numerical arrays as input and calculates their cosine similarity.
Syntax
series_cosine_similarity_fl(
vec1,
vec2,
[ vec1_size [,
vec2_size ]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
vec1 | dynamic | ✔️ | An array of numeric values. |
vec2 | dynamic | ✔️ | An array of numeric values that is the same length as vec1. |
vec1_size | real | | The size of vec1. This is equivalent to the square root of the dot product of the vector with itself. |
vec2_size | real | | The size of vec2. |
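As a quick numeric check, the following sketch (assuming the query-defined series_cosine_similarity_fl below is in scope) compares [3, 4] and [4, 3]: the dot product is 24 and both vector sizes are 5, so the result is 24/25 = 0.96.
print sim = series_cosine_similarity_fl(dynamic([3, 4]), dynamic([4, 3]))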
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_cosine_similarity_fl=(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
let dp = series_dot_product(vec1, vec2);
let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
dp/(v1l*v2l)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate the Cosine similarity of 2 numerical arrays")
series_cosine_similarity_fl(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
let dp = series_dot_product(vec1, vec2);
let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
dp/(v1l*v2l)
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_cosine_similarity_fl=(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
let dp = series_dot_product(vec1, vec2);
let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
dp/(v1l*v2l)
};
let s1=pack_array(0, 1);
let s2=pack_array(sqrt(2), sqrt(2));
print angle=acos(series_cosine_similarity_fl(s1, s2))/(2*pi())*360
Stored
let s1=pack_array(0, 1);
let s2=pack_array(sqrt(2), sqrt(2));
print angle=acos(series_cosine_similarity_fl(s1, s2))/(2*pi())*360
Output
angle |
---|
45 |
5.35 - series_dbl_exp_smoothing_fl()
Applies a double exponential smoothing filter on a series.
The function series_dbl_exp_smoothing_fl()
is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies a double exponential smoothing filter. When there is a trend in the series, this function is superior to the series_exp_smoothing_fl() function, which implements a basic exponential smoothing filter.
Syntax
series_dbl_exp_smoothing_fl(
y_series [,
alpha [,
beta ]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array of numeric values. |
alpha | real | | A value in the range [0-1] that specifies the weight of the last point vs. the weight of the previous points, which is 1 - alpha. The default is 0.5. |
beta | real | | A value in the range [0-1] that specifies the weight of the last slope vs. the weight of the previous slopes, which is 1 - beta. The default is 0.5. |
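To see the effect of the trend term, the following sketch smooths a noisy ramp with both filters, assuming series_dbl_exp_smoothing_fl and series_exp_smoothing_fl (defined in the next section) are both available, for example as stored functions. After a short initial transient, the double filter should follow the upward trend closely, while the basic filter lags behind it.
range x from 1 to 30 step 1
| extend y = x + rand()*4
| summarize x = make_list(x), y = make_list(y)
| extend y_dbl = series_dbl_exp_smoothing_fl(y), y_single = series_exp_smoothing_fl(y)
| render linechart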
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_dbl_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5, beta:double=0.5)
{
series_iir(y_series, pack_array(alpha, alpha*(beta-1)), pack_array(1, alpha*(1+beta)-2, 1-alpha))
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Double exponential smoothing for a series")
series_dbl_exp_smoothing_fl(y_series:dynamic, alpha:double=0.5, beta:double=0.5)
{
series_iir(y_series, pack_array(alpha, alpha*(beta-1)), pack_array(1, alpha*(1+beta)-2, 1-alpha))
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_dbl_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5, beta:double=0.5)
{
series_iir(y_series, pack_array(alpha, alpha*(beta-1)), pack_array(1, alpha*(1+beta)-2, 1-alpha))
};
range x from 1 to 50 step 1
| extend y = x + rand()*10
| summarize x = make_list(x), y = make_list(y)
| extend dbl_exp_smooth_y = series_dbl_exp_smoothing_fl(y, 0.2, 0.4)
| render linechart
Stored
range x from 1 to 50 step 1
| extend y = x + rand()*10
| summarize x = make_list(x), y = make_list(y)
| extend dbl_exp_smooth_y = series_dbl_exp_smoothing_fl(y, 0.2, 0.4)
| render linechart
Output
5.36 - series_dot_product_fl()
Calculates the dot product of two numerical vectors.
The function series_dot_product_fl()
is a user-defined function (UDF) that takes an expression containing two dynamic numerical arrays as input and calculates their dot product.
Syntax
series_dot_product_fl(
vec1,
vec2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
vec1 | dynamic | ✔️ | An array of numeric values. |
vec2 | dynamic | ✔️ | An array of numeric values that is the same length as vec1. |
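As a quick numeric check, the following sketch (assuming the query-defined series_dot_product_fl below is in scope) computes 1*4 + 2*5 + 3*6 = 32:
print dp = series_dot_product_fl(dynamic([1, 2, 3]), dynamic([4, 5, 6]))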
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_dot_product_fl=(vec1:dynamic, vec2:dynamic)
{
let elem_prod = series_multiply(vec1, vec2);
let cum_sum = series_iir(elem_prod, dynamic([1]), dynamic([1,-1]));
todouble(cum_sum[-1])
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate the dot product of 2 numerical arrays")
series_dot_product_fl(vec1:dynamic, vec2:dynamic)
{
let elem_prod = series_multiply(vec1, vec2);
let cum_sum = series_iir(elem_prod, dynamic([1]), dynamic([1,-1]));
todouble(cum_sum[-1])
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_dot_product_fl=(vec1:dynamic, vec2:dynamic)
{
let elem_prod = series_multiply(vec1, vec2);
let cum_sum = series_iir(elem_prod, dynamic([1]), dynamic([1,-1]));
todouble(cum_sum[-1])
};
union
(print 1 | project v1=range(1, 3, 1), v2=range(4, 6, 1)),
(print 1 | project v1=range(11, 13, 1), v2=range(14, 16, 1))
| extend v3=series_dot_product_fl(v1, v2)
Stored
union
(print 1 | project v1=range(1, 3, 1), v2=range(4, 6, 1)),
(print 1 | project v1=range(11, 13, 1), v2=range(14, 16, 1))
| extend v3=series_dot_product_fl(v1, v2)
Output
5.37 - series_downsample_fl()
The function series_downsample_fl()
is a user-defined function (UDF) that downsamples a time series by an integer factor. This function takes a table containing multiple time series (dynamic numerical array), and downsamples each series. The output contains both the coarser series and its respective times array. To avoid aliasing, the function applies a simple low pass filter on each series before subsampling.
Syntax
T | invoke series_downsample_fl(
t_col,
y_col,
ds_t_col,
ds_y_col,
sampling_factor)
Parameters
Name | Type | Required | Description |
---|---|---|---|
t_col | string | ✔️ | The name of the column that contains the time axis of the series to downsample. |
y_col | string | ✔️ | The name of the column that contains the series to downsample. |
ds_t_col | string | ✔️ | The name of the column to store the downsampled time axis of each series. |
ds_y_col | string | ✔️ | The name of the column to store the downsampled series. |
sampling_factor | int | ✔️ | An integer specifying the required downsampling factor. |
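The following sketch (assuming the query-defined series_downsample_fl below is in scope) downsamples a short 8-point series by a factor of 2; the coarse time axis and the low-pass filtered, subsampled values are returned in the coarse_t and coarse_y columns:
range i from 1 to 1 step 1
| extend t = range(datetime(2024-01-01 00:00), datetime(2024-01-01 00:07), 1m), y = dynamic([1, 2, 3, 4, 5, 6, 7, 8])
| invoke series_downsample_fl('t', 'y', 'coarse_t', 'coarse_y', 2)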
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_downsample_fl=(tbl:(*), t_col:string, y_col:string, ds_t_col:string, ds_y_col:string, sampling_factor:int)
{
tbl
| extend _t_ = column_ifexists(t_col, dynamic(0)), _y_ = column_ifexists(y_col, dynamic(0))
| extend _y_ = series_fir(_y_, repeat(1, sampling_factor), true, true) // apply a simple low pass filter before sub-sampling
| mv-apply _t_ to typeof(DateTime), _y_ to typeof(double) on
(extend rid=row_number()-1
| where rid % sampling_factor == ceiling(sampling_factor/2.0)-1 // sub-sampling
| summarize _t_ = make_list(_t_), _y_ = make_list(_y_))
| extend cols = bag_pack(ds_t_col, _t_, ds_y_col, _y_)
| project-away _t_, _y_
| evaluate bag_unpack(cols)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Downsampling a series by an integer factor")
series_downsample_fl(tbl:(*), t_col:string, y_col:string, ds_t_col:string, ds_y_col:string, sampling_factor:int)
{
tbl
| extend _t_ = column_ifexists(t_col, dynamic(0)), _y_ = column_ifexists(y_col, dynamic(0))
| extend _y_ = series_fir(_y_, repeat(1, sampling_factor), true, true) // apply a simple low pass filter before sub-sampling
| mv-apply _t_ to typeof(DateTime), _y_ to typeof(double) on
(extend rid=row_number()-1
| where rid % sampling_factor == ceiling(sampling_factor/2.0)-1 // sub-sampling
| summarize _t_ = make_list(_t_), _y_ = make_list(_y_))
| extend cols = bag_pack(ds_t_col, _t_, ds_y_col, _y_)
| project-away _t_, _y_
| evaluate bag_unpack(cols)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_downsample_fl=(tbl:(*), t_col:string, y_col:string, ds_t_col:string, ds_y_col:string, sampling_factor:int)
{
tbl
| extend _t_ = column_ifexists(t_col, dynamic(0)), _y_ = column_ifexists(y_col, dynamic(0))
| extend _y_ = series_fir(_y_, repeat(1, sampling_factor), true, true) // apply a simple low pass filter before sub-sampling
| mv-apply _t_ to typeof(DateTime), _y_ to typeof(double) on
(extend rid=row_number()-1
| where rid % sampling_factor == ceiling(sampling_factor/2.0)-1 // sub-sampling
| summarize _t_ = make_list(_t_), _y_ = make_list(_y_))
| extend cols = bag_pack(ds_t_col, _t_, ds_y_col, _y_)
| project-away _t_, _y_
| evaluate bag_unpack(cols)
};
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| invoke series_downsample_fl('TimeStamp', 'num', 'coarse_TimeStamp', 'coarse_num', 4)
| render timechart with(xcolumn=coarse_TimeStamp, ycolumns=coarse_num)
Stored
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| invoke series_downsample_fl('TimeStamp', 'num', 'coarse_TimeStamp', 'coarse_num', 4)
| render timechart with(xcolumn=coarse_TimeStamp, ycolumns=coarse_num)
Output
The time series downsampled by 4:
For reference, here is the original time series (before downsampling):
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| render timechart with(xcolumn=TimeStamp, ycolumns=num)
5.38 - series_exp_smoothing_fl()
Applies a basic exponential smoothing filter on a series.
The function series_exp_smoothing_fl()
is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies a basic exponential smoothing filter.
Syntax
series_exp_smoothing_fl(
y_series [,
alpha ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array cell of numeric values. |
alpha | real | | A value in the range [0-1] that specifies the weight of the last point vs. the weight of the previous points, which is 1 - alpha. The default is 0.5. |
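For intuition, the following sketch (assuming the query-defined series_exp_smoothing_fl below is in scope) smooths a step series with the default alpha of 0.5; the filter output converges toward the new level, approximately [0, 0, 0, 5, 7.5, 8.75]:
print y = dynamic([0, 0, 0, 10, 10, 10])
| extend y_smooth = series_exp_smoothing_fl(y)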
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5)
{
series_iir(y_series, pack_array(alpha), pack_array(1, alpha-1))
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Basic exponential smoothing for a series")
series_exp_smoothing_fl(y_series:dynamic, alpha:double=0.5)
{
series_iir(y_series, pack_array(alpha), pack_array(1, alpha-1))
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5)
{
series_iir(y_series, pack_array(alpha), pack_array(1, alpha-1))
};
range x from 1 to 50 step 1
| extend y = x % 10
| summarize x = make_list(x), y = make_list(y)
| extend exp_smooth_y = series_exp_smoothing_fl(y, 0.4)
| render linechart
Stored
range x from 1 to 50 step 1
| extend y = x % 10
| summarize x = make_list(x), y = make_list(y)
| extend exp_smooth_y = series_exp_smoothing_fl(y, 0.4)
| render linechart
Output
5.39 - series_fbprophet_forecast_fl()
The function series_fbprophet_forecast_fl()
is a user-defined function (UDF) that takes an expression containing a time series as input, and predicts the values of the last trailing points using the Prophet algorithm. The function returns both the forecasted points and their confidence intervals. This function is a Kusto Query Language (KQL) wrapper to the Prophet() class, and exposes only the parameters that are mandatory for prediction. Feel free to modify your copy to support more parameters, such as holidays, change points, Fourier order, and so on.
- Install the fbprophet package, since it isn’t included in the Python image. To install the package:
  - Follow the guidelines for Installing packages for the Python plugin. To save time, you can download the prophet ZIP file, containing the wheel files of prophet and its dependencies, from https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip. Save this file to your allowlisted blob container.
  - Create a SAS token with read access to your ZIP file. To create a SAS token, see get the SAS for a blob container.
  - In the Example, replace the URL reference in the external_artifacts parameter with your file path and its SAS token.
Syntax
T | invoke series_fbprophet_forecast_fl(
ts_series,
y_series,
y_pred_series,
[ points ],
[ y_pred_low_series ],
[ y_pred_high_series ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ts_series | string | ✔️ | The name of the input table column containing the time stamps of the series to predict. |
y_series | string | ✔️ | The name of the input table column containing the values of the series to predict. |
y_pred_series | string | ✔️ | The name of the column to store the predicted series. |
points | int | ✔️ | The number of points at the end of the series to predict (forecast). These points are excluded from the learning (regression) process. The default is 0. |
y_pred_low_series | string | | The name of the column to store the series of the lowest values of the confidence interval. Omit if the confidence interval isn’t needed. |
y_pred_high_series | string | | The name of the column to store the series of the highest values of the confidence interval. Omit if the confidence interval isn’t needed. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_fbprophet_forecast_fl=(tbl:(*), ts_series:string, y_series:string, y_pred_series:string, points:int=0, y_pred_low_series:string='', y_pred_high_series:string='')
{
let kwargs = bag_pack('ts_series', ts_series, 'y_series', y_series, 'y_pred_series', y_pred_series, 'points', points, 'y_pred_low_series', y_pred_low_series, 'y_pred_high_series', y_pred_high_series);
let code = ```if 1:
from sandbox_utils import Zipackage
Zipackage.install("prophet.zip")
ts_series = kargs["ts_series"]
y_series = kargs["y_series"]
y_pred_series = kargs["y_pred_series"]
points = kargs["points"]
y_pred_low_series = kargs["y_pred_low_series"]
y_pred_high_series = kargs["y_pred_high_series"]
result = df
sr = pd.Series(df[y_pred_series])
if y_pred_low_series != '':
srl = pd.Series(df[y_pred_low_series])
if y_pred_high_series != '':
srh = pd.Series(df[y_pred_high_series])
from prophet import Prophet
df1 = pd.DataFrame(columns=["ds", "y"])
for i in range(df.shape[0]):
df1["ds"] = pd.to_datetime(df[ts_series][i])
df1["ds"] = df1["ds"].dt.tz_convert(None)
df1["y"] = df[y_series][i]
df2 = df1[:-points]
m = Prophet()
m.fit(df2)
future = df1[["ds"]]
forecast = m.predict(future)
sr[i] = list(forecast["yhat"])
if y_pred_low_series != '':
srl[i] = list(forecast["yhat_lower"])
if y_pred_high_series != '':
srh[i] = list(forecast["yhat_upper"])
result[y_pred_series] = sr
if y_pred_low_series != '':
result[y_pred_low_series] = srl
if y_pred_high_series != '':
result[y_pred_high_series] = srh
```;
tbl
| evaluate python(typeof(*), code, kwargs
, external_artifacts=bag_pack('prophet.zip', 'https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip?*** YOUR SAS TOKEN ***'))
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Time Series Forecast using Facebook fbprophet package")
series_fbprophet_forecast_fl(tbl:(*), ts_series:string, y_series:string, y_pred_series:string, points:int=0, y_pred_low_series:string='', y_pred_high_series:string='')
{
let kwargs = bag_pack('ts_series', ts_series, 'y_series', y_series, 'y_pred_series', y_pred_series, 'points', points, 'y_pred_low_series', y_pred_low_series, 'y_pred_high_series', y_pred_high_series);
let code = ```if 1:
from sandbox_utils import Zipackage
Zipackage.install("prophet.zip")
ts_series = kargs["ts_series"]
y_series = kargs["y_series"]
y_pred_series = kargs["y_pred_series"]
points = kargs["points"]
y_pred_low_series = kargs["y_pred_low_series"]
y_pred_high_series = kargs["y_pred_high_series"]
result = df
sr = pd.Series(df[y_pred_series])
if y_pred_low_series != '':
srl = pd.Series(df[y_pred_low_series])
if y_pred_high_series != '':
srh = pd.Series(df[y_pred_high_series])
from prophet import Prophet
df1 = pd.DataFrame(columns=["ds", "y"])
for i in range(df.shape[0]):
df1["ds"] = pd.to_datetime(df[ts_series][i])
df1["ds"] = df1["ds"].dt.tz_convert(None)
df1["y"] = df[y_series][i]
df2 = df1[:-points]
m = Prophet()
m.fit(df2)
future = df1[["ds"]]
forecast = m.predict(future)
sr[i] = list(forecast["yhat"])
if y_pred_low_series != '':
srl[i] = list(forecast["yhat_lower"])
if y_pred_high_series != '':
srh[i] = list(forecast["yhat_upper"])
result[y_pred_series] = sr
if y_pred_low_series != '':
result[y_pred_low_series] = srl
if y_pred_high_series != '':
result[y_pred_high_series] = srh
```;
tbl
| evaluate python(typeof(*), code, kwargs
, external_artifacts=bag_pack('prophet.zip', 'https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip?*** YOUR SAS TOKEN ***'))
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fbprophet_forecast_fl=(tbl:(*), ts_series:string, y_series:string, y_pred_series:string, points:int=0, y_pred_low_series:string='', y_pred_high_series:string='')
{
let kwargs = bag_pack('ts_series', ts_series, 'y_series', y_series, 'y_pred_series', y_pred_series, 'points', points, 'y_pred_low_series', y_pred_low_series, 'y_pred_high_series', y_pred_high_series);
let code = ```if 1:
from sandbox_utils import Zipackage
Zipackage.install("prophet.zip")
ts_series = kargs["ts_series"]
y_series = kargs["y_series"]
y_pred_series = kargs["y_pred_series"]
points = kargs["points"]
y_pred_low_series = kargs["y_pred_low_series"]
y_pred_high_series = kargs["y_pred_high_series"]
result = df
sr = pd.Series(df[y_pred_series])
if y_pred_low_series != '':
srl = pd.Series(df[y_pred_low_series])
if y_pred_high_series != '':
srh = pd.Series(df[y_pred_high_series])
from prophet import Prophet
df1 = pd.DataFrame(columns=["ds", "y"])
for i in range(df.shape[0]):
df1["ds"] = pd.to_datetime(df[ts_series][i])
df1["ds"] = df1["ds"].dt.tz_convert(None)
df1["y"] = df[y_series][i]
df2 = df1[:-points]
m = Prophet()
m.fit(df2)
future = df1[["ds"]]
forecast = m.predict(future)
sr[i] = list(forecast["yhat"])
if y_pred_low_series != '':
srl[i] = list(forecast["yhat_lower"])
if y_pred_high_series != '':
srh[i] = list(forecast["yhat_upper"])
result[y_pred_series] = sr
if y_pred_low_series != '':
result[y_pred_low_series] = srl
if y_pred_high_series != '':
result[y_pred_high_series] = srh
```;
tbl
| evaluate python(typeof(*), code, kwargs
, external_artifacts=bag_pack('prophet.zip', 'https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip?*** YOUR SAS TOKEN ***'))
};
//
// Forecasting 3 time series using fbprophet, compare to forecasting using the native function series_decompose_forecast()
//
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| extend pred_num_native = series_decompose_forecast(num, toint(horizon/dt))
| extend pred_num=dynamic(null), pred_num_lower=dynamic(null), pred_num_upper=dynamic(null)
| invoke series_fbprophet_forecast_fl('TimeStamp', 'num', 'pred_num', toint(horizon/dt), 'pred_num_lower', 'pred_num_upper')
| render timechart
Stored
//
// Forecasting 3 time series using fbprophet, compare to forecasting using the native function series_decompose_forecast()
//
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| extend pred_num_native = series_decompose_forecast(num, toint(horizon/dt))
| extend pred_num=dynamic(null), pred_num_lower=dynamic(null), pred_num_upper=dynamic(null)
| invoke series_fbprophet_forecast_fl('TimeStamp', 'num', 'pred_num', toint(horizon/dt), 'pred_num_lower', 'pred_num_upper')
| render timechart
Output
5.40 - series_fit_lowess_fl()
The function series_fit_lowess_fl()
is a user-defined function (UDF) that applies a LOWESS regression on a series. This function takes a table with multiple series (dynamic numerical arrays) and generates a LOWESS Curve, which is a smoothed version of the original series.
Syntax
T | invoke series_fit_lowess_fl(
y_series,
y_fit_series,
[ fit_size ],
[ x_series ],
[ x_istime ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | string | ✔️ | The name of the input table column containing the dependent variable. This column is the series to fit. |
y_fit_series | string | ✔️ | The name of the column to store the fitted series. |
fit_size | int | | For each point, the local regression is applied on its respective fit_size closest points. The default is 5. |
x_series | string | | The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional, and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series. |
x_istime | bool | | This boolean parameter is needed only if x_series is specified and it’s a vector of datetime. The default is false. |
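If you don't have access to the demo tables used below, the following self-contained sketch (assuming the query-defined series_fit_lowess_fl below is in scope) fits a LOWESS curve to a noisy synthetic sine wave. Note that the output column is pre-created with dynamic(null) so it's included in the output schema defined by typeof(*):
range x from 1 to 60 step 1
| extend y = sin(2*pi()*x/30) + rand()*0.5
| summarize x = make_list(x), y = make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 9)
| render linechart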
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_size = kargs["fit_size"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
import statsmodels.api as sm
def lowess_fit(ts_row, x_col, y_col, fsize):
y = ts_row[y_col]
fraction = fsize/len(y)
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
lowess = sm.nonparametric.lowess
z = lowess(y, x, return_sorted=False, frac=fraction)
return list(z)
result = df
result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function
. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Fits a local polynomial using LOWESS method to a series")
series_fit_lowess_fl(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_size = kargs["fit_size"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
import statsmodels.api as sm
def lowess_fit(ts_row, x_col, y_col, fsize):
y = ts_row[y_col]
fraction = fsize/len(y)
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
lowess = sm.nonparametric.lowess
z = lowess(y, x, return_sorted=False, frac=fraction)
return list(z)
result = df
result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Examples
The following examples use the invoke operator to run the function.
LOWESS regression on regular time series
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_size = kargs["fit_size"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
import statsmodels.api as sm
def lowess_fit(ts_row, x_col, y_col, fsize):
y = ts_row[y_col]
fraction = fsize/len(y)
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
lowess = sm.nonparametric.lowess
z = lowess(y, x, return_sorted=False, frac=fraction)
return list(z)
result = df
result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Apply 9 points LOWESS regression on regular time series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9)
| render timechart
Stored
//
// Apply 9 points LOWESS regression on regular time series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9)
| render timechart
Output
Test irregular time series
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_size = kargs["fit_size"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
import statsmodels.api as sm
def lowess_fit(ts_row, x_col, y_col, fsize):
y = ts_row[y_col]
fraction = fsize/len(y)
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
lowess = sm.nonparametric.lowess
z = lowess(y, x, return_sorted=False, frac=fraction)
return list(z)
result = df
result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-1d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0 // delete every 6th hour to create irregular time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9, 'TimeStamp', True)
| render timechart
Stored
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-1d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0 // delete every 6th hour to create irregular time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9, 'TimeStamp', True)
| render timechart
Output
Compare LOWESS versus polynomial fit
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_size = kargs["fit_size"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
import statsmodels.api as sm
def lowess_fit(ts_row, x_col, y_col, fsize):
y = ts_row[y_col]
fraction = fsize/len(y)
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
lowess = sm.nonparametric.lowess
z = lowess(y, x, return_sorted=False, frac=fraction)
return list(z)
result = df
result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 15, 'x')
| extend series_fit_poly(y, x, 5)
| project x, y, y_lowess, y_polynomial=series_fit_poly_y_poly_fit
| render linechart
Stored
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 15, 'x')
| extend series_fit_poly(y, x, 5)
| project x, y, y_lowess, y_polynomial=series_fit_poly_y_poly_fit
| render linechart
Output
5.41 - series_fit_poly_fl()
The function series_fit_poly_fl() is a user-defined function (UDF) that applies polynomial regression to a series. The function takes a table containing multiple series (dynamic numerical arrays) and generates the best-fit high-order polynomial for each series using polynomial regression. The function returns both the polynomial coefficients and the interpolated polynomial over the range of the series.
Syntax
T | invoke series_fit_poly_fl(
y_series,
y_fit_series,
fit_coeff,
degree,
[ x_series ],
[ x_istime ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | string | ✔️ | The name of the input table column containing the dependent variable. That is, the series to fit. |
y_fit_series | string | ✔️ | The name of the column to store the best fit series. |
fit_coeff | string | ✔️ | The name of the column to store the best fit polynomial coefficients. |
degree | int | ✔️ | The required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on. |
x_series | string | | The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional, and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series. |
x_istime | bool | | This parameter is needed only if x_series is specified and it’s a vector of datetime. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_coeff = kargs["fit_coeff"]
degree = kargs["degree"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
def fit(ts_row, x_col, y_col, deg):
y = ts_row[y_col]
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
coeff = np.polyfit(x, y, deg)
p = np.poly1d(coeff)
z = p(x)
return z, coeff
result = df
if len(df):
result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Fit a polynomial of a specified degree to a series")
series_fit_poly_fl(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=false)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_coeff = kargs["fit_coeff"]
degree = kargs["degree"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
def fit(ts_row, x_col, y_col, deg):
y = ts_row[y_col]
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
coeff = np.polyfit(x, y, deg)
p = np.poly1d(coeff)
z = p(x)
return z, coeff
result = df
if len(df):
result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
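Note that the coefficients stored in the fit_coeff column follow the np.polyfit convention used in the definition above: the first element multiplies the highest-degree term and the last element is the constant. The following minimal sketch (hypothetical values, not part of the library) evaluates a degree-2 polynomial y = 2x^2 + 3x + 1 at x = 4 from such a coefficient array:
print coeff = dynamic([2.0, 3.0, 1.0]), x = 4.0
| extend y = todouble(coeff[0])*pow(x, 2) + todouble(coeff[1])*x + todouble(coeff[2])   // 2*16 + 3*4 + 1 = 45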
Examples
The following examples use the invoke operator to run the function.
Fit fifth order polynomial to a regular time series
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_coeff = kargs["fit_coeff"]
degree = kargs["degree"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
def fit(ts_row, x_col, y_col, deg):
y = ts_row[y_col]
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
coeff = np.polyfit(x, y, deg)
p = np.poly1d(coeff)
z = p(x)
return z, coeff
result = df
if len(df):
result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null), fnum1 = dynamic(null), coeff1=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)
Stored
//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null), fnum1 = dynamic(null), coeff1=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)
Output
Test irregular time series
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_coeff = kargs["fit_coeff"]
degree = kargs["degree"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
def fit(ts_row, x_col, y_col, deg):
y = ts_row[y_col]
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
coeff = np.polyfit(x, y, deg)
p = np.poly1d(coeff)
z = p(x)
return z, coeff
result = df
if len(df):
result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0 // delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)
Stored
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0 // delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)
Output
Fifth order polynomial with noise on x & y axes
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
let code = ```if 1:
y_series = kargs["y_series"]
y_fit_series = kargs["y_fit_series"]
fit_coeff = kargs["fit_coeff"]
degree = kargs["degree"]
x_series = kargs["x_series"]
x_istime = kargs["x_istime"]
def fit(ts_row, x_col, y_col, deg):
y = ts_row[y_col]
if x_col == "": # If there is no x column creates sequential range [1, len(y)]
x = np.arange(len(y)) + 1
else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
if x_istime:
x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
x = x - x.min()
x = x / x.max()
x = x * (len(x) - 1) + 1
else:
x = ts_row[x_col]
coeff = np.polyfit(x, y, deg)
p = np.poly1d(coeff)
z = p(x)
return z, coeff
result = df
if len(df):
result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
|fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart
Stored
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
|fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart
Output
5.42 - series_lag_fl()
Applies a lag on a series.
The function series_lag_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and shifts it backward. It’s commonly used for shifting a time series to test whether a pattern is new or matches historical data.
Syntax
series_lag_fl(
y_series,
offset)
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array cell of numeric values. |
offset | int | ✔️ | An integer specifying the required offset in bins. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_lag_fl = (series:dynamic, offset:int)
{
let lag_f = toscalar(range x from 1 to offset+1 step 1
| project y=iff(x == offset+1, 1, 0)
| summarize lag_filter = make_list(y));
fir(series, lag_f, false)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Shift a series by a specified offset")
series_lag_fl(series:dynamic, offset:int)
{
let lag_f = toscalar(range x from 1 to offset+1 step 1
| project y=iff(x == offset+1, 1, 0)
| summarize lag_filter = make_list(y));
fir(series, lag_f, false)
}
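For example, with offset = 2 the helper query inside the definition builds the filter [0, 0, 1]; applying that filter as a FIR filter without normalization copies each value from two bins back. The following standalone sketch (offset hardcoded to 2 for illustration) reproduces just the filter-building step:
range x from 1 to 3 step 1                  // offset + 1 = 3
| project y = iff(x == 3, 1, 0)
| summarize lag_filter = make_list(y)       // -> [0, 0, 1]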
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_lag_fl = (series:dynamic, offset:int)
{
let lag_f = toscalar(range x from 1 to offset+1 step 1
| project y=iff(x == offset+1, 1, 0)
| summarize lag_filter = make_list(y));
fir(series, lag_f, false)
};
let dt = 1h;
let time_shift = 1d;
let bins_shift = toint(time_shift/dt);
demo_make_series1
| make-series num=count() on TimeStamp step dt by OsVer
| extend num_shifted=series_lag_fl(num, bins_shift)
| render timechart
Stored
let dt = 1h;
let time_shift = 1d;
let bins_shift = toint(time_shift/dt);
demo_make_series1
| make-series num=count() on TimeStamp step dt by OsVer
| extend num_shifted=series_lag_fl(num, bins_shift)
| render timechart
Output
5.43 - series_metric_fl()
The series_metric_fl() function is a user-defined function (UDF) that selects and retrieves time series of metrics ingested into your database by the Prometheus monitoring system. This function assumes the data stored in your database is structured following the Prometheus data model. Specifically, each record contains:
- timestamp
- metric name
- metric value
- a variable set of labels ("key":"value" pairs)
Prometheus defines a time series by its metric name and a distinct set of labels. You can retrieve sets of time series using Prometheus Query Language (PromQL) by specifying the metric name and time series selector (a set of labels).
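For illustration, a record that follows this model might look like the following sketch. The column names TimeStamp, Name, Labels, and Val match the demo_prometheus examples later in this section; the values are made up:
datatable(TimeStamp:datetime, Name:string, Labels:dynamic, Val:real) [
    datetime(2020-12-08 00:00:00), 'writes', dynamic({"disk":"sda1", "host":"aks-agentpool-88086459-vmss000001"}), 42.0
]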
Syntax
T | invoke series_metric_fl(
timestamp_col,
name_col,
labels_col,
value_col,
metric_name,
labels_selector,
lookback,
offset)
Parameters
Name | Type | Required | Description |
---|---|---|---|
timestamp_col | string | ✔️ | The name of the column containing the timestamp. |
name_col | string | ✔️ | The name of the column containing the metric name. |
labels_col | string | ✔️ | The name of the column containing the labels dictionary. |
value_col | string | ✔️ | The name of the column containing the metric value. |
metric_name | string | ✔️ | The metric time series to retrieve. |
labels_selector | string | | Time series selector string, similar to PromQL. It’s a string containing a list of "key":"value" pairs, for example '"key1":"val1","key2":"val2"'. The default is an empty string, which means no filtering. Note that regular expressions are not supported. |
lookback | timespan | | The range vector to retrieve, similar to PromQL. The default is 10 minutes. |
offset | timespan | | Offset back from the current time to retrieve, similar to PromQL. Data is retrieved from ago(offset)-lookback to ago(offset). The default is 0, which means that data is retrieved up to now(). |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_metric_fl=(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
let etime = ago(offset);
let stime = etime - lookback;
metrics_tbl
| extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
| extend labels = dynamic_to_json(labels) // convert to string and sort by key
| where name == metric_name and timestamp between(stime..etime)
| order by timestamp asc
| summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
| where labels has_all (selector_d)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create function with (folder = "Packages\\Series", docstring = "Selecting & retrieving metrics like PromQL")
series_metric_fl(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
let etime = ago(offset);
let stime = etime - lookback;
metrics_tbl
| extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
| extend labels = dynamic_to_json(labels) // convert to string and sort by key
| where name == metric_name and timestamp between(stime..etime)
| order by timestamp asc
| summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
| where labels has_all (selector_d)
}
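The filtering step in the definition above serializes the labels to a JSON string and keeps only the series whose labels contain every "key":"value" pair of the selector. The following standalone sketch (made-up labels string) mirrors that matching step:
let labels_selector = '"disk":"sda1","host":"aks-agentpool-88086459-vmss000001"';
let selector_d = iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
print labels = '{"disk":"sda1","host":"aks-agentpool-88086459-vmss000001","job":"node"}'
| where labels has_all (selector_d)         // the row is kept because both selector pairs appear in labels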
Examples
The following examples use the invoke operator to run the function.
With specifying selector
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_metric_fl=(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
let etime = ago(offset);
let stime = etime - lookback;
metrics_tbl
| extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
| extend labels = dynamic_to_json(labels) // convert to string and sort by key
| where name == metric_name and timestamp between(stime..etime)
| order by timestamp asc
| summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
| where labels has_all (selector_d)
};
demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1","host":"aks-agentpool-88086459-vmss000001"', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels)
Stored
demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1","host":"aks-agentpool-88086459-vmss000001"', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels)
Output
Without specifying selector
The following example doesn’t specify a selector, so all ‘writes’ metrics are selected. This example assumes that the function is already installed, and uses the alternative direct calling syntax, specifying the input table as the first parameter:
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_metric_fl=(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
let etime = ago(offset);
let stime = etime - lookback;
metrics_tbl
| extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
| extend labels = dynamic_to_json(labels) // convert to string and sort by key
| where name == metric_name and timestamp between(stime..etime)
| order by timestamp asc
| summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
| where labels has_all (selector_d)
};
series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels, ysplit=axes)
Stored
series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels, ysplit=axes)
Output
5.44 - series_monthly_decompose_anomalies_fl()
Detect anomalous points in a daily series with monthly seasonality.
The function series_monthly_decompose_anomalies_fl() is a user-defined function (UDF) that detects anomalies in multiple time series that have monthly seasonality. The function is built on top of series_decompose_anomalies(). The challenge is that the length of a month varies between 28 and 31 days, so a baseline built with series_decompose_anomalies() out of the box can model only a fixed-length seasonality and thus fails to match spikes or other patterns that occur on the first (or any other) day of each month.
Syntax
series_monthly_decompose_anomalies_fl(
threshold)
Parameters
Name | Type | Required | Description |
---|---|---|---|
threshold | real | | Anomaly threshold. Default is 1.5. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_monthly_decompose_anomalies_fl=(tbl:(_key:string, _date:datetime, _val:real), threshold:real=1.5)
{
let _tbl=materialize(tbl
| extend _year=getyear(_date), _dom = dayofmonth(_date), _moy=monthofyear(_date), _doy=dayofyear(_date)
| extend _vdoy = 31*(_moy-1)+_dom // virtual day of year (assuming all months have 31 days)
);
let median_tbl = _tbl | summarize p50=percentiles(_val, 50) by _key, _dom;
let keys = _tbl | summarize by _key | extend dummy=1;
let years = _tbl | summarize by _year | extend dummy=1;
let vdoys = range _vdoy from 0 to 31*12-1 step 1 | extend _moy=_vdoy/31+1, _vdom=_vdoy%31+1, _vdoy=_vdoy+1 | extend dummy=1
| join kind=fullouter years on dummy | join kind=fullouter keys on dummy | project-away dummy, dummy1, dummy2;
vdoys
| join kind=leftouter _tbl on _key, _year, _vdoy
| project-away _key1, _year1, _moy1, _vdoy1
| extend _adoy=31*12*_year+_doy, _vadoy = 31*12*_year+_vdoy
| partition by _key (as T
| where _vadoy >= toscalar(T | summarize (_adoy, _vadoy)=arg_min(_adoy, _vadoy) | project _vadoy) and
_vadoy <= toscalar(T | summarize (_adoy, _vadoy)=arg_max(_adoy, _vadoy) | project _vadoy)
)
| join kind=inner median_tbl on _key, $left._vdom == $right._dom
| extend _vval = coalesce(_val, p50)
//| order by _key asc, _vadoy asc // for debugging
| make-series _vval=avg(_vval), _date=any(_date) default=datetime(null) on _vadoy step 1 by _key
| extend (anomalies, score, baseline) = series_decompose_anomalies(_vval, threshold, 31)
| mv-expand _date to typeof(datetime), _vval to typeof(real), _vadoy to typeof(long), anomalies to typeof(int), score to typeof(real), baseline to typeof(real)
| project-away _vadoy
| project-rename _val=_vval
| where isnotnull(_date)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for daily time series with monthly seasonality")
series_monthly_decompose_anomalies_fl(tbl:(_key:string, _date:datetime, _val:real), threshold:real=1.5)
{
let _tbl=materialize(tbl
| extend _year=getyear(_date), _dom = dayofmonth(_date), _moy=monthofyear(_date), _doy=dayofyear(_date)
| extend _vdoy = 31*(_moy-1)+_dom // virtual day of year (assuming all months have 31 days)
);
let median_tbl = _tbl | summarize p50=percentiles(_val, 50) by _key, _dom;
let keys = _tbl | summarize by _key | extend dummy=1;
let years = _tbl | summarize by _year | extend dummy=1;
let vdoys = range _vdoy from 0 to 31*12-1 step 1 | extend _moy=_vdoy/31+1, _vdom=_vdoy%31+1, _vdoy=_vdoy+1 | extend dummy=1
| join kind=fullouter years on dummy | join kind=fullouter keys on dummy | project-away dummy, dummy1, dummy2;
vdoys
| join kind=leftouter _tbl on _key, _year, _vdoy
| project-away _key1, _year1, _moy1, _vdoy1
| extend _adoy=31*12*_year+_doy, _vadoy = 31*12*_year+_vdoy
| partition by _key (as T
| where _vadoy >= toscalar(T | summarize (_adoy, _vadoy)=arg_min(_adoy, _vadoy) | project _vadoy) and
_vadoy <= toscalar(T | summarize (_adoy, _vadoy)=arg_max(_adoy, _vadoy) | project _vadoy)
)
| join kind=inner median_tbl on _key, $left._vdom == $right._dom
| extend _vval = coalesce(_val, p50)
//| order by _key asc, _vadoy asc // for debugging
| make-series _vval=avg(_vval), _date=any(_date) default=datetime(null) on _vadoy step 1 by _key
| extend (anomalies, score, baseline) = series_decompose_anomalies(_vval, threshold, 31)
| mv-expand _date to typeof(datetime), _vval to typeof(real), _vadoy to typeof(long), anomalies to typeof(int), score to typeof(real), baseline to typeof(real)
| project-away _vadoy
| project-rename _val=_vval
| where isnotnull(_date)
}
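The key step in the definition above maps each date to a 'virtual day of year', _vdoy = 31*(_moy-1)+_dom, which assumes every month has 31 days so that a given day of month always falls in the same bin. The following minimal sketch (arbitrary dates) works through the mapping:
datatable(_date:datetime) [
    datetime(2023-01-01), datetime(2023-02-01), datetime(2023-02-28)
]
| extend _vdoy = 31*(monthofyear(_date)-1) + dayofmonth(_date)   // -> 1, 32, 59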
Example
The input table must contain _key, _date, and _val columns. The query builds a set of time series of _val for each _key and adds the anomalies, score, and baseline columns.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_monthly_decompose_anomalies_fl=(tbl:(_key:string, _date:datetime, _val:real), threshold:real=1.5)
{
let _tbl=materialize(tbl
| extend _year=getyear(_date), _dom = dayofmonth(_date), _moy=monthofyear(_date), _doy=dayofyear(_date)
| extend _vdoy = 31*(_moy-1)+_dom // virtual day of year (assuming all months have 31 days)
);
let median_tbl = _tbl | summarize p50=percentiles(_val, 50) by _key, _dom;
let keys = _tbl | summarize by _key | extend dummy=1;
let years = _tbl | summarize by _year | extend dummy=1;
let vdoys = range _vdoy from 0 to 31*12-1 step 1 | extend _moy=_vdoy/31+1, _vdom=_vdoy%31+1, _vdoy=_vdoy+1 | extend dummy=1
| join kind=fullouter years on dummy | join kind=fullouter keys on dummy | project-away dummy, dummy1, dummy2;
vdoys
| join kind=leftouter _tbl on _key, _year, _vdoy
| project-away _key1, _year1, _moy1, _vdoy1
| extend _adoy=31*12*_year+_doy, _vadoy = 31*12*_year+_vdoy
| partition by _key (as T
| where _vadoy >= toscalar(T | summarize (_adoy, _vadoy)=arg_min(_adoy, _vadoy) | project _vadoy) and
_vadoy <= toscalar(T | summarize (_adoy, _vadoy)=arg_max(_adoy, _vadoy) | project _vadoy)
)
| join kind=inner median_tbl on _key, $left._vdom == $right._dom
| extend _vval = coalesce(_val, p50)
//| order by _key asc, _vadoy asc // for debugging
| make-series _vval=avg(_vval), _date=any(_date) default=datetime(null) on _vadoy step 1 by _key
| extend (anomalies, score, baseline) = series_decompose_anomalies(_vval, threshold, 31)
| mv-expand _date to typeof(datetime), _vval to typeof(real), _vadoy to typeof(long), anomalies to typeof(int), score to typeof(real), baseline to typeof(real)
| project-away _vadoy
| project-rename _val=_vval
| where isnotnull(_date)
};
demo_monthly_ts
| project _key=key, _date=ts, _val=val
| invoke series_monthly_decompose_anomalies_fl()
| project-rename key=_key, ts=_date, val=_val
| render anomalychart with(anomalycolumns=anomalies, xcolumn=ts, ycolumns=val)
Stored
demo_monthly_ts
| project _key=key, _date=ts, _val=val
| invoke series_monthly_decompose_anomalies_fl()
| project-rename key=_key, ts=_date, val=_val
| render anomalychart with(anomalycolumns=anomalies, xcolumn=ts, ycolumns=val)
Output
Series A with monthly anomalies:
Series B with monthly anomalies:
5.45 - series_moving_avg_fl()
Applies a moving average filter on a series.
The function series_moving_avg_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies a simple moving average filter to it.
Syntax
series_moving_avg_fl(
y_series,
n [,
center ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array cell of numeric values. |
n | int | ✔️ | The width of the moving average filter. |
center | bool | | Indicates whether the moving average is applied symmetrically on a window before and after the current point, or on a window from the current point backwards. By default, center is false. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_moving_avg_fl = (y_series:dynamic, n:int, center:bool=false)
{
series_fir(y_series, repeat(1, n), true, center)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate moving average of specified width")
series_moving_avg_fl(y_series:dynamic, n:int, center:bool=false)
{
series_fir(y_series, repeat(1, n), true, center)
}
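The definition is a thin wrapper around series_fir() with a constant filter of width n. The following minimal sketch (made-up values) shows the equivalent direct call for a centered 3-bin moving average:
print y = dynamic([1.0, 2.0, 6.0, 2.0, 1.0])
| extend y_ma = series_fir(y, repeat(1, 3), true, true)   // same as series_moving_avg_fl(y, 3, true)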
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_moving_avg_fl = (y_series:dynamic, n:int, center:bool=false)
{
series_fir(y_series, repeat(1, n), true, center)
};
//
// Moving average of 5 bins
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend num_ma=series_moving_avg_fl(num, 5, True)
| render timechart
Stored
//
// Moving average of 5 bins
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend num_ma=series_moving_avg_fl(num, 5, True)
| render timechart
Output
5.46 - series_moving_var_fl()
Applies a moving variance filter on a series.
The function series_moving_var_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies a moving variance filter to it.
Syntax
series_moving_var_fl(
y_series,
n [,
center ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array cell of numeric values. |
n | int | ✔️ | The width of the moving variance filter. |
center | bool | | Indicates whether the moving variance is applied symmetrically on a window before and after the current point, or on a window from the current point backwards. By default, center is false. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_moving_var_fl = (y_series:dynamic, n:int, center:bool=false)
{
let ey = series_fir(y_series, repeat(1, n), true, center);
let e2y = series_multiply(ey, ey);
let y2 = series_multiply(y_series, y_series);
let ey2 = series_fir(y2, repeat(1, n), true, center);
let var_series = series_subtract(ey2, e2y);
var_series
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate moving variance of specified width")
series_moving_var_fl(y_series:dynamic, n:int, center:bool=false)
{
let ey = series_fir(y_series, repeat(1, n), true, center);
let e2y = series_multiply(ey, ey);
let y2 = series_multiply(y_series, y_series);
let ey2 = series_fir(y2, repeat(1, n), true, center);
let var_series = series_subtract(ey2, e2y);
var_series
}
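The definition relies on the identity Var(y) = E[y^2] - (E[y])^2, where both expectations are moving averages computed with series_fir(). The following minimal sketch (made-up values) repeats the same steps for a centered 3-bin window:
print y = dynamic([1.0, 2.0, 6.0, 2.0, 1.0])
| extend ey = series_fir(y, repeat(1, 3), true, true)                          // moving E[y]
| extend ey2 = series_fir(series_multiply(y, y), repeat(1, 3), true, true)     // moving E[y^2]
| extend y_var = series_subtract(ey2, series_multiply(ey, ey))                 // E[y^2] - (E[y])^2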
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_moving_var_fl = (y_series:dynamic, n:int, center:bool=false)
{
let ey = series_fir(y_series, repeat(1, n), true, center);
let e2y = series_multiply(ey, ey);
let y2 = series_multiply(y_series, y_series);
let ey2 = series_fir(y2, repeat(1, n), true, center);
let var_series = series_subtract(ey2, e2y);
var_series
}
;
let sinewave=(x:double, period:double, gain:double=1.0, phase:double=0.0)
{
gain*sin(2*pi()/period*(x+phase))
}
;
let n=128;
let T=10;
let window=T*2;
union
(range x from 0 to n-1 step 1 | extend y=sinewave(x, T)),
(range x from n to 2*n-1 step 1 | extend y=0.0),
(range x from 2*n to 3*n-1 step 1 | extend y=sinewave(x, T)),
(range x from 3*n to 4*n-1 step 1 | extend y=(x-3.0*n)/128.0),
(range x from 4*n to 5*n-1 step 1 | extend y=sinewave(x, T))
| order by x asc
| summarize x=make_list(x), y=make_list(y)
| extend y_var=series_moving_var_fl(y, T, true)
| render linechart
Stored
let sinewave=(x:double, period:double, gain:double=1.0, phase:double=0.0)
{
gain*sin(2*pi()/period*(x+phase))
}
;
let n=128;
let T=10;
let window=T*2;
union
(range x from 0 to n-1 step 1 | extend y=sinewave(x, T)),
(range x from n to 2*n-1 step 1 | extend y=0.0),
(range x from 2*n to 3*n-1 step 1 | extend y=sinewave(x, T)),
(range x from 3*n to 4*n-1 step 1 | extend y=(x-3.0*n)/128.0),
(range x from 4*n to 5*n-1 step 1 | extend y=sinewave(x, T))
| order by x asc
| summarize x=make_list(x), y=make_list(y)
| extend y_var=series_moving_var_fl(y, T, true)
| render linechart
Output
5.47 - series_mv_ee_anomalies_fl()
The function series_mv_ee_anomalies_fl() is a user-defined function (UDF) that detects multivariate anomalies in series by applying the elliptic envelope model from scikit-learn. This model assumes that the multivariate data is drawn from a multi-dimensional normal distribution. The function accepts a set of series as numerical dynamic arrays, the names of the feature columns, and the expected percentage of anomalies in the whole series. The function builds a multi-dimensional elliptical envelope for each series and marks the points that fall outside this normal envelope as anomalies.
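The expected input has one row per series, where each feature column holds a dynamic numerical array and all arrays in a row have the same length, for example as produced by summarize make_list(). A hypothetical sketch of that shape (made-up values, column names mirror the example table below):
datatable(name:string, x:dynamic, y:dynamic) [
    'TS1', dynamic([10.2, 9.7, 55.0, 10.5]), dynamic([27.1, 26.4, 3.2, 27.8])
]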
Syntax
T | invoke series_mv_ee_anomalies_fl(
features_cols,
anomaly_col [,
score_col [,
anomalies_pct ]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
features_cols | dynamic | ✔️ | An array containing the names of the columns that are used for the multivariate anomaly detection model. |
anomaly_col | string | ✔️ | The name of the column to store the detected anomalies. |
score_col | string | | The name of the column to store the scores of the anomalies. |
anomalies_pct | real | | A real number in the range [0-50] specifying the expected percentage of anomalies in the data. Default value: 4%. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
// Define function
let series_mv_ee_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
let code = ```if 1:
from sklearn.covariance import EllipticEnvelope
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
dff = df[features_cols]
ellipsoid = EllipticEnvelope(contamination=anomalies_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
ellipsoid.fit(dffe)
df.loc[i, anomaly_col] = (ellipsoid.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = ellipsoid.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for multi dimensional normally distributed data using elliptical envelope model")
series_mv_ee_anomalies_fl(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
let code = ```if 1:
from sklearn.covariance import EllipticEnvelope
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
dff = df[features_cols]
ellipsoid = EllipticEnvelope(contamination=anomalies_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
ellipsoid.fit(dffe)
df.loc[i, anomaly_col] = (ellipsoid.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = ellipsoid.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
// Define function
let series_mv_ee_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
let code = ```if 1:
from sklearn.covariance import EllipticEnvelope
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
dff = df[features_cols]
ellipsoid = EllipticEnvelope(contamination=anomalies_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
ellipsoid.fit(dffe)
df.loc[i, anomaly_col] = (ellipsoid.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = ellipsoid.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Usage
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_ee_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores')
| extend anomalies=series_multiply(80, anomalies)
| render timechart
Stored
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_ee_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores')
| extend anomalies=series_multiply(80, anomalies)
| render timechart
Output
The table normal_2d_with_anomalies contains a set of three time series. Each time series has a two-dimensional normal distribution, with daily anomalies added at midnight, 8am, and 4pm respectively. You can create this sample dataset using an example query.
To view the data as a scatter chart, replace the usage code with the following:
normal_2d_with_anomalies
| extend anomalies=dynamic(null)
| invoke series_mv_ee_anomalies_fl(pack_array('x', 'y'), 'anomalies')
| where name == 'TS1'
| project x, y, anomalies
| mv-expand x to typeof(real), y to typeof(real), anomalies to typeof(string)
| render scatterchart with(series=anomalies)
You can see that on TS1 most of the midnight anomalies were detected using this multivariate model.
Create a sample dataset
.set normal_2d_with_anomalies <|
//
let window=14d;
let dt=1h;
let n=toint(window/dt);
let rand_normal_fl=(avg:real=0.0, stdv:real=1.0)
{
let x =rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand();
(x - 6)*stdv + avg
};
union
(range s from 0 to n step 1
| project t=startofday(now())-s*dt
| extend x=rand_normal_fl(10, 5)
| extend y=iff(hourofday(t) == 0, 2*(10-x)+7+rand_normal_fl(0, 3), 2*x+7+rand_normal_fl(0, 3)) // anomalies every midnight
| extend name='TS1'),
(range s from 0 to n step 1
| project t=startofday(now())-s*dt
| extend x=rand_normal_fl(15, 3)
| extend y=iff(hourofday(t) == 8, (15-x)+10+rand_normal_fl(0, 2), x-7+rand_normal_fl(0, 1)) // anomalies every 8am
| extend name='TS2'),
(range s from 0 to n step 1
| project t=startofday(now())-s*dt
| extend x=rand_normal_fl(8, 6)
| extend y=iff(hourofday(t) == 16, x+5+rand_normal_fl(0, 4), (12-x)+rand_normal_fl(0, 4)) // anomalies every 4pm
| extend name='TS3')
| summarize t=make_list(t), x=make_list(x), y=make_list(y) by name
5.48 - series_mv_if_anomalies_fl()
The function series_mv_if_anomalies_fl() is a user-defined function (UDF) that detects multivariate anomalies in series by applying the isolation forest model from scikit-learn. The function accepts a set of series as numerical dynamic arrays, the names of the feature columns, and the expected percentage of anomalies in the whole series. The function builds an ensemble of isolation trees for each series and marks the points that are quickly isolated as anomalies.
Syntax
T | invoke series_mv_if_anomalies_fl(
features_cols,
anomaly_col [,
score_col [,
anomalies_pct [,
num_trees [,
samples_pct ]]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
features_cols | dynamic | ✔️ | An array containing the names of the columns that are used for the multivariate anomaly detection model. |
anomaly_col | string | ✔️ | The name of the column to store the detected anomalies. |
score_col | string | | The name of the column to store the scores of the anomalies. |
anomalies_pct | real | | A real number in the range [0-50] specifying the expected percentage of anomalies in the data. Default value: 4%. |
num_trees | int | | The number of isolation trees to build for each time series. Default value: 100. |
samples_pct | real | | A real number in the range [0-100] specifying the percentage of samples used to build each tree. Default value: 100%, that is, use the full series. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
// Define function
let series_mv_if_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0, num_trees:int=100, samples_pct:real=100.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct, 'num_trees', num_trees, 'samples_pct', samples_pct);
let code = ```if 1:
from sklearn.ensemble import IsolationForest
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
num_trees = kargs['num_trees']
samples_pct = kargs['samples_pct']
dff = df[features_cols]
iforest = IsolationForest(contamination=anomalies_pct/100.0, random_state=0, n_estimators=num_trees, max_samples=samples_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
iforest.fit(dffe)
df.loc[i, anomaly_col] = (iforest.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = iforest.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for multi dimensional data using isolation forest model")
series_mv_if_anomalies_fl(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0, num_trees:int=100, samples_pct:real=100.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct, 'num_trees', num_trees, 'samples_pct', samples_pct);
let code = ```if 1:
from sklearn.ensemble import IsolationForest
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
num_trees = kargs['num_trees']
samples_pct = kargs['samples_pct']
dff = df[features_cols]
iforest = IsolationForest(contamination=anomalies_pct/100.0, random_state=0, n_estimators=num_trees, max_samples=samples_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
iforest.fit(dffe)
df.loc[i, anomaly_col] = (iforest.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = iforest.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
// Define function
let series_mv_if_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0, num_trees:int=100, samples_pct:real=100.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct, 'num_trees', num_trees, 'samples_pct', samples_pct);
let code = ```if 1:
from sklearn.ensemble import IsolationForest
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
num_trees = kargs['num_trees']
samples_pct = kargs['samples_pct']
dff = df[features_cols]
iforest = IsolationForest(contamination=anomalies_pct/100.0, random_state=0, n_estimators=num_trees, max_samples=samples_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
iforest.fit(dffe)
df.loc[i, anomaly_col] = (iforest.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = iforest.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Usage
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_if_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=8, num_trees=1000)
| extend anomalies=series_multiply(40, anomalies)
| render timechart
Stored
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_if_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=8, num_trees=1000)
| extend anomalies=series_multiply(40, anomalies)
| render timechart
Output
The table normal_2d_with_anomalies contains a set of three time series. Each time series has a two-dimensional normal distribution, with daily anomalies added at midnight, 8am, and 4pm respectively. You can create this sample dataset using an example query.
To view the data as a scatter chart, replace the usage code with the following:
normal_2d_with_anomalies
| extend anomalies=dynamic(null)
| invoke series_mv_if_anomalies_fl(pack_array('x', 'y'), 'anomalies')
| where name == 'TS2'
| project x, y, anomalies
| mv-expand x to typeof(real), y to typeof(real), anomalies to typeof(string)
| render scatterchart with(series=anomalies)
You can see that on TS2 most of the anomalies occurring at 8am were detected using this multivariate model.
5.49 - series_mv_oc_anomalies_fl()
The function series_mv_oc_anomalies_fl() is a user-defined function (UDF) that detects multivariate anomalies in series by applying the One-Class SVM model from scikit-learn. The function accepts a set of series as numerical dynamic arrays, the names of the feature columns, and the expected percentage of anomalies in the whole series. The function trains a one-class SVM for each series and marks the points that fall outside the hypersphere as anomalies.
Syntax
T | invoke series_mv_oc_anomalies_fl(
features_cols,
anomaly_col [,
score_col [,
anomalies_pct ]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
features_cols | dynamic | ✔️ | An array containing the names of the columns that are used for the multivariate anomaly detection model. |
anomaly_col | string | ✔️ | The name of the column to store the detected anomalies. |
score_col | string | | The name of the column to store the scores of the anomalies. |
anomalies_pct | real | | A real number in the range [0-50] specifying the expected percentage of anomalies in the data. Default value: 4%. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_mv_oc_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
let code = ```if 1:
from sklearn.svm import OneClassSVM
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
dff = df[features_cols]
svm = OneClassSVM(nu=anomalies_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
svm.fit(dffe)
df.loc[i, anomaly_col] = (svm.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = svm.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for multi dimensional data using One Class SVM model")
series_mv_oc_anomalies_fl(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
let code = ```if 1:
from sklearn.svm import OneClassSVM
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
dff = df[features_cols]
svm = OneClassSVM(nu=anomalies_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
svm.fit(dffe)
df.loc[i, anomaly_col] = (svm.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = svm.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_mv_oc_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
let code = ```if 1:
from sklearn.svm import OneClassSVM
features_cols = kargs['features_cols']
anomaly_col = kargs['anomaly_col']
score_col = kargs['score_col']
anomalies_pct = kargs['anomalies_pct']
dff = df[features_cols]
svm = OneClassSVM(nu=anomalies_pct/100.0)
for i in range(len(dff)):
dffi = dff.iloc[[i], :]
dffe = dffi.explode(features_cols)
svm.fit(dffe)
df.loc[i, anomaly_col] = (svm.predict(dffe) < 0).astype(int).tolist()
if score_col != '':
df.loc[i, score_col] = svm.decision_function(dffe).tolist()
result = df
```;
tbl
| evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Usage
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_oc_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=6)
| extend anomalies=series_multiply(80, anomalies)
| render timechart
Stored
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_oc_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=6)
| extend anomalies=series_multiply(80, anomalies)
| render timechart
Output
The table normal_2d_with_anomalies contains a set of three time series. Each time series has a two-dimensional normal distribution, with daily anomalies added at midnight, 8 AM, and 4 PM respectively. You can create this sample dataset using an example query.
To view the data as a scatter chart, replace the usage code with the following:
normal_2d_with_anomalies
| extend anomalies=dynamic(null)
| invoke series_mv_oc_anomalies_fl(pack_array('x', 'y'), 'anomalies')
| where name == 'TS1'
| project x, y, anomalies
| mv-expand x to typeof(real), y to typeof(real), anomalies to typeof(string)
| render scatterchart with(series=anomalies)
You can see that on TS1, most of the anomalies occurring at midnight were detected by this multivariate model.
5.50 - series_rate_fl()
The function series_rate_fl() is a user-defined function (UDF) that calculates the average rate of metric increase per second. Its logic follows the PromQL rate() function. It should be used for time series of counter metrics ingested to your database by the Prometheus monitoring system and retrieved by series_metric_fl().
Syntax
T | invoke series_rate_fl(
[ n_bins [,
fix_reset ]])
T is a table returned from series_metric_fl(). Its schema includes (timestamp:dynamic, name:string, labels:string, value:dynamic).
Parameters
Name | Type | Required | Description |
---|---|---|---|
n_bins | int | | The number of bins specifying the gap between the extracted metric values used to calculate the rate. The function calculates the difference between the current sample and the one n_bins before it, and divides it by the difference of their respective timestamps in seconds. The default is one bin; the default setting therefore calculates irate(), the PromQL instantaneous rate function. |
fix_reset | bool | | Controls whether to check for counter resets and correct them, as the PromQL rate() function does. The default is true. Set it to false to skip the redundant analysis when there's no need to check for resets. |
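To make the bin-shift logic concrete, here's a minimal sketch, separate from the function itself, that applies the same shift-and-divide calculation with n_bins of 1 to a hypothetical counter array. It uses plain second offsets in place of real timestamps, so no tick-to-second conversion is needed:
print t = dynamic([0, 60, 120, 180, 240]), value = dynamic([0, 30, 90, 90, 150])
| extend tS = array_shift_right(t, 1), valueS = array_shift_right(value, 1)
| extend rate = series_divide(series_subtract(value, valueS), series_subtract(t, tS))
// rate is [null, 0.5, 1.0, 0.0, 1.0] counts per second; the actual function below also adds counter-reset correction and tick-to-second conversion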
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_rate_fl=(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
tbl
| where fix_reset // Prometheus counters can only go up
| mv-apply value to typeof(double) on
( extend correction = iff(value < prev(value), prev(value), 0.0) // if the value decreases we assume it was reset to 0, so add last value
| extend cum_correction = row_cumsum(correction)
| extend corrected_value = value + cum_correction
| summarize value = make_list(corrected_value))
| union (tbl | where not(fix_reset))
| extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
| extend dt = series_subtract(timestamp, timestampS)
| extend dt = series_divide(dt, 1e7) // converts from ticks to seconds
| extend dv = series_subtract(value, valueS)
| extend rate = series_divide(dv, dt)
| project-away dt, dv, timestampS, value, valueS
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create function with (folder = "Packages\\Series", docstring = "Simulate PromQL rate()")
series_rate_fl(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
tbl
| where fix_reset // Prometheus counters can only go up
| mv-apply value to typeof(double) on
( extend correction = iff(value < prev(value), prev(value), 0.0) // if the value decreases we assume it was reset to 0, so add last value
| extend cum_correction = row_cumsum(correction)
| extend corrected_value = value + cum_correction
| summarize value = make_list(corrected_value))
| union (tbl | where not(fix_reset))
| extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
| extend dt = series_subtract(timestamp, timestampS)
| extend dt = series_divide(dt, 1e7) // converts from ticks to seconds
| extend dv = series_subtract(value, valueS)
| extend rate = series_divide(dv, dt)
| project-away dt, dv, timestampS, value, valueS
}
Examples
The following examples use the invoke operator to run the function.
Calculate average rate of metric increase
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_rate_fl=(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
tbl
| where fix_reset // Prometheus counters can only go up
| mv-apply value to typeof(double) on
( extend correction = iff(value < prev(value), prev(value), 0.0) // if the value decreases we assume it was reset to 0, so add last value
| extend cum_correction = row_cumsum(correction)
| extend corrected_value = value + cum_correction
| summarize value = make_list(corrected_value))
| union (tbl | where not(fix_reset))
| extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
| extend dt = series_subtract(timestamp, timestampS)
| extend dt = series_divide(dt, 1e7) // converts from ticks to seconds
| extend dv = series_subtract(value, valueS)
| extend rate = series_divide(dv, dt)
| project-away dt, dv, timestampS, value, valueS
};
//
demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| invoke series_rate_fl(2)
| render timechart with(series=labels)
Stored
demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| invoke series_rate_fl(2)
| render timechart with(series=labels)
Output
Select the main disk of two hosts
The following example selects the main disk of two hosts and assumes that the function is already installed. This example uses the alternative direct calling syntax, specifying the input table as the first parameter:
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_rate_fl=(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
tbl
| where fix_reset // Prometheus counters can only go up
| mv-apply value to typeof(double) on
( extend correction = iff(value < prev(value), prev(value), 0.0) // if the value decreases we assume it was reset to 0, so add last value
| extend cum_correction = row_cumsum(correction)
| extend corrected_value = value + cum_correction
| summarize value = make_list(corrected_value))
| union (tbl | where not(fix_reset))
| extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
| extend dt = series_subtract(timestamp, timestampS)
| extend dt = series_divide(dt, 1e7) // converts from ticks to seconds
| extend dv = series_subtract(value, valueS)
| extend rate = series_divide(dv, dt)
| project-away dt, dv, timestampS, value, valueS
};
//
series_rate_fl(series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1"', lookback=2h, offset=now()-datetime(2020-12-08 00:00)), n_bins=10)
| render timechart with(series=labels)
Stored
series_rate_fl(series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1"', lookback=2h, offset=now()-datetime(2020-12-08 00:00)), n_bins=10)
| render timechart with(series=labels)
Output
5.51 - series_rolling_fl()
The function series_rolling_fl() is a user-defined function (UDF) that applies a rolling aggregation on a series. It takes a table containing multiple series (dynamic numerical arrays) and applies, for each series, a rolling aggregation function.
Syntax
T | invoke series_rolling_fl(
y_series,
y_rolling_series,
n,
aggr,
aggr_params,
center)
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | string | ✔️ | The name of the column that contains the series to fit. |
y_rolling_series | string | ✔️ | The name of the column to store the rolling aggregation series. |
n | int | ✔️ | The width of the rolling window. |
aggr | string | ✔️ | The name of the aggregation function to use. See aggregation functions. |
aggr_params | string | | Optional parameters for the aggregation function. |
center | bool | | Indicates whether the rolling window is applied symmetrically before and after the current point, or applied from the current point backwards. By default, center is false, for calculation on streaming data. |
Aggregation functions
This function supports any aggregation function from numpy or scipy.stats that calculates a scalar out of a series. The following list isn’t exhaustive:
sum
mean
min
max
ptp (max-min)
percentile
median
std
var
gmean (geometric mean)
hmean (harmonic mean)
mode (most common value)
moment (nth moment)
tmean (trimmed mean)
tmin
tmax
tstd
iqr (interquartile range)
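The following minimal sketch, assuming the stored function is already defined in the database and using a hypothetical short series, illustrates how aggr_params is passed and how center changes the window placement for a rolling max of width 3:
print y = dynamic([1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0])
| extend centered = dynamic(null), trailing = dynamic(null)
| invoke series_rolling_fl('y', 'centered', 3, 'max', dynamic([null]), true)
| invoke series_rolling_fl('y', 'trailing', 3, 'max', dynamic([null]), false)
// centered is expected to be [2, 3, 10, 10, 10, 3, 2] (the window spans both sides of each point),
// while trailing is expected to be [1, 2, 3, 10, 10, 10, 3] (the window looks backwards only)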
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
let code = ```if 1:
y_series = kargs["y_series"]
y_rolling_series = kargs["y_rolling_series"]
n = kargs["n"]
aggr = kargs["aggr"]
aggr_params = kargs["aggr_params"]
center = kargs["center"]
result = df
in_s = df[y_series]
func = getattr(np, aggr, None)
if not func:
import scipy.stats
func = getattr(scipy.stats, aggr)
if func:
result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Rolling window functions on a series")
series_rolling_fl(tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic, center:bool=true)
{
let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
let code = ```if 1:
y_series = kargs["y_series"]
y_rolling_series = kargs["y_rolling_series"]
n = kargs["n"]
aggr = kargs["aggr"]
aggr_params = kargs["aggr_params"]
center = kargs["center"]
result = df
in_s = df[y_series]
func = getattr(np, aggr, None)
if not func:
import scipy.stats
func = getattr(scipy.stats, aggr)
if func:
result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Examples
The following examples use the invoke operator to run the function.
Calculate rolling median of 9 elements
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
let code = ```if 1:
y_series = kargs["y_series"]
y_rolling_series = kargs["y_rolling_series"]
n = kargs["n"]
aggr = kargs["aggr"]
aggr_params = kargs["aggr_params"]
center = kargs["center"]
result = df
in_s = df[y_series]
func = getattr(np, aggr, None)
if not func:
import scipy.stats
func = getattr(scipy.stats, aggr)
if func:
result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Calculate rolling median of 9 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_med = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_med', 9, 'median')
| render timechart
Stored
//
// Calculate rolling median of 9 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_med = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_med', 9, 'median', dynamic([null]))
| render timechart
Output
Calculate rolling min, max & 75th percentile of 15 elements
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
let code = ```if 1:
y_series = kargs["y_series"]
y_rolling_series = kargs["y_rolling_series"]
n = kargs["n"]
aggr = kargs["aggr"]
aggr_params = kargs["aggr_params"]
center = kargs["center"]
result = df
in_s = df[y_series]
func = getattr(np, aggr, None)
if not func:
import scipy.stats
func = getattr(scipy.stats, aggr)
if func:
result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Calculate rolling min, max & 75th percentile of 15 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_min = dynamic(null), rolling_max = dynamic(null), rolling_pct = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_min', 15, 'min', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_max', 15, 'max', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_pct', 15, 'percentile', dynamic([75]))
| render timechart
Stored
//
// Calculate rolling min, max & 75th percentile of 15 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_min = dynamic(null), rolling_max = dynamic(null), rolling_pct = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_min', 15, 'min', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_max', 15, 'max', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_pct', 15, 'percentile', dynamic([75]))
| render timechart
Output
Calculate the rolling trimmed mean
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
let code = ```if 1:
y_series = kargs["y_series"]
y_rolling_series = kargs["y_rolling_series"]
n = kargs["n"]
aggr = kargs["aggr"]
aggr_params = kargs["aggr_params"]
center = kargs["center"]
result = df
in_s = df[y_series]
func = getattr(np, aggr, None)
if not func:
import scipy.stats
func = getattr(scipy.stats, aggr)
if func:
result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 100 step 1
| extend y=iff(x % 13 == 0, 2.0, iff(x % 23 == 0, -2.0, rand()))
| summarize x=make_list(x), y=make_list(y)
| extend yr = dynamic(null)
| invoke series_rolling_fl('y', 'yr', 7, 'tmean', pack_array(pack_array(-2, 2), pack_array(false, false))) // trimmed mean: ignoring values outside [-2,2] inclusive
| render linechart
Stored
range x from 1 to 100 step 1
| extend y=iff(x % 13 == 0, 2.0, iff(x % 23 == 0, -2.0, rand()))
| summarize x=make_list(x), y=make_list(y)
| extend yr = dynamic(null)
| invoke series_rolling_fl('y', 'yr', 7, 'tmean', pack_array(pack_array(-2, 2), pack_array(false, false))) // trimmed mean: ignoring values outside [-2,2] inclusive
| render linechart
Output
5.52 - series_shapes_fl()
The function series_shapes_fl() is a user-defined function (UDF) that detects a positive or negative trend or jump in a series. This function takes a table containing multiple time series (dynamic numerical arrays) and calculates trend and jump scores for each series. The output is a dictionary (dynamic) containing the scores.
Syntax
T | extend series_shapes_fl(
y_series,
advanced)
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array cell of numeric values. |
advanced | bool | | The default is false. Set to true to output additional calculated parameters. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_shapes_fl=(series:dynamic, advanced:bool=false)
{
let n = array_length(series);
// calculate normal dynamic range between 10th and 90th percentiles
let xs = array_sort_asc(series);
let low_idx = tolong(n*0.1);
let high_idx = tolong(n*0.9);
let low_pct = todouble(xs[low_idx]);
let high_pct = todouble(xs[high_idx]);
let norm_range = high_pct-low_pct;
// trend score
let lf = series_fit_line_dynamic(series);
let slope = todouble(lf.slope);
let rsquare = todouble(lf.rsquare);
let rel_slope = abs(n*slope/norm_range);
let sign_slope = iff(slope >= 0.0, 1.0, -1.0);
let norm_slope = sign_slope*rel_slope/(rel_slope+0.1); // map rel_slope from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
let trend_score = norm_slope*rsquare;
// jump score
let lf2=series_fit_2lines_dynamic(series);
let lslope = todouble(lf2.left.slope);
let rslope = todouble(lf2.right.slope);
let rsquare2 = todouble(lf2.rsquare);
let split_idx = tolong(lf2.split_idx);
let last_left = todouble(lf2.left.interception)+lslope*split_idx;
let first_right = todouble(lf2.right.interception)+rslope;
let jump = first_right-last_left;
let rel_jump = abs(jump/norm_range);
let sign_jump = iff(first_right >= last_left, 1.0, -1.0);
let norm_jump = sign_jump*rel_jump/(rel_jump+0.1); // map rel_jump from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
let jump_score1 = norm_jump*rsquare2;
// filter for jumps that are not close to the series edges and the right slope has the same direction
let norm_rslope = abs(rslope/norm_range);
let jump_score = iff((sign_jump*rslope >= 0.0 or norm_rslope < 0.02) and split_idx between((0.1*n)..(0.9*n)), jump_score1, 0.0);
let res = iff(advanced, bag_pack("n", n, "low_pct", low_pct, "high_pct", high_pct, "norm_range", norm_range, "slope", slope, "rsquare", rsquare, "rel_slope", rel_slope, "norm_slope", norm_slope,
"trend_score", trend_score, "split_idx", split_idx, "jump", jump, "rsquare2", rsquare2, "last_left", last_left, "first_right", first_right, "rel_jump", rel_jump,
"lslope", lslope, "rslope", rslope, "norm_rslope", norm_rslope, "norm_jump", norm_jump, "jump_score", jump_score)
, bag_pack("trend_score", trend_score, "jump_score", jump_score));
res
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Series detector for positive/negative trend or step. Returns a dynamic with trend and jump scores")
series_shapes_fl(series:dynamic, advanced:bool=false)
{
let n = array_length(series);
// calculate normal dynamic range between 10th and 90th percentiles
let xs = array_sort_asc(series);
let low_idx = tolong(n*0.1);
let high_idx = tolong(n*0.9);
let low_pct = todouble(xs[low_idx]);
let high_pct = todouble(xs[high_idx]);
let norm_range = high_pct-low_pct;
// trend score
let lf = series_fit_line_dynamic(series);
let slope = todouble(lf.slope);
let rsquare = todouble(lf.rsquare);
let rel_slope = abs(n*slope/norm_range);
let sign_slope = iff(slope >= 0.0, 1.0, -1.0);
let norm_slope = sign_slope*rel_slope/(rel_slope+0.1); // map rel_slope from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
let trend_score = norm_slope*rsquare;
// jump score
let lf2=series_fit_2lines_dynamic(series);
let lslope = todouble(lf2.left.slope);
let rslope = todouble(lf2.right.slope);
let rsquare2 = todouble(lf2.rsquare);
let split_idx = tolong(lf2.split_idx);
let last_left = todouble(lf2.left.interception)+lslope*split_idx;
let first_right = todouble(lf2.right.interception)+rslope;
let jump = first_right-last_left;
let rel_jump = abs(jump/norm_range);
let sign_jump = iff(first_right >= last_left, 1.0, -1.0);
let norm_jump = sign_jump*rel_jump/(rel_jump+0.1); // map rel_jump from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
let jump_score1 = norm_jump*rsquare2;
// filter for jumps that are not close to the series edges and the right slope has the same direction
let norm_rslope = abs(rslope/norm_range);
let jump_score = iff((sign_jump*rslope >= 0.0 or norm_rslope < 0.02) and split_idx between((0.1*n)..(0.9*n)), jump_score1, 0.0);
let res = iff(advanced, bag_pack("n", n, "low_pct", low_pct, "high_pct", high_pct, "norm_range", norm_range, "slope", slope, "rsquare", rsquare, "rel_slope", rel_slope, "norm_slope", norm_slope,
"trend_score", trend_score, "split_idx", split_idx, "jump", jump, "rsquare2", rsquare2, "last_left", last_left, "first_right", first_right, "rel_jump", rel_jump,
"lslope", lslope, "rslope", rslope, "norm_rslope", norm_rslope, "norm_jump", norm_jump, "jump_score", jump_score)
, bag_pack("trend_score", trend_score, "jump_score", jump_score));
res
}
Example
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_shapes_fl=(series:dynamic, advanced:bool=false)
{
let n = array_length(series);
// calculate normal dynamic range between 10th and 90th percentiles
let xs = array_sort_asc(series);
let low_idx = tolong(n*0.1);
let high_idx = tolong(n*0.9);
let low_pct = todouble(xs[low_idx]);
let high_pct = todouble(xs[high_idx]);
let norm_range = high_pct-low_pct;
// trend score
let lf = series_fit_line_dynamic(series);
let slope = todouble(lf.slope);
let rsquare = todouble(lf.rsquare);
let rel_slope = abs(n*slope/norm_range);
let sign_slope = iff(slope >= 0.0, 1.0, -1.0);
let norm_slope = sign_slope*rel_slope/(rel_slope+0.1); // map rel_slope from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
let trend_score = norm_slope*rsquare;
// jump score
let lf2=series_fit_2lines_dynamic(series);
let lslope = todouble(lf2.left.slope);
let rslope = todouble(lf2.right.slope);
let rsquare2 = todouble(lf2.rsquare);
let split_idx = tolong(lf2.split_idx);
let last_left = todouble(lf2.left.interception)+lslope*split_idx;
let first_right = todouble(lf2.right.interception)+rslope;
let jump = first_right-last_left;
let rel_jump = abs(jump/norm_range);
let sign_jump = iff(first_right >= last_left, 1.0, -1.0);
let norm_jump = sign_jump*rel_jump/(rel_jump+0.1); // map rel_jump from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
let jump_score1 = norm_jump*rsquare2;
// filter for jumps that are not close to the series edges and the right slope has the same direction
let norm_rslope = abs(rslope/norm_range);
let jump_score = iff((sign_jump*rslope >= 0.0 or norm_rslope < 0.02) and split_idx between((0.1*n)..(0.9*n)), jump_score1, 0.0);
let res = iff(advanced, bag_pack("n", n, "low_pct", low_pct, "high_pct", high_pct, "norm_range", norm_range, "slope", slope, "rsquare", rsquare, "rel_slope", rel_slope, "norm_slope", norm_slope,
"trend_score", trend_score, "split_idx", split_idx, "jump", jump, "rsquare2", rsquare2, "last_left", last_left, "first_right", first_right, "rel_jump", rel_jump,
"lslope", lslope, "rslope", rslope, "norm_rslope", norm_rslope, "norm_jump", norm_jump, "jump_score", jump_score)
, bag_pack("trend_score", trend_score, "jump_score", jump_score));
res
};
let ts_len = 100;
let noise_pct = 2;
let noise_gain = 3;
union
(print tsid=1 | extend y = array_concat(repeat(20, ts_len/2), repeat(150, ts_len/2))),
(print tsid=2 | extend y = array_concat(repeat(0, ts_len*3/4), repeat(-50, ts_len/4))),
(print tsid=3 | extend y = range(40, 139, 1)),
(print tsid=4 | extend y = range(-20, -109, -1))
| extend x = range(1, array_length(y), 1)
//
| extend shapes = series_shapes_fl(y)
| order by tsid asc
| fork (take 4) (project tsid, shapes)
| render timechart with(series=tsid, xcolumn=x, ycolumns=y)
Stored
let ts_len = 100;
let noise_pct = 2;
let noise_gain = 3;
union
(print tsid=1 | extend y = array_concat(repeat(20, ts_len/2), repeat(150, ts_len/2))),
(print tsid=2 | extend y = array_concat(repeat(0, ts_len*3/4), repeat(-50, ts_len/4))),
(print tsid=3 | extend y = range(40, 139, 1)),
(print tsid=4 | extend y = range(-20, -109, -1))
| extend x = range(1, array_length(y), 1)
//
| extend shapes = series_shapes_fl(y)
| order by tsid asc
| fork (take 4) (project tsid, shapes)
| render timechart with(series=tsid, xcolumn=x, ycolumns=y)
Output
The respective trend and jump scores:
tsid | shapes |
---|---|
1 | {"trend_score": 0.703199714530169, "jump_score": 0.90909090909090906} |
2 | {"trend_score": -0.51663751343174869, "jump_score": -0.90909090909090906} |
3 | {"trend_score": 0.92592592592592582, "jump_score": 0.0} |
4 | {"trend_score": -0.92592592592592582, "jump_score": 0.0} |
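Because the result is a dynamic property bag, individual scores can be extracted for filtering or sorting. Here's a minimal sketch, assuming the stored function is already defined in the database and using two hypothetical monotonic series:
union
(print tsid=1 | extend y = range(1, 100, 1)),    // steadily increasing series
(print tsid=2 | extend y = range(100, 1, -1))    // steadily decreasing series
| extend shapes = series_shapes_fl(y)
| extend trend_score = todouble(shapes.trend_score), jump_score = todouble(shapes.jump_score)
| project tsid, trend_score, jump_score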
5.53 - series_uv_anomalies_fl()
The function series_uv_anomalies_fl() is a user-defined function (UDF) that detects anomalies in time series by calling the Univariate Anomaly Detection API, part of Azure Cognitive Services. The function accepts a limited set of time series as numerical dynamic arrays and the required anomaly detection sensitivity level. It converts each time series into the required JSON format and posts it to the Anomaly Detector service endpoint. The service response contains dynamic arrays of high/low/all anomalies, the modeled baseline time series, its normal high/low boundaries (a value above or below the high/low boundary is an anomaly), and the detected seasonality.
Prerequisites
- An Azure subscription. Create a free Azure account.
- A cluster and database (see Create a cluster and database) or a KQL database with editing permissions and data.
- The Python plugin must be enabled on the cluster. This is required for the inline Python used in the function.
- Enable the http_request plugin / http_request_post plugin on the cluster to access the anomaly detection service endpoint.
- Modify the callout policy for type webapi to access the anomaly detection service endpoint.
In the following function example, replace YOUR-AD-RESOURCE-NAME in the uri and YOUR-KEY in the Ocp-Apim-Subscription-Key of the header with your Anomaly Detector resource name and key.
Syntax
T | invoke series_uv_anomalies_fl(
y_series [,
sensitivity [,
tsid]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | string | ✔️ | The name of the input table column containing the values of the series to be anomaly detected. |
sensitivity | integer | | An integer in the range [0-100] specifying the anomaly detection sensitivity. 0 is the least sensitive detection, while 100 is the most sensitive, meaning that even a small deviation from the expected baseline is tagged as an anomaly. Default value: 85. |
tsid | string | | The name of the input table column containing the time series ID. Can be omitted when analyzing a single time series. |
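For reference, the request body that the inline Python code below builds for each series has the following general shape; the values here are hypothetical, and optional fields such as period or maxAnomalyRatio can also be added, as noted in the code comments:
print request_body = dynamic({
    "series": [{"value": 120.5}, {"value": 118.2}, {"value": 250.0}],
    "sensitivity": 85
})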
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
let code = ```if 1:
import json
y_series = kargs["y_series"]
sensitivity = kargs["sensitivity"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "sensitivity":sensitivity} # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
, neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
| extend _tsid=toscalar(_tsid)
)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Time Series Anomaly Detection by Azure Cognitive Service")
series_uv_anomalies_fl(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
let code = ```if 1:
import json
y_series = kargs["y_series"]
sensitivity = kargs["sensitivity"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "sensitivity":sensitivity} # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
, neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
| extend _tsid=toscalar(_tsid)
)
}
Examples
The following examples use the invoke operator to run the function.
Use series_uv_anomalies_fl() to detect anomalies
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
let code = ```if 1:
import json
y_series = kargs["y_series"]
sensitivity = kargs["sensitivity"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "sensitivity":sensitivity} # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
, neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
| extend _tsid=toscalar(_tsid)
)
};
let etime=datetime(2017-03-02);
let stime=datetime(2017-01-01);
let dt=1h;
let ts = requests
| make-series value=avg(value) on timestamp from stime to etime step dt
| extend _tsid='TS1';
ts
| invoke series_uv_anomalies_fl('value')
| lookup ts on _tsid
| render anomalychart with(xcolumn=timestamp, ycolumns=value, anomalycolumns=ad_ama)
Stored
let etime=datetime(2017-03-02);
let stime=datetime(2017-01-01);
let dt=1h;
let ts = requests
| make-series value=avg(value) on timestamp from stime to etime step dt
| extend _tsid='TS1';
ts
| invoke series_uv_anomalies_fl('value')
| lookup ts on _tsid
| render anomalychart with(xcolumn=timestamp, ycolumns=value, anomalycolumns=ad_ama)
Output
Compare series_uv_anomalies_fl() and native series_decompose_anomalies()
The following example compares the Univariate Anomaly Detection API to the native series_decompose_anomalies() function over three time series and assumes the series_uv_anomalies_fl() function is already defined in the database:
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
let code = ```if 1:
import json
y_series = kargs["y_series"]
sensitivity = kargs["sensitivity"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "sensitivity":sensitivity} # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
, neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
| extend _tsid=toscalar(_tsid)
)
};
let ts = demo_make_series2
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by sid;
ts
| invoke series_uv_anomalies_fl('num', 90, 'sid')
| join ts on $left._tsid == $right.sid
| project-away _tsid
| extend (ad_adx, score_adx, baseline_adx)=series_decompose_anomalies(num, 1.5, -1, 'linefit')
| project-reorder num, *
| render anomalychart with(series=sid, xcolumn=TimeStamp, ycolumns=num, baseline_adx, baseline_ama, lower_ama, upper_ama, anomalycolumns=ad_adx, ad_ama)
Stored
let ts = demo_make_series2
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by sid;
ts
| invoke series_uv_anomalies_fl('num', 90, 'sid')
| join ts on $left._tsid == $right.sid
| project-away _tsid
| extend (ad_adx, score_adx, baseline_adx)=series_decompose_anomalies(num, 1.5, -1, 'linefit')
| project-reorder num, *
| render anomalychart with(series=sid, xcolumn=TimeStamp, ycolumns=num, baseline_adx, baseline_ama, lower_ama, upper_ama, anomalycolumns=ad_adx, ad_ama)
Output
The following graph shows anomalies detected by the Univariate Anomaly Detection API on TS1. You can also select TS2 or TS3 in the chart filter box.
The following graph shows the anomalies detected by the native function on TS1.
5.54 - series_uv_change_points_fl()
The function series_uv_change_points_fl() is a user-defined function (UDF) that finds change points in time series by calling the Univariate Anomaly Detection API, part of Azure Cognitive Services. The function accepts a limited set of time series as numerical dynamic arrays, the change point detection threshold, and the minimum size of the stable trend window. It converts each time series into the required JSON format and posts it to the Anomaly Detector service endpoint. The service response contains dynamic arrays of change points, their respective confidence scores, and the detected seasonality.
Prerequisites
- An Azure subscription. Create a free Azure account.
- A cluster and database (see Create a cluster and database) or a KQL database with editing permissions and data.
- The Python plugin must be enabled on the cluster. This is required for the inline Python used in the function.
- Enable the http_request plugin / http_request_post plugin on the cluster to access the anomaly detection service endpoint.
- Modify the callout policy for type webapi to access the anomaly detection service endpoint.
Syntax
T | invoke series_uv_change_points_fl(
y_series [,
score_threshold [,
trend_window [,
tsid]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | string | ✔️ | The name of the input table column containing the values of the series to be anomaly detected. |
score_threshold | real | | A value specifying the minimum confidence to declare a change point. Each point whose confidence is above the threshold is defined as a change point. Default value: 0.9 |
trend_window | integer | | A value specifying the minimal window size for robust calculation of trend changes. Default value: 5 |
tsid | string | | The name of the input table column containing the time series ID. Can be omitted when analyzing a single time series. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required. In the following function definition, replace YOUR-AD-RESOURCE-NAME in the uri and YOUR-KEY in the Ocp-Apim-Subscription-Key of the header with your Anomaly Detector resource name and key.
let series_uv_change_points_fl=(tbl:(*), y_series:string, score_threshold:real=0.9, trend_window:int=5, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/changepoint/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'score_threshold', score_threshold, 'trend_window', trend_window);
let code = ```if 1:
import json
y_series = kargs["y_series"]
score_threshold = kargs["score_threshold"]
trend_window = kargs["trend_window"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "threshold":score_threshold, "stableTrendWindow": trend_window} # auto-detect period, or we can force 'period': 84
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, change_point=series_add(0, ResponseBody.isChangePoint), confidence=ResponseBody.confidenceScores
| extend _tsid=toscalar(_tsid)
)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required. In the following function definition, replace YOUR-AD-RESOURCE-NAME in the uri and YOUR-KEY in the Ocp-Apim-Subscription-Key of the header with your Anomaly Detector resource name and key.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Time Series Change Points Detection by Azure Cognitive Service")
series_uv_change_points_fl(tbl:(*), y_series:string, score_threshold:real=0.9, trend_window:int=5, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/changepoint/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'score_threshold', score_threshold, 'trend_window', trend_window);
let code = ```if 1:
import json
y_series = kargs["y_series"]
score_threshold = kargs["score_threshold"]
trend_window = kargs["trend_window"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "threshold":score_threshold, "stableTrendWindow": trend_window} # auto-detect period, or we can force 'period': 84
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, change_point=series_add(0, ResponseBody.isChangePoint), confidence=ResponseBody.confidenceScores
| extend _tsid=toscalar(_tsid)
)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let series_uv_change_points_fl=(tbl:(*), y_series:string, score_threshold:real=0.9, trend_window:int=5, tsid:string='_tsid')
{
let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/changepoint/detect';
let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
let kwargs = bag_pack('y_series', y_series, 'score_threshold', score_threshold, 'trend_window', trend_window);
let code = ```if 1:
import json
y_series = kargs["y_series"]
score_threshold = kargs["score_threshold"]
trend_window = kargs["trend_window"]
json_str = []
for i in range(len(df)):
row = df.iloc[i, :]
ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
json_data = {'series': ts, "threshold":score_threshold, "stableTrendWindow": trend_window} # auto-detect period, or we can force 'period': 84
json_str = json_str + [json.dumps(json_data)]
result = df
result['json_str'] = json_str
```;
tbl
| evaluate python(typeof(*, json_str:string), code, kwargs)
| extend _tsid = column_ifexists(tsid, 1)
| partition by _tsid (
project json_str
| evaluate http_request_post(uri, headers, dynamic(null))
| project period=ResponseBody.period, change_point=series_add(0, ResponseBody.isChangePoint), confidence=ResponseBody.confidenceScores
| extend _tsid=toscalar(_tsid)
)
};
let ts = range x from 1 to 300 step 1
| extend y=iff(x between (100 .. 110) or x between (200 .. 220), 20, 5)
| extend ts=datetime(2021-01-01)+x*1d
| extend y=y+4*rand()
| summarize ts=make_list(ts), y=make_list(y)
| extend sid=1;
ts
| invoke series_uv_change_points_fl('y', 0.8, 10, 'sid')
| join ts on $left._tsid == $right.sid
| project-away _tsid
| project-reorder y, * // just to visualize the anomalies on top of y series
| render anomalychart with(xcolumn=ts, ycolumns=y, confidence, anomalycolumns=change_point)
Stored
let ts = range x from 1 to 300 step 1
| extend y=iff(x between (100 .. 110) or x between (200 .. 220), 20, 5)
| extend ts=datetime(2021-01-01)+x*1d
| extend y=y+4*rand()
| summarize ts=make_list(ts), y=make_list(y)
| extend sid=1;
ts
| invoke series_uv_change_points_fl('y', 0.8, 10, 'sid')
| join ts on $left._tsid == $right.sid
| project-away _tsid
| project-reorder y, * // just to visualize the anomalies on top of y series
| render anomalychart with(xcolumn=ts, ycolumns=y, confidence, anomalycolumns=change_point)
Output
The following graph shows change points on a time series.
5.55 - time_weighted_avg_fl()
The function time_weighted_avg_fl() is a user-defined function (UDF) that calculates the time weighted average of a metric in a given time window, over input time bins. This function is similar to the summarize operator. The function aggregates the metric by time bins, but instead of calculating a simple avg() of the metric value in each bin, it weights each value by its duration. The duration is defined from the timestamp of the current value to the timestamp of the next value.
There are two options to calculate the time weighted average. This function fills forward the value from the current sample until the next one. Alternatively, time_weighted_avg2_fl() linearly interpolates the metric value between consecutive samples.
Syntax
T | invoke time_weighted_avg_fl(
t_col,
y_col,
key_col,
stime,
etime,
dt)
Parameters
Name | Type | Required | Description |
---|---|---|---|
t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
y_col | string | ✔️ | The name of the column containing the metric value of the records. |
key_col | string | ✔️ | The name of the column containing the partition key of the records. |
stime | datetime | ✔️ | The start time of the aggregation window. |
etime | datetime | ✔️ | The end time of the aggregation window. |
dt | timespan | ✔️ | The aggregation time bin. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let time_weighted_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let _etime = etime + dt;
let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union tbl_ex
| where _ts between (stime.._etime)
| partition hint.strategy=native by _key (
order by _ts asc, _val nulls last
| scan declare(f_value:real=0.0) with (step s: true => f_value = iff(isnull(_val), s.f_value, _val);) // fill forward null values
| extend diff_t=(next(_ts)-_ts)/1m
)
| where isnotnull(diff_t)
| summarize tw_sum=sum(f_value*diff_t), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
| where t_sum > 0 and _ts <= etime
| extend tw_avg = tw_sum/t_sum
| project-away tw_sum, t_sum
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Time weighted average of a metric using fill forward interpolation")
time_weighted_avg_fl(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let _etime = etime + dt;
let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union tbl_ex
| where _ts between (stime.._etime)
| partition hint.strategy=native by _key (
order by _ts asc, _val nulls last
| scan declare(f_value:real=0.0) with (step s: true => f_value = iff(isnull(_val), s.f_value, _val);) // fill forward null values
| extend diff_t=(next(_ts)-_ts)/1m
)
| where isnotnull(diff_t)
| summarize tw_sum=sum(f_value*diff_t), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
| where t_sum > 0 and _ts <= etime
| extend tw_avg = tw_sum/t_sum
| project-away tw_sum, t_sum
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let time_weighted_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let _etime = etime + dt;
let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union tbl_ex
| where _ts between (stime.._etime)
| partition hint.strategy=native by _key (
order by _ts asc, _val nulls last
| scan declare(f_value:real=0.0) with (step s: true => f_value = iff(isnull(_val), s.f_value, _val);) // fill forward null values
| extend diff_t=(next(_ts)-_ts)/1m
)
| where isnotnull(diff_t)
| summarize tw_sum=sum(f_value*diff_t), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
| where t_sum > 0 and _ts <= etime
| extend tw_avg = tw_sum/t_sum
| project-away tw_sum, t_sum
};
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(2021-04-26 00:00), 100, 'Device1',
datetime(2021-04-26 00:45), 300, 'Device1',
datetime(2021-04-26 01:15), 200, 'Device1',
datetime(2021-04-26 00:00), 600, 'Device2',
datetime(2021-04-26 00:30), 400, 'Device2',
datetime(2021-04-26 01:30), 500, 'Device2',
datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc
Stored
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(2021-04-26 00:00), 100, 'Device1',
datetime(2021-04-26 00:45), 300, 'Device1',
datetime(2021-04-26 01:15), 200, 'Device1',
datetime(2021-04-26 00:00), 600, 'Device2',
datetime(2021-04-26 00:30), 400, 'Device2',
datetime(2021-04-26 01:30), 500, 'Device2',
datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc
Output
_ts | _key | val |
---|---|---|
2021-04-26 00:00:00.0000000 | Device1 | 150 |
2021-04-26 01:00:00.0000000 | Device1 | 225 |
2021-04-26 00:00:00.0000000 | Device2 | 500 |
2021-04-26 01:00:00.0000000 | Device2 | 400 |
The first value of Device1 is (45m*100 + 15m*300)/60m = 150, the second value is (15m*300 + 45m*200)/60m = 225.
The first value of Device2 is (30m*600 + 30m*400)/60m = 500, the second value is (30m*400 + 15m*500 + 15m*300)/60m = 400.
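You can verify this arithmetic directly with a quick check that isn't part of the function:
print device1_bin1 = (45.0*100 + 15.0*300)/60, device1_bin2 = (15.0*300 + 45.0*200)/60,
      device2_bin1 = (30.0*600 + 30.0*400)/60, device2_bin2 = (30.0*400 + 15.0*500 + 15.0*300)/60
// returns 150, 225, 500, and 400 respectively, matching the table above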
5.56 - time_weighted_avg2_fl()
The function time_weighted_avg2_fl() is a user-defined function (UDF) that calculates the time weighted average of a metric in a given time window, over input time bins. This function is similar to the summarize operator. The function aggregates the metric by time bins, but instead of calculating a simple avg() of the metric value in each bin, it weights each value by its duration. The duration is defined from the timestamp of the current value to the timestamp of the next value.
There are two options to calculate the time weighted average. This function linearly interpolates the metric value between consecutive samples. Alternatively, time_weighted_avg_fl() fills forward the value from the current sample until the next one.
Syntax
T | invoke time_weighted_avg2_fl(
t_col,
y_col,
key_col,
stime,
etime,
dt)
Parameters
Name | Type | Required | Description |
---|---|---|---|
t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
y_col | string | ✔️ | The name of the column containing the metric value of the records. |
key_col | string | ✔️ | The name of the column containing the partition key of the records. |
stime | datetime | ✔️ | The start time of the aggregation window. |
etime | datetime | ✔️ | The end time of the aggregation window. |
dt | timespan | ✔️ | The aggregation time bin. |
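Inside the function, the interpolated value at each grid boundary is the time-weighted average of the surrounding samples, (val0*dt1 + val1*dt0)/(dt0+dt1), where dt0 is the time since the previous sample and dt1 is the time until the next one. As a quick check using the Device1 data from the example below, the samples around the 01:00 boundary are 300 at 00:45 (15 minutes before) and 200 at 01:15 (15 minutes after):
print interpolated_at_0100 = (300.0*15 + 200.0*15)/(15 + 15)
// returns 250, the linearly interpolated value used at the 01:00 grid point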
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let time_weighted_avg2_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let _etime = etime + dt;
let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union tbl_ex
| where _ts between (stime.._etime)
| partition hint.strategy=native by _key (
order by _ts desc, _val nulls last
| scan declare(val1:real=0.0, t1:datetime) with ( // fill backward null values
step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
| extend dt1=(t1-_ts)/1m
| order by _ts asc, _val nulls last
| scan declare(val0:real=0.0, t0:datetime) with ( // fill forward null values
step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
| extend dt0=(_ts-t0)/1m
| extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
| scan with ( // fill forward null twa values
step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
| extend diff_t=(next(_ts)-_ts)/1m
)
| where isnotnull(diff_t)
| order by _key asc, _ts asc
| extend next_twa_val=iff(_key == next(_key), next(_twa_val), _twa_val)
| summarize tw_sum=sum((_twa_val+next_twa_val)*diff_t/2.0), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
| where t_sum > 0 and _ts <= etime
| extend tw_avg = tw_sum/t_sum
| project-away tw_sum, t_sum
| order by _key asc, _ts asc
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Time weighted average of a metric using linear interpolation")
time_weighted_avg2_fl(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let _etime = etime + dt;
let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union tbl_ex
| where _ts between (stime.._etime)
| partition hint.strategy=native by _key (
order by _ts desc, _val nulls last
| scan declare(val1:real=0.0, t1:datetime) with ( // fill backward null values
step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
| extend dt1=(t1-_ts)/1m
| order by _ts asc, _val nulls last
| scan declare(val0:real=0.0, t0:datetime) with ( // fill forward null values
step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
| extend dt0=(_ts-t0)/1m
| extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
| scan with ( // fill forward null twa values
step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
| extend diff_t=(next(_ts)-_ts)/1m
)
| where isnotnull(diff_t)
| order by _key asc, _ts asc
| extend next_twa_val=iff(_key == next(_key), next(_twa_val), _twa_val)
| summarize tw_sum=sum((_twa_val+next_twa_val)*diff_t/2.0), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
| where t_sum > 0 and _ts <= etime
| extend tw_avg = tw_sum/t_sum
| project-away tw_sum, t_sum
| order by _key asc, _ts asc
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let time_weighted_avg2_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let _etime = etime + dt;
let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union tbl_ex
| where _ts between (stime.._etime)
| partition hint.strategy=native by _key (
order by _ts desc, _val nulls last
| scan declare(val1:real=0.0, t1:datetime) with ( // fill backward null values
step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
| extend dt1=(t1-_ts)/1m
| order by _ts asc, _val nulls last
| scan declare(val0:real=0.0, t0:datetime) with ( // fill forward null values
step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
| extend dt0=(_ts-t0)/1m
| extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
| scan with ( // fill forward null twa values
step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
| extend diff_t=(next(_ts)-_ts)/1m
)
| where isnotnull(diff_t)
| order by _key asc, _ts asc
| extend next_twa_val=iff(_key == next(_key), next(_twa_val), _twa_val)
| summarize tw_sum=sum((_twa_val+next_twa_val)*diff_t/2.0), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
| where t_sum > 0 and _ts <= etime
| extend tw_avg = tw_sum/t_sum
| project-away tw_sum, t_sum
| order by _key asc, _ts asc
};
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(2021-04-26 00:00), 100, 'Device1',
datetime(2021-04-26 00:45), 300, 'Device1',
datetime(2021-04-26 01:15), 200, 'Device1',
datetime(2021-04-26 00:00), 600, 'Device2',
datetime(2021-04-26 00:30), 400, 'Device2',
datetime(2021-04-26 01:30), 500, 'Device2',
datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg2_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc
Stored
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(2021-04-26 00:00), 100, 'Device1',
datetime(2021-04-26 00:45), 300, 'Device1',
datetime(2021-04-26 01:15), 200, 'Device1',
datetime(2021-04-26 00:00), 600, 'Device2',
datetime(2021-04-26 00:30), 400, 'Device2',
datetime(2021-04-26 01:30), 500, 'Device2',
datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg2_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc
Output
_ts | _key | val |
---|---|---|
2021-04-26 00:00:00.0000000 | Device1 | 218.75 |
2021-04-26 01:00:00.0000000 | Device1 | 206.25 |
2021-04-26 00:00:00.0000000 | Device2 | 462.5 |
2021-04-26 01:00:00.0000000 | Device2 | 412.5 |
The first value of Device1 is (45m*(100+300)/2 + 15m*(300+250)/2)/60m = 218.75; the second value is (15m*(250+200)/2 + 45m*200)/60m = 206.25.
The first value of Device2 is (30m*(600+400)/2 + 30m*(400+450)/2)/60m = 462.5; the second value is (30m*(450+500)/2 + 15m*(500+300)/2 + 15m*300)/60m = 412.5.
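These trapezoidal sums can be verified directly. The following print statement isn't part of the function; it only reproduces the four output values from the raw samples and the interpolated grid points (250 and 450 at 01:00):
// Returns 218.75, 206.25, 462.5 and 412.5, matching the output table above.
print device1_bin_00 = (45.0*(100+300)/2 + 15.0*(300+250)/2) / 60,
      device1_bin_01 = (15.0*(250+200)/2 + 45.0*200) / 60,
      device2_bin_00 = (30.0*(600+400)/2 + 30.0*(400+450)/2) / 60,
      device2_bin_01 = (30.0*(450+500)/2 + 15.0*(500+300)/2 + 15.0*300) / 60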
5.57 - time_weighted_val_fl()
The function time_weighted_val_fl() is a user-defined function (UDF) that linearly interpolates the metric value, using the time-weighted average of the values of its previous point and its next point.
Syntax
T | invoke time_weighted_val_fl(
t_col,
y_col,
key_col,
stime,
etime,
dt)
Parameters
Name | Type | Required | Description |
---|---|---|---|
t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
y_col | string | ✔️ | The name of the column containing the metric value of the records. |
key_col | string | ✔️ | The name of the column containing the partition key of the records. |
stime | datetime | ✔️ | The start time of the aggregation window. |
etime | datetime | ✔️ | The end time of the aggregation window. |
dt | timespan | ✔️ | The aggregation time bin. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let time_weighted_val_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let gridTimes = range _ts from stime to etime step dt | extend _val=real(null), grid=1, dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union (tbl_ex | extend grid=0)
| where _ts between (stime..etime)
| partition hint.strategy=native by _key (
order by _ts desc, _val nulls last
| scan declare(val1:real=0.0, t1:datetime) with ( // fill backward null values
step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
| extend dt1=(t1-_ts)/1m
| order by _ts asc, _val nulls last
| scan declare(val0:real=0.0, t0:datetime) with ( // fill forward null values
step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
| extend dt0=(_ts-t0)/1m
| extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
| scan with ( // fill forward null twa values
step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
| where grid == 0 or (grid == 1 and _ts != prev(_ts))
)
| project _ts, _key, _twa_val, orig_val=iff(grid == 1, 0, 1)
| order by _key asc, _ts asc
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Linear interpolation of metric value by time weighted average")
time_weighted_val_fl(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let gridTimes = range _ts from stime to etime step dt | extend _val=real(null), grid=1, dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union (tbl_ex | extend grid=0)
| where _ts between (stime..etime)
| partition hint.strategy=native by _key (
order by _ts desc, _val nulls last
| scan declare(val1:real=0.0, t1:datetime) with ( // fill backward null values
step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
| extend dt1=(t1-_ts)/1m
| order by _ts asc, _val nulls last
| scan declare(val0:real=0.0, t0:datetime) with ( // fill forward null values
step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
| extend dt0=(_ts-t0)/1m
| extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
| scan with ( // fill forward null twa values
step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
| where grid == 0 or (grid == 1 and _ts != prev(_ts))
)
| project _ts, _key, _twa_val, orig_val=iff(grid == 1, 0, 1)
| order by _key asc, _ts asc
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let time_weighted_val_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
let gridTimes = range _ts from stime to etime step dt | extend _val=real(null), grid=1, dummy=1;
let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
gridTimes
| join kind=fullouter keys on dummy
| project-away dummy, dummy1
| union (tbl_ex | extend grid=0)
| where _ts between (stime..etime)
| partition hint.strategy=native by _key (
order by _ts desc, _val nulls last
| scan declare(val1:real=0.0, t1:datetime) with ( // fill backward null values
step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
| extend dt1=(t1-_ts)/1m
| order by _ts asc, _val nulls last
| scan declare(val0:real=0.0, t0:datetime) with ( // fill forward null values
step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
| extend dt0=(_ts-t0)/1m
| extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
| scan with ( // fill forward null twa values
step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
| where grid == 0 or (grid == 1 and _ts != prev(_ts))
)
| project _ts, _key, _twa_val, orig_val=iff(grid == 1, 0, 1)
| order by _key asc, _ts asc
};
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(2021-04-26 00:00), 100, 'Device1',
datetime(2021-04-26 00:45), 300, 'Device1',
datetime(2021-04-26 01:15), 200, 'Device1',
datetime(2021-04-26 00:00), 600, 'Device2',
datetime(2021-04-26 00:30), 400, 'Device2',
datetime(2021-04-26 01:30), 500, 'Device2',
datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_val_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = _twa_val
| order by _key asc, _ts asc
Stored
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(2021-04-26 00:00), 100, 'Device1',
datetime(2021-04-26 00:45), 300, 'Device1',
datetime(2021-04-26 01:15), 200, 'Device1',
datetime(2021-04-26 00:00), 600, 'Device2',
datetime(2021-04-26 00:30), 400, 'Device2',
datetime(2021-04-26 01:30), 500, 'Device2',
datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_val_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = _twa_val
| order by _key asc, _ts asc
Output
_ts | _key | val | orig_val |
---|---|---|---|
2021-04-26 00:00:00.0000000 | Device1 | 100 | 1 |
2021-04-26 00:45:00.0000000 | Device1 | 300 | 1 |
2021-04-26 01:00:00.0000000 | Device1 | 250 | 0 |
2021-04-26 01:15:00.0000000 | Device1 | 200 | 1 |
2021-04-26 00:00:00.0000000 | Device2 | 600 | 1 |
2021-04-26 00:30:00.0000000 | Device2 | 400 | 1 |
2021-04-26 01:00:00.0000000 | Device2 | 450 | 0 |
2021-04-26 01:30:00.0000000 | Device2 | 500 | 1 |
2021-04-26 01:45:00.0000000 | Device2 | 300 | 1 |
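The rows with orig_val == 0 are the interpolated grid points. As a quick check of the interpolation formula used in the function, _twa_val = (val0*dt1 + val1*dt0)/(dt0+dt1), the following print statement (added here only for illustration) reproduces the two interpolated values at 01:00:
// Device1 at 01:00 lies between 00:45 (300) and 01:15 (200); Device2 lies between 00:30 (400) and 01:30 (500).
print device1_at_0100 = (300.0*15 + 200.0*15) / (15 + 15),  // 250
      device2_at_0100 = (400.0*30 + 500.0*30) / (30 + 30)   // 450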
5.58 - time_window_rolling_avg_fl()
The function time_window_rolling_avg_fl() is a user-defined function (UDF) that calculates the rolling average of the required value over a constant duration time window.
Calculating the rolling average over a constant time window for regular time series (that is, series with constant intervals) can be achieved using series_fir(), because the constant time window can be converted to a fixed-width filter of equal coefficients. However, calculating it for irregular time series is more complex, because the actual number of samples in the window varies. Still, it can be achieved using the powerful scan operator.
This type of rolling window calculation is required for use cases where metric values are emitted only on change (and not at constant intervals). For example, in IoT, edge devices send metrics to the cloud only upon change, to optimize communication bandwidth.
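For comparison, the following is a minimal sketch (not part of the function library) of the regular-series case mentioned above. It builds a synthetic series with a fixed 1-minute interval, so a 10-minute backward rolling average reduces to series_fir() with ten equal, normalized coefficients:
// Synthetic 1-minute-regular series; series_fir with 10 equal weights = 10-minute rolling average.
range timestamp from datetime(2024-01-01 08:00) to datetime(2024-01-01 09:00) step 1m
| extend val = rand()
| make-series val = avg(val) on timestamp step 1m
| extend rolling_avg_10m = series_fir(val, repeat(1, 10), true, false)  // equal weights, normalized, backward (not centered)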
Syntax
T | invoke time_window_rolling_avg_fl(
t_col,
y_col,
key_col,
dt [,
direction ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
y_col | string | ✔️ | The name of the column containing the metric value of the records. |
key_col | string | ✔️ | The name of the column containing the partition key of the records. |
dt | timespan | ✔️ | The duration of the rolling window. |
direction | int | | The aggregation direction. The possible values are +1 or -1. The rolling window is set from the current time forward or backward, respectively. The default is -1, because a backward rolling window is the only possible method for streaming scenarios. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let time_window_rolling_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, dt:timespan, direction:int=int(-1))
{
let tbl_ex = tbl | extend timestamp = column_ifexists(t_col, datetime(null)), value = column_ifexists(y_col, 0.0), key = column_ifexists(key_col, '');
tbl_ex
| partition hint.strategy=shuffle by key
(
extend timestamp=pack_array(timestamp, timestamp - direction*dt), delta = pack_array(-direction, direction)
| mv-expand timestamp to typeof(datetime), delta to typeof(long)
| sort by timestamp asc, delta desc
| scan declare (cum_sum:double=0.0, cum_count:long=0) with
(
step s: true => cum_count = s.cum_count + delta,
cum_sum = s.cum_sum + delta * value;
)
| extend avg_value = iff(direction == 1, prev(cum_sum)/prev(cum_count), cum_sum/cum_count)
| where delta == -direction
| project timestamp, value, avg_value, key
)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Series", docstring = "Time based rolling average of a metric")
time_window_rolling_avg_fl(tbl:(*), t_col:string, y_col:string, key_col:string, dt:timespan, direction:int=int(-1))
{
let tbl_ex = tbl | extend timestamp = column_ifexists(t_col, datetime(null)), value = column_ifexists(y_col, 0.0), key = column_ifexists(key_col, '');
tbl_ex
| partition hint.strategy=shuffle by key
(
extend timestamp=pack_array(timestamp, timestamp - direction*dt), delta = pack_array(-direction, direction)
| mv-expand timestamp to typeof(datetime), delta to typeof(long)
| sort by timestamp asc, delta desc
| scan declare (cum_sum:double=0.0, cum_count:long=0) with
(
step s: true => cum_count = s.cum_count + delta,
cum_sum = s.cum_sum + delta * value;
)
| extend avg_value = iff(direction == 1, prev(cum_sum)/prev(cum_count), cum_sum/cum_count)
| where delta == -direction
| project timestamp, value, avg_value, key
)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let time_window_rolling_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, dt:timespan, direction:int=int(-1))
{
let tbl_ex = tbl | extend timestamp = column_ifexists(t_col, datetime(null)), value = column_ifexists(y_col, 0.0), key = column_ifexists(key_col, '');
tbl_ex
| partition hint.strategy=shuffle by key
(
extend timestamp=pack_array(timestamp, timestamp - direction*dt), delta = pack_array(-direction, direction)
| mv-expand timestamp to typeof(datetime), delta to typeof(long)
| sort by timestamp asc, delta desc
| scan declare (cum_sum:double=0.0, cum_count:long=0) with
(
step s: true => cum_count = s.cum_count + delta,
cum_sum = s.cum_sum + delta * value;
)
| extend avg_value = iff(direction == 1, prev(cum_sum)/prev(cum_count), cum_sum/cum_count)
| where delta == -direction
| project timestamp, value, avg_value, key
)
};
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(8:00), 1, 'Device1',
datetime(8:01), 2, 'Device1',
datetime(8:05), 3, 'Device1',
datetime(8:05), 10, 'Device2',
datetime(8:09), 20, 'Device2',
datetime(8:40), 4, 'Device1',
datetime(9:00), 5, 'Device1',
datetime(9:01), 6, 'Device1',
datetime(9:05), 30, 'Device2',
datetime(9:50), 7, 'Device1'
];
tbl
| invoke time_window_rolling_avg_fl('ts', 'val', 'key', 10m)
Stored
let tbl = datatable(ts:datetime, val:real, key:string) [
datetime(8:00), 1, 'Device1',
datetime(8:01), 2, 'Device1',
datetime(8:05), 3, 'Device1',
datetime(8:05), 10, 'Device2',
datetime(8:09), 20, 'Device2',
datetime(8:40), 4, 'Device1',
datetime(9:00), 5, 'Device1',
datetime(9:01), 6, 'Device1',
datetime(9:05), 30, 'Device2',
datetime(9:50), 7, 'Device1'
];
tbl
| invoke time_window_rolling_avg_fl('ts', 'val', 'key', 10m)
Output
timestamp | value | avg_value | key |
---|---|---|---|
2021-11-29 08:05:00.0000000 | 10 | 10 | Device2 |
2021-11-29 08:09:00.0000000 | 20 | 15 | Device2 |
2021-11-29 09:05:00.0000000 | 30 | 30 | Device2 |
2021-11-29 08:00:00.0000000 | 1 | 1 | Device1 |
2021-11-29 08:01:00.0000000 | 2 | 1.5 | Device1 |
2021-11-29 08:05:00.0000000 | 3 | 2 | Device1 |
2021-11-29 08:40:00.0000000 | 4 | 4 | Device1 |
2021-11-29 09:00:00.0000000 | 5 | 5 | Device1 |
2021-11-29 09:01:00.0000000 | 6 | 5.5 | Device1 |
2021-11-29 09:50:00.0000000 | 7 | 7 | Device1 |
The first value (10) at 8:05 is based on a single sample that fell in the 10-minute backward window; the second value (15) is the average of the two samples at 8:05 and 8:09, and so on.
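These averages can be checked by hand. The following print statement (added for illustration only) reproduces two of the values above:
print device2_avg_at_0809 = (10.0 + 20.0) / 2,  // Device2 samples at 8:05 and 8:09 fall in the 10-minute window ending at 8:09
      device1_avg_at_0801 = (1.0 + 2.0) / 2     // Device1 samples at 8:00 and 8:01 fall in the 10-minute window ending at 8:01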
5.59 - two_sample_t_test_fl()
The function two_sample_t_test_fl() is a user-defined function (UDF) that performs the Two-Sample T-Test.
Syntax
T | invoke two_sample_t_test_fl(
data1,
data2,
test_statistic,
p_value,
equal_var)
Parameters
Name | Type | Required | Description |
---|---|---|---|
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
equal_var | bool | | If true (default), performs a standard independent two-sample test that assumes equal population variances. If false, performs Welch's t-test, which doesn't assume equal population variances. In that case, consider using the native welch_test(), as shown in the sketch after this table. |
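The following sketch isn't part of two_sample_t_test_fl(); it shows one way the native welch_test() can be used, assuming it takes the mean, variance, and count of each sample (check the welch_test() reference for details). It reuses the first test row from the example later in this section:
// Expand the two sample arrays side by side, compute summary statistics, then call welch_test().
datatable(id:string, sample1:dynamic, sample2:dynamic) [
    'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56])
]
| mv-expand s1 = sample1 to typeof(real), s2 = sample2 to typeof(real)
| summarize m1 = avg(s1), v1 = variance(s1), c1 = count(), m2 = avg(s2), v2 = variance(s2), c2 = count() by id
| extend p_val = welch_test(m1, v1, c1, m2, v2, c2)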
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let two_sample_t_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, equal_var:bool=true)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'equal_var', equal_var);
let code = ```if 1:
from scipy import stats
import pandas
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
equal_var = kargs["equal_var"]
def func(row):
statistics = stats.ttest_ind(row[data1], row[data2], equal_var=equal_var)
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Two-Sample t-Test")
two_sample_t_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, equal_var:bool=true)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'equal_var', equal_var);
let code = ```if 1:
from scipy import stats
import pandas
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
equal_var = kargs["equal_var"]
def func(row):
statistics = stats.ttest_ind(row[data1], row[data2], equal_var=equal_var)
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let two_sample_t_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, equal_var:bool=true)
{
let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'equal_var', equal_var);
let code = ```if 1:
from scipy import stats
import pandas
data1 = kargs["data1"]
data2 = kargs["data2"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
equal_var = kargs["equal_var"]
def func(row):
statistics = stats.ttest_ind(row[data1], row[data2], equal_var=equal_var)
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke two_sample_t_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke two_sample_t_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Output
ID | sample1 | sample2 | test_stat | p_val |
---|---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | -1.7415675457565645 | 0.15655096653487446 |
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | -3.2711673491022579 | 0.030755331219276136 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | -18.5515946201742 | 1.5823717131966134E-06 |
5.60 - User-defined functions
User-defined functions are reusable subqueries that can be defined as part of the query itself (query-defined functions), or stored as part of the database metadata (stored functions). User-defined functions are invoked through a name, are provided with zero or more input arguments (which can be scalar or tabular), and produce a single value (which can be scalar or tabular) based on the function body.
A user-defined function belongs to one of two categories:
- Scalar functions
- Tabular functions
The function’s input arguments and output determine whether it’s scalar or tabular, which then establishes how it might be used.
To optimize multiple uses of the user-defined functions within a single query, see Optimize queries that use named expressions.
We’ve created an assortment of user-defined functions that you can use in your queries. For more information, see Functions library.
Scalar function
- Has zero input arguments, or all its input arguments are scalar values
- Produces a single scalar value
- Can be used wherever a scalar expression is allowed
- May only use the row context in which it’s defined
- Can only refer to tables (and views) that are in the accessible schema
Tabular function
- Accepts one or more tabular input arguments, and zero or more scalar input arguments, and/or
- Produces a single tabular value
Function names
Valid user-defined function names must follow the same identifier naming rules as other entities.
The name must also be unique in its scope of definition.
Input arguments
Valid user-defined functions follow these rules:
- A user-defined function has a strongly typed list of zero or more input arguments.
- An input argument has a name, a type, and (for scalar arguments) a default value.
- The name of an input argument is an identifier.
- The type of an input argument is either one of the scalar data types, or a tabular schema.
Syntactically, the input arguments list is a comma-separated list of argument definitions, wrapped in parentheses. Each argument definition is specified as
ArgName:ArgType [= ArgDefaultValue]
For tabular arguments, ArgType has the same syntax as the table definition (parentheses and a list of column name/type pairs), with the addition of a solitary (*) indicating "any tabular schema".
For example:
Syntax | Input arguments list description |
---|---|
() | No arguments |
(s:string) | Single scalar argument called s taking a value of type string |
(a:long, b:bool=true) | Two scalar arguments, the second of which has a default value |
(T1:(*), T2:(r:real), b:bool) | Three arguments (two tabular arguments and one scalar argument) |
Examples
Scalar function
let Add7 = (arg0:long = 5) { arg0 + 7 };
range x from 1 to 10 step 1
| extend x_plus_7 = Add7(x), five_plus_seven = Add7()
Tabular function with no arguments
let tenNumbers = () { range x from 1 to 10 step 1};
tenNumbers
| extend x_plus_7 = x + 7
Tabular function with arguments
let MyFilter = (T:(x:long), v:long) {
T | where x >= v
};
MyFilter((range x from 1 to 10 step 1), 9)
Output
x |
---|
9 |
10 |
The following example shows a tabular function that uses a tabular input with no columns specified. Any table can be passed to the function, and no table columns can be referenced inside the function.
let MyDistinct = (T:(*)) {
T | distinct *
};
MyDistinct((range x from 1 to 3 step 1))
Output
x |
---|
1 |
2 |
3 |
Declaring user-defined functions
The declaration of a user-defined function provides:
- Function name
- Function schema (parameters it accepts, if any)
- Function body
let f=(s:string, i:long) {
tolong(s) * i
};
The function body includes:
- Exactly one expression, which provides the function’s return value (scalar or tabular value).
- Any number (zero or more) of let statements, whose scope is that of the function body. If specified, the let statements must precede the expression defining the function’s return value.
- Any number (zero or more) of query parameters statements, which declare query parameters used by the function. If specified, they must precede the expression defining the function’s return value.
Examples of user-defined functions
The following section shows examples of how to use user-defined functions.
User-defined function that uses a let statement
The following example shows a user-defined function (lambda) that accepts a parameter named ID. The function is bound to the name Test and makes use of three let statements, in which the Test3 definition uses the ID parameter. When run, the output from the query is 70:
let Test = (id: int) {
let Test2 = 10;
let Test3 = 10 + Test2 + id;
let Test4 = (arg: int) {
let Test5 = 20;
Test2 + Test3 + Test5 + arg
};
Test4(10)
};
range x from 1 to Test(10) step 1
| count
User-defined function that defines a default value for a parameter
The following example shows a function that accepts three arguments. The latter two have a default value and don’t have to be present at the call site.
let f = (a:long, b:string = "b.default", c:long = 0) {
strcat(a, "-", b, "-", c)
};
print f(12, c=7) // Returns "12-b.default-7"
Invoking a user-defined function
The method to invoke a user-defined function depends on the arguments that the function expects to receive. The following sections cover how to invoke a UDF without arguments, invoke a UDF with scalar arguments, and invoke a UDF with tabular arguments.
Invoke a UDF without arguments
A user-defined function that takes no arguments can be invoked either by its name, or by its name and an empty argument list in parentheses.
// Bind the identifier a to a user-defined function (lambda) that takes
// no arguments and returns a constant of type long:
let a=(){123};
// Invoke the function in two equivalent ways:
range x from 1 to 10 step 1
| extend y = x * a, z = x * a()
// Bind the identifier T to a user-defined function (lambda) that takes
// no arguments and returns a random two-by-two table:
let T=(){
range x from 1 to 2 step 1
| project x1 = rand(), x2 = rand()
};
// Invoke the function in two equivalent ways:
// (Note that the second invocation must be itself wrapped in
// an additional set of parentheses, as the union operator
// differentiates between "plain" names and expressions)
union T, (T())
Invoke a UDF with scalar arguments
A user-defined function that takes one or more scalar arguments can be invoked by using the function name and a concrete argument list in parentheses:
let f=(a:string, b:string) {
strcat(a, " (la la la)", b)
};
print f("hello", "world")
Invoke a UDF with tabular arguments
A user-defined function that takes one or more table arguments (with any number of scalar arguments) can be invoked using the function name and a concrete argument list in parentheses:
let MyFilter = (T:(x:long), v:long) {
T | where x >= v
};
MyFilter((range x from 1 to 10 step 1), 9)
You can also use the invoke operator to invoke a user-defined function that takes one or more table arguments and returns a table. This approach is useful when the first concrete table argument to the function is the source of the invoke operator:
let append_to_column_a=(T:(a:string), what:string) {
T | extend a=strcat(a, " ", what)
};
datatable (a:string) ["sad", "really", "sad"]
| invoke append_to_column_a(":-)")
Default values
Functions may provide default values to some of their parameters under the following conditions:
- Default values may be provided for scalar parameters only.
- Default values are always literals (constants). They can’t be arbitrary calculations.
- Parameters with no default value always precede parameters that do have a default value.
- Callers must provide the value of all parameters with no default values arranged in the same order as the function declaration.
- Callers don’t need to provide the value for parameters with default values, but may do so.
- Callers may provide arguments in an order that doesn’t match the order of the parameters. If so, they must name their arguments.
The following example returns a table with two identical records. In the first invocation of f, the arguments are completely "scrambled", so each one is explicitly given a name:
let f = (a:long, b:string = "b.default", c:long = 0) {
strcat(a, "-", b, "-", c)
};
union
(print x=f(c=7, a=12)), // "12-b.default-7"
(print x=f(12, c=7)) // "12-b.default-7"
Output
x |
---|
12-b.default-7 |
12-b.default-7 |
View functions
A user-defined function that takes no arguments and returns a tabular expression can be marked as a view. Marking a user-defined function as a view means that the function behaves like a table whenever a wildcard table name resolution is performed.
The following example shows two user-defined functions, T_view and T_notview, and shows how only the first one is resolved by the wildcard reference in the union:
let T_view = view () { print x=1 };
let T_notview = () { print x=2 };
union T*
Restrictions
The following restrictions apply:
- User-defined functions can’t pass into toscalar() invocation information that depends on the row-context in which the function is called.
- User-defined functions that return a tabular expression can’t be invoked with an argument that varies with the row context.
- A function taking at least one tabular input can’t be invoked on a remote cluster.
- A scalar function can’t be invoked on a remote cluster.
The only place a user-defined function may be invoked with an argument that varies with the row context is when the user-defined function is composed of scalar functions only and doesn't use toscalar().
Examples
Supported scalar function
The following query is supported because f is a scalar function that doesn't reference any tabular expression.
let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { now() + hours*1h };
Table2 | where Column != 123 | project d = f(10)
The following query is supported because f is a scalar function that references the tabular expression Table1 but is invoked with no reference to the current row context f(10):
let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { toscalar(Table1 | summarize min(xdate) - hours*1h) };
Table2 | where Column != 123 | project d = f(10)
Unsupported scalar function
The following query isn't supported because f is a scalar function that references the tabular expression Table1, and is invoked with a reference to the current row context f(Column):
let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { toscalar(Table1 | summarize min(xdate) - hours*1h) };
Table2 | where Column != 123 | project d = f(Column)
Unsupported tabular function
The following query isn't supported because f is a tabular function that is invoked in a context that expects a scalar value.
let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { range x from 1 to hours step 1 | summarize make_list(x) };
Table2 | where Column != 123 | project d = f(Column)
Features that are currently unsupported by user-defined functions
For completeness, here are some commonly requested features for user-defined functions that are currently not supported:
Function overloading: There’s currently no way to overload a function (a way to create multiple functions with the same name and different input schema).
Default values: The default value for a scalar parameter to a function must be a scalar literal (constant).
5.61 - wilcoxon_test_fl()
The function wilcoxon_test_fl() is a user-defined function (UDF) that performs the Wilcoxon Test.
Syntax
T | invoke wilcoxon_test_fl(
data,
test_statistic,
p_value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
data | string | ✔️ | The name of the column containing the data to be used for the test. |
test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
p_value | string | ✔️ | The name of the column to store p-value for the results. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let wilcoxon_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data = kargs["data"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.wilcoxon(row[data])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Wilcoxon Test")
wilcoxon_test_fl(tbl:(*), data:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data = kargs["data"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.wilcoxon(row[data])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let wilcoxon_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
let code = ```if 1:
from scipy import stats
data = kargs["data"]
test_statistic = kargs["test_statistic"]
p_value = kargs["p_value"]
def func(row):
statistics = stats.wilcoxon(row[data])
return statistics[0], statistics[1]
result = df
result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
```;
tbl
| evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]),
'Test #2', dynamic([20.85, 21.89, 23.41]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke wilcoxon_test_fl('sample1', 'test_stat', 'p_val')
Stored
datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]),
'Test #2', dynamic([20.85, 21.89, 23.41]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke wilcoxon_test_fl('sample1', 'test_stat', 'p_val')
Output
ID | sample1 | test_stat | p_val |
---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | 0 | 0.10880943004054568 |
Test #2 | [20.85, 21.89, 23.41] | 0 | 0.10880943004054568 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | 0 | 0.06788915486182899 |
6 - Geospatial
6.1 - geo_angle()
Calculates clockwise angle in radians between two lines on Earth. The first line is [point1, point2] and the second line is [point2, point3].
Syntax
geo_angle(
p1_longitude,
p1_latitude,
p2_longitude,
p2_latitude,
p3_longitude,
p3_latitude)
Parameters
Name | Type | Required | Description |
---|---|---|---|
p1_longitude | real | ✔️ | The longitude value in degrees of the first geospatial coordinate. A valid value is in the range [-180, +180]. |
p1_latitude | real | ✔️ | The latitude value in degrees of the first geospatial coordinate. A valid value is in the range [-90, +90]. |
p2_longitude | real | ✔️ | The longitude value in degrees of the second geospatial coordinate. A valid value is in the range [-180, +180]. |
p2_latitude | real | ✔️ | The latitude value in degrees of the second geospatial coordinate. A valid value is in the range [-90, +90]. |
p3_longitude | real | ✔️ | The longitude value in degrees of the third geospatial coordinate. A valid value is in the range [-180, +180]. |
p3_latitude | real | ✔️ | The latitude value in degrees of the third geospatial coordinate. A valid value is in the range [-90, +90]. |
Returns
An angle in radians in the range [0, 2pi) between the two lines [p1, p2] and [p2, p3]. The angle is measured clockwise from the first line to the second line.
Examples
The following example calculates the angle in radians.
print angle_in_radians = geo_angle(0, 10, 0,5, 3,-10)
Output
angle_in_radians |
---|
2.94493843406882 |
The following example calculates the angle in degrees.
let angle_in_radians = geo_angle(0, 10, 0,5, 3,-10);
print angle_in_degrees = degrees(angle_in_radians)
Output
angle_in_degrees |
---|
168.732543198009 |
The following example returns null because the first point is equal to the second point.
print is_null = isnull(geo_angle(0, 10, 0, 10, 3, -10))
Output
is_null |
---|
True |
6.2 - geo_azimuth()
Calculates clockwise angle in radians between the line from point1 to true north and a line from point1 to point2 on Earth.
Syntax
geo_azimuth(
p1_longitude,
p1_latitude,
p2_longitude,
p2_latitude)
Parameters
Name | Type | Required | Description |
---|---|---|---|
p1_longitude | real | ✔️ | The longitude value in degrees of the first geospatial coordinate. A valid value is in the range [-180, +180]. |
p1_latitude | real | ✔️ | The latitude value in degrees of the first geospatial coordinate. A valid value is in the range [-90, +90]. |
p2_longitude | real | ✔️ | The longitude value in degrees of the second geospatial coordinate. A valid value is in the range [-180, +180]. |
p2_latitude | real | ✔️ | The latitude value in degrees of the second geospatial coordinate. A valid value is in the range [-90, +90]. |
Returns
An angle in radians between the line from point p1 to true north and line [p1, p2]. The angle is measured clockwise.
Examples
The following example calculates azimuth in radians.
print azimuth_in_radians = geo_azimuth(5, 10, 10, -40)
Output
azimuth_in_radians |
---|
3.05459939796449 |
The following example calculates azimuth in degrees.
let azimuth_in_radians = geo_azimuth(5, 10, 10, -40);
print azimuth_in_degrees = degrees(azimuth_in_radians);
Output
azimuth_in_degrees |
---|
175.015653606568 |
The following example considers a truck that emits location telemetry while it travels, and determines its direction of travel.
let get_direction = (azimuth:real)
{
let pi = pi();
iff(azimuth < pi/2, "North-East",
iff(azimuth < pi, "South-East",
iff(azimuth < 3*pi/2, "South-West",
"North-West")));
};
datatable(timestamp:datetime, lng:real, lat:real)
[
datetime(2024-01-01T00:01:53.048506Z), -115.4036607693417, 36.40551631046261,
datetime(2024-01-01T00:02:53.048506Z), -115.3256807623232, 36.34102142760111,
datetime(2024-01-01T00:03:53.048506Z), -115.2732290602112, 36.28458914829917,
datetime(2024-01-01T00:04:53.048506Z), -115.2513186233914, 36.27622394664352,
datetime(2024-01-01T00:05:53.048506Z), -115.2352055633212, 36.27545547038515,
datetime(2024-01-01T00:06:53.048506Z), -115.1894341934856, 36.28266934431671,
datetime(2024-01-01T00:07:53.048506Z), -115.1054318118468, 36.28957085435267,
datetime(2024-01-01T00:08:53.048506Z), -115.0648614339413, 36.28110743285072,
datetime(2024-01-01T00:09:53.048506Z), -114.9858032867736, 36.29780696509714,
datetime(2024-01-01T00:10:53.048506Z), -114.9016966527561, 36.36556196813566,
]
| sort by timestamp asc
| extend prev_lng = prev(lng), prev_lat = prev(lat)
| where isnotnull(prev_lng) and isnotnull(prev_lat)
| extend direction = get_direction(geo_azimuth(prev_lng, prev_lat, lng, lat))
| project direction, lng, lat
| render scatterchart with (kind = map)
Output
The following example returns true because the first point is equal to the second point.
print is_null = isnull(geo_azimuth(5, 10, 5, 10))
Output
is_null |
---|
true |
6.3 - geo_distance_2points()
Calculates the shortest distance in meters between two geospatial coordinates on Earth.
Syntax
geo_distance_2points(
p1_longitude,
p1_latitude,
p2_longitude,
p2_latitude)
Parameters
Name | Type | Required | Description |
---|---|---|---|
p1_longitude | real | ✔️ | The longitude value in degrees of the first geospatial coordinate. A valid value is in the range [-180, +180]. |
p1_latitude | real | ✔️ | The latitude value in degrees of the first geospatial coordinate. A valid value is in the range [-90, +90]. |
p2_longitude | real | ✔️ | The longitude value in degrees of the second geospatial coordinate. A valid value is in the range [-180, +180]. |
p2_latitude | real | ✔️ | The latitude value in degrees of the second geospatial coordinate. A valid value is in the range [-90, +90]. |
Returns
The shortest distance, in meters, between two geographic locations on Earth. If the coordinates are invalid, the query produces a null result.
Examples
The following example finds the shortest distance between Seattle and Los Angeles.
print distance_in_meters = geo_distance_2points(-122.407628, 47.578557, -118.275287, 34.019056)
Output
distance_in_meters |
---|
1546754.35197381 |
The following example finds an approximation of the shortest path from Seattle to London. The rendered line consists of randomly generated coordinates that lie within 500 meters of the LineString connecting the two cities.
range i from 1 to 1000000 step 1
| project lng = rand() * real(-122), lat = rand() * 90
| where lng between(real(-122) .. 0) and lat between(47 .. 90)
| where geo_distance_point_to_line(lng,lat,dynamic({"type":"LineString","coordinates":[[-122,47],[0,51]]})) < 500
| render scatterchart with (kind=map)
Output
The following example finds all rows in which the shortest distance between two coordinates is between one meter and 11 meters.
StormEvents
| extend distance_1_to_11m = geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat)
| where distance_1_to_11m between (1 .. 11)
| project distance_1_to_11m
Output
distance_1_to_11m |
---|
10.5723100154958 |
7.92153588248414 |
The following example returns a null result because of the invalid coordinate input.
print distance = geo_distance_2points(300,1,1,1)
Output
distance |
---|
6.4 - geo_distance_point_to_line()
Calculates the shortest distance in meters between a coordinate and a line or multiline on Earth.
Syntax
geo_distance_point_to_line(
longitude,
latitude,
lineString)
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | The geospatial coordinate longitude value in degrees. A valid value is in the range [-180, +180]. |
latitude | real | ✔️ | The geospatial coordinate latitude value in degrees. A valid value is in the range [-90, +90]. |
lineString | dynamic | ✔️ | A line or multiline in the GeoJSON format. |
Returns
The shortest distance, in meters, between a coordinate and a line or multiline on Earth. If the coordinate or lineString are invalid, the query produces a null result.
LineString definition and constraints
dynamic({"type": "LineString","coordinates": [[lng_1,lat_1], [lng_2,lat_2], ..., [lng_N,lat_N]]})
dynamic({"type": "MultiLineString","coordinates": [[line_1, line_2, ..., line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
Examples
Shortest distance to airport
The following example finds the shortest distance between North Las Vegas Airport and a nearby road.
print distance_in_meters = geo_distance_point_to_line(-115.199625, 36.210419, dynamic({ "type":"LineString","coordinates":[[-115.115385,36.229195],[-115.136995,36.200366],[-115.140252,36.192470],[-115.143558,36.188523],[-115.144076,36.181954],[-115.154662,36.174483],[-115.166431,36.176388],[-115.183289,36.175007],[-115.192612,36.176736],[-115.202485,36.173439],[-115.225355,36.174365]]}))
Output
distance_in_meters |
---|
3797.88887253334 |
Storm events across the south coast
The following example finds storm events along the US south coast filtered by a maximum distance of 5 km from the defined shore line.
let southCoast = dynamic({"type":"LineString","coordinates":[[-97.18505859374999,25.997549919572112],[-97.58056640625,26.96124577052697],[-97.119140625,27.955591004642553],[-94.04296874999999,29.726222319395504],[-92.98828125,29.82158272057499],[-89.18701171875,29.11377539511439],[-89.384765625,30.315987718557867],[-87.5830078125,30.221101852485987],[-86.484375,30.4297295750316],[-85.1220703125,29.6880527498568],[-84.00146484374999,30.14512718337613],[-82.6611328125,28.806173508854776],[-82.81494140625,28.033197847676377],[-82.177734375,26.52956523826758],[-80.9912109375,25.20494115356912]]});
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_distance_point_to_line(BeginLon, BeginLat, southCoast) < 5000
| render scatterchart with (kind=map)
Output
New York taxi pickups
The following example finds New York taxi pickups filtered by a maximum distance of 0.1 meters from the defined multiline.
let MadisonAve = dynamic({"type":"MultiLineString","coordinates":[[[-73.9879823,40.7408625],[-73.9876492,40.7413345],[-73.9874982,40.7415046],[-73.9870343,40.7421446],[-73.9865812,40.7427655],[-73.9861292,40.7433756],[-73.9856813,40.7439956],[-73.9854932,40.7442606],[-73.9852232,40.7446216],[-73.9847903,40.7452305],[-73.9846232,40.7454536],[-73.9844803,40.7456606],[-73.9843413,40.7458585],[-73.9839533,40.7463955],[-73.9839002,40.7464696],[-73.9837683,40.7466566],[-73.9834342,40.7471015],[-73.9833833,40.7471746],[-73.9829712,40.7477686],[-73.9824752,40.7484255],[-73.9820262,40.7490436],[-73.9815623,40.7496566],[-73.9811212,40.7502796],[-73.9809762,40.7504976],[-73.9806982,40.7509255],[-73.9802752,40.7515216],[-73.9798033,40.7521795],[-73.9795863,40.7524656],[-73.9793082,40.7528316],[-73.9787872,40.7534725],[-73.9783433,40.7540976],[-73.9778912,40.7547256],[-73.9774213,40.7553365],[-73.9769402,40.7559816],[-73.9764622,40.7565766],[-73.9760073,40.7572036],[-73.9755592,40.7578366],[-73.9751013,40.7584665],[-73.9746532,40.7590866],[-73.9741902,40.7597326],[-73.9737632,40.7603566],[-73.9733032,40.7609866],[-73.9728472,40.7616205],[-73.9723422,40.7622826],[-73.9718672,40.7629556],[-73.9714042,40.7635726],[-73.9709362,40.7642185],[-73.9705282,40.7647636],[-73.9704903,40.7648196],[-73.9703342,40.7650355],[-73.9701562,40.7652826],[-73.9700322,40.7654535],[-73.9695742,40.7660886],[-73.9691232,40.7667166],[-73.9686672,40.7673375],[-73.9682142,40.7679605],[-73.9677482,40.7685786],[-73.9672883,40.7692076],[-73.9668412,40.7698296],[-73.9663882,40.7704605],[-73.9659222,40.7710936],[-73.9654262,40.7717756],[-73.9649292,40.7724595],[-73.9644662,40.7730955],[-73.9640012,40.7737285],[-73.9635382,40.7743615],[-73.9630692,40.7749936],[-73.9626122,40.7756275],[-73.9621172,40.7763106],[-73.9616111,40.7769896],[-73.9611552,40.7776245],[-73.9606891,40.7782625],[-73.9602212,40.7788866],[-73.9597532,40.7795236],[-73.9595842,40.7797445],[-73.9592942,40.7801635],[-73.9591122,40.7804105],[-73.9587982,40.7808305],[-73.9582992,40.7815116],[-73.9578452,40.7821455],[-73.9573802,40.7827706],[-73.9569262,40.7833965],[-73.9564802,40.7840315],[-73.9560102,40.7846486],[-73.9555601,40.7852755],[-73.9551221,40.7859005],[-73.9546752,40.7865426],[-73.9542571,40.7871505],[-73.9541771,40.7872335],[-73.9540892,40.7873366],[-73.9536971,40.7879115],[-73.9532792,40.7884706],[-73.9532142,40.7885205],[-73.9531522,40.7885826],[-73.9527382,40.7891785],[-73.9523081,40.7897545],[-73.9518332,40.7904115],[-73.9513721,40.7910435],[-73.9509082,40.7916695],[-73.9504602,40.7922995],[-73.9499882,40.7929195],[-73.9495051,40.7936045],[-73.9490071,40.7942835],[-73.9485542,40.7949065],[-73.9480832,40.7955345],[-73.9476372,40.7961425],[-73.9471772,40.7967915],[-73.9466841,40.7974475],[-73.9453432,40.7992905],[-73.9448332,40.7999835],[-73.9443442,40.8006565],[-73.9438862,40.8012945],[-73.9434262,40.8019196],[-73.9431412,40.8023325],[-73.9429842,40.8025585],[-73.9425691,40.8031855],[-73.9424401,40.8033609],[-73.9422987,40.8035533],[-73.9422013,40.8036857],[-73.9421022,40.8038205],[-73.9420024,40.8039552],[-73.9416372,40.8044485],[-73.9411562,40.8050725],[-73.9406471,40.8057176],[-73.9401481,40.8064135],[-73.9397022,40.8070255],[-73.9394081,40.8074155],[-73.9392351,40.8076495],[-73.9387842,40.8082715],[-73.9384681,40.8087086],[-73.9383211,40.8089025],[-73.9378792,40.8095215],[-73.9374011,40.8101795],[-73.936405,40.8115707],[-73.9362328,40.8118098]],[[-73.9362328,40.8118098],[-73.9362432,40.8118567],[-73.9361239,40.8120222],[-73.9360302,40.8120805]],[[-73.9362328,40.8118098],[-73.9361571,40.8118294],[-73.9360443,40.8119993],[-73.9360302,40.8120805]],[[-73.9360302,40.8120805],[-73.9359423,40.8121378],[-73.9358551,40.8122385],[-73.9352181,40.8130815],[-73.9348702,40.8135515],[-73.9347541,40.8137145],[-73.9346332,40.8138615],[-73.9345542,40.8139595],[-73.9344981,40.8139945],[-73.9344571,40.8140165],[-73.9343962,40.8140445],[-73.9343642,40.8140585],[-73.9343081,40.8140725],[-73.9341971,40.8140895],[-73.9341041,40.8141005],[-73.9340022,40.8140965],[-73.9338442,40.8141005],[-73.9333712,40.8140895],[-73.9325541,40.8140755],[-73.9324561,40.8140705],[-73.9324022,40.8140695]],[[-73.9360302,40.8120805],[-73.93605,40.8121667],[-73.9359632,40.8122805],[-73.9353631,40.8130795],[-73.9351482,40.8133625],[-73.9350072,40.8135415],[-73.9347441,40.8139168],[-73.9346611,40.8140125],[-73.9346101,40.8140515],[-73.9345401,40.8140965],[-73.9344381,40.8141385],[-73.9343451,40.8141555],[-73.9342991,40.8141675],[-73.9341552,40.8141985],[-73.9338601,40.8141885],[-73.9333991,40.8141815],[-73.9323981,40.8141665]]]});
nyc_taxi
| project pickup_longitude, pickup_latitude
| where geo_distance_point_to_line(pickup_longitude, pickup_latitude, MadisonAve) <= 0.1
| take 100
| render scatterchart with (kind=map)
Output
The following example folds many lines into one multiline and queries this multiline. The query finds all taxi pickups that happened 10 km away from all roads in Manhattan.
let ManhattanRoads =
datatable(features:dynamic)
[
dynamic({"type":"Feature","properties":{"Label":"145thStreetBrg"},"geometry":{"type":"MultiLineString","coordinates":[[[-73.9322259,40.8194635],[-73.9323259,40.8194743],[-73.9323973,40.8194779]]]}}),
dynamic({"type":"Feature","properties":{"Label":"W120thSt"},"geometry":{"type":"MultiLineString","coordinates":[[[-73.9619541,40.8104844],[-73.9621542,40.8105725],[-73.9630542,40.8109455],[-73.9635902,40.8111714],[-73.9639492,40.8113174],[-73.9640502,40.8113705]]]}}),
dynamic({"type":"Feature","properties":{"Label":"1stAve"},"geometry":{"type":"MultiLineString","coordinates":[[[-73.9704124,40.748033],[-73.9702043,40.7480906],[-73.9696892,40.7487346],[-73.9695012,40.7491976],[-73.9694522,40.7493196]],[[-73.9699932,40.7488636],[-73.9694522,40.7493196]],[[-73.9694522,40.7493196],[-73.9693113,40.7494946],[-73.9688832,40.7501056],[-73.9686562,40.7504196],[-73.9684231,40.7507476],[-73.9679832,40.7513586],[-73.9678702,40.7514986]],[[-73.9676833,40.7520426],[-73.9675462,40.7522286],[-73.9673532,40.7524976],[-73.9672892,40.7525906],[-73.9672122,40.7526806]]]}})
// ... more roads ...
];
let allRoads=toscalar(
ManhattanRoads
| project road_coordinates=features.geometry.coordinates
| summarize make_list(road_coordinates)
| project multiline = bag_pack("type","MultiLineString", "coordinates", list_road_coordinates));
nyc_taxi
| project pickup_longitude, pickup_latitude
| where pickup_longitude != 0 and pickup_latitude != 0
| where geo_distance_point_to_line(pickup_longitude, pickup_latitude, parse_json(allRoads)) > 10000
| take 10
| render scatterchart with (kind=map)
Output
Invalid LineString
The following example returns a null result because of the invalid LineString input.
print distance_in_meters = geo_distance_point_to_line(1,1, dynamic({ "type":"LineString"}))
Output
distance_in_meters |
---|
Invalid coordinate
The following example returns a null result because of the invalid coordinate input.
print distance_in_meters = geo_distance_point_to_line(300, 3, dynamic({ "type":"LineString","coordinates":[[1,1],[2,2]]}))
Output
distance_in_meters |
---|
6.5 - geo_distance_point_to_polygon()
Calculates the shortest distance between a coordinate and a polygon or a multipolygon on Earth.
Syntax
geo_distance_point_to_polygon(
longitude,
latitude,
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
The shortest distance, in meters, between a coordinate and a polygon or a multipolygon on Earth. If the polygon contains the point, the distance is 0. If the coordinates or the polygon are invalid, the query produces a null result.
Polygon definition and constraints
dynamic({"type": "Polygon","coordinates": [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})
dynamic({"type": "MultiPolygon","coordinates": [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
- Polygon doesn’t necessarily contain its vertices.
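To illustrate the ring ordering rules above, the following sketch uses small, purely illustrative coordinates: the shell ring is wound counterclockwise and the hole ring is wound clockwise. Because the queried point falls inside the hole, the returned distance is greater than 0.
let polygon_with_hole = dynamic({"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,1],[0,0]],[[0.25,0.25],[0.25,0.75],[0.75,0.75],[0.75,0.25],[0.25,0.25]]]});
print distance_in_meters = geo_distance_point_to_polygon(0.5, 0.5, polygon_with_hole)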
Examples
The following example calculates the shortest distance, in meters, from a location in NYC to Central Park.
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print geo_distance_point_to_polygon(-73.9839, 40.7705, central_park)
Output
print_0 |
---|
259.940756070596 |
The following example enriches the data with the distance to the multipolygon.
let multipolygon = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]],[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.7796396837
51753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.779878898532999],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]],[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.942930
43781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]]});
let coordinates =
datatable(longitude:real, latitude:real, description:string)
[
real(-73.9741), 40.7914, 'Upper West Side',
real(-73.9950), 40.7340, 'Greenwich Village',
real(-73.8743), 40.7773, 'LaGuardia Airport',
];
coordinates
| extend distance = geo_distance_point_to_polygon(longitude, latitude, multipolygon)
Output
longitude | latitude | description | distance |
---|---|---|---|
-73.9741 | 40.7914 | Upper West Side | 0 |
-73.995 | 40.734 | Greenwich Village | 0 |
-73.8743 | 40.7773 | LaGuardia Airport | 5702.15731467514 |
The following example finds all states within 200 km of the given point, excluding the state that contains it.
US_States
| project name = features.properties.NAME, polygon = features.geometry
| project name, distance = ceiling(geo_distance_point_to_polygon(-111.905, 40.634, polygon) / 1000)
| where distance < 200 and distance > 0
Output
name | distance |
---|---|
Idaho | 152 |
Nevada | 181 |
Wyoming | 83 |
The following example will return a null result because of the invalid coordinate input.
print distance = geo_distance_point_to_polygon(500,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,1],[0,0]]]}))
Output
distance |
---|
The following example will return a null result because of the invalid polygon input.
print distance = geo_distance_point_to_polygon(1,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]}))
Output
distance |
---|
6.6 - geo_geohash_neighbors()
Calculates Geohash neighbors.
Read more about geohash.
Syntax
geo_geohash_neighbors(
geohash)
Parameters
Name | Type | Required | Description |
---|---|---|---|
geohash | string | ✔️ | A geohash value as it was calculated by geo_point_to_geohash(). The geohash string must be between 1 and 18 characters. |
Returns
An array of Geohash neighbors. If the Geohash is invalid, the query produces a null result.
Examples
The following example calculates Geohash neighbors.
print neighbors = geo_geohash_neighbors('sunny')
Output
neighbors |
---|
[“sunnt”,“sunpj”,“sunnx”,“sunpn”,“sunnv”,“sunpp”,“sunnz”,“sunnw”] |
The following example calculates an array containing the input geohash and its neighbors.
let geohash = 'sunny';
print cells = array_concat(pack_array(geohash), geo_geohash_neighbors(geohash))
Output
cells |
---|
[“sunny”,“sunnt”,“sunpj”,“sunnx”,“sunpn”,“sunnv”,“sunpp”,“sunnz”,“sunnw”] |
The following example calculates a GeoJSON geometry collection of geohash polygons.
let geohash = 'sunny';
print cells = array_concat(pack_array(geohash), geo_geohash_neighbors(geohash))
| mv-expand cells to typeof(string)
| project polygons = geo_geohash_to_polygon(cells)
| summarize arr = make_list(polygons)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", arr), "properties", bag_pack("name", "polygons"))
Output
geojson |
---|
{“type”: “Feature”,“geometry”: {“type”: “GeometryCollection”,“geometries”: [ {“type”:“Polygon”,“coordinates”:[[[42.451171875,23.6865234375],[42.4951171875,23.6865234375],[42.4951171875,23.73046875],[42.451171875,23.73046875],[42.451171875,23.6865234375]]]}, {“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.642578125],[42.451171875,23.642578125],[42.451171875,23.6865234375],[42.4072265625,23.6865234375],[42.4072265625,23.642578125]]]}, {“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.73046875],[42.451171875,23.73046875],[42.451171875,23.7744140625],[42.4072265625,23.7744140625],[42.4072265625,23.73046875]]]}, {“type”:“Polygon”,“coordinates”:[[[42.4951171875,23.642578125],[42.5390625,23.642578125],[42.5390625,23.6865234375],[42.4951171875,23.6865234375],[42.4951171875,23.642578125]]]}, {“type”:“Polygon”,“coordinates”:[[[42.451171875,23.73046875],[42.4951171875,23.73046875],[42.4951171875,23.7744140625],[42.451171875,23.7744140625],[42.451171875,23.73046875]]]}, {“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.6865234375],[42.451171875,23.6865234375],[42.451171875,23.73046875],[42.4072265625,23.73046875],[42.4072265625,23.6865234375]]]}, {“type”:“Polygon”,“coordinates”:[[[42.4951171875,23.73046875],[42.5390625,23.73046875],[42.5390625,23.7744140625],[42.4951171875,23.7744140625],[42.4951171875,23.73046875]]]}, {“type”:“Polygon”,“coordinates”:[[[42.4951171875,23.6865234375],[42.5390625,23.6865234375],[42.5390625,23.73046875],[42.4951171875,23.73046875],[42.4951171875,23.6865234375]]]}, {“type”:“Polygon”,“coordinates”:[[[42.451171875,23.642578125],[42.4951171875,23.642578125],[42.4951171875,23.6865234375],[42.451171875,23.6865234375],[42.451171875,23.642578125]]]}]}, “properties”: {“name”: “polygons”}} |
The following example calculates the polygon union of a geohash and its neighbors.
let geohash = 'sunny';
print cells = array_concat(pack_array(geohash), geo_geohash_neighbors(geohash))
| mv-expand cells to typeof(string)
| project polygons = geo_geohash_to_polygon(cells)
| summarize arr = make_list(polygons)
| project polygon = geo_union_polygons_array(arr)
Output
polygon |
---|
{“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.642578125],[42.451171875,23.642578125],[42.4951171875,23.642578125],[42.5390625,23.642578125],[42.5390625,23.686523437500004],[42.5390625,23.730468750000004],[42.5390625,23.7744140625],[42.4951171875,23.7744140625],[42.451171875,23.7744140625],[42.407226562499993,23.7744140625],[42.4072265625,23.73046875],[42.4072265625,23.6865234375],[42.4072265625,23.642578125]]]} |
The following example returns true because of the invalid Geohash token input.
print invalid = isnull(geo_geohash_neighbors('a'))
Output
invalid |
---|
1 |
6.7 - geo_geohash_to_central_point()
Calculates the geospatial coordinates that represent the center of a geohash rectangular area.
Read more about geohash.
Syntax
geo_geohash_to_central_point(
geohash)
Parameters
Name | Type | Required | Description |
---|---|---|---|
geohash | string | ✔️ | A geohash value as it was calculated by geo_point_to_geohash(). The geohash string must be between 1 and 18 characters. |
Returns
The geospatial coordinate values in GeoJSON Format and of a dynamic data type. If the geohash is invalid, the query will produce a null result.
Examples
print point = geo_geohash_to_central_point("sunny")
| extend coordinates = point.coordinates
| extend longitude = coordinates[0], latitude = coordinates[1]
Output
point | coordinates | longitude | latitude |
---|---|---|---|
{ “type”: “Point”, “coordinates”: [ 42.47314453125, 23.70849609375 ] } | [ 42.47314453125, 23.70849609375 ] | 42.47314453125 | 23.70849609375 |
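Center points can be composed with other geospatial functions. For example, assuming the geo_distance_2points() function is available, the following sketch estimates the distance between the centers of two neighboring geohash cells:
let p1 = geo_geohash_to_central_point('sunny');
let p2 = geo_geohash_to_central_point('sunnt');
print distance_between_centers = geo_distance_2points(toreal(p1.coordinates[0]), toreal(p1.coordinates[1]), toreal(p2.coordinates[0]), toreal(p2.coordinates[1]))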
The following example returns a null result because of the invalid geohash input.
print geohash = geo_geohash_to_central_point("a")
Output
geohash |
---|
Creating location deep-links for Bing Maps
You can use the geohash value to create a deep-link URL to Bing Maps by pointing to the geohash center point:
// Use string concatenation to create Bing Map deep-link URL from a geo-point
let point_to_map_url = (_point:dynamic, _title:string)
{
strcat('https://www.bing.com/maps?sp=point.', _point.coordinates[1] ,'_', _point.coordinates[0], '_', url_encode(_title))
};
// Convert geohash to center point, and then use 'point_to_map_url' to create Bing Map deep-link
let geohash_to_map_url = (_geohash:string, _title:string)
{
point_to_map_url(geo_geohash_to_central_point(_geohash), _title)
};
print geohash = 'sv8wzvy7'
| extend url = geohash_to_map_url(geohash, "You are here")
Output
geohash | url |
---|---|
sv8wzvy7 | https://www.bing.com/maps?sp=point.32.15620994567871_34.80245590209961_You+are+here |
6.8 - geo_geohash_to_polygon()
Calculates the polygon that represents the geohash rectangular area.
Read more about geohash.
Syntax
geo_geohash_to_polygon(
geohash)
Parameters
Name | Type | Required | Description |
---|---|---|---|
geohash | string | ✔️ | A geohash value as it was calculated by geo_point_to_geohash(). The geohash string must be between 1 and 18 characters. |
Returns
Polygon in GeoJSON Format and of a dynamic data type. If the geohash is invalid, the query will produce a null result.
Examples
print GeohashPolygon = geo_geohash_to_polygon("dr5ru");
Output
GeohashPolygon |
---|
{ “type”: “Polygon”, “coordinates”: [ [[-74.00390625, 40.7373046875], [-73.9599609375, 40.7373046875], [-73.9599609375, 40.78125], [-74.00390625, 40.78125], [-74.00390625, 40.7373046875]]] } |
The following example assembles a GeoJSON geometry collection of geohash polygons.
// Geohash GeoJSON collection
datatable(lng:real, lat:real)
[
-73.975212, 40.789608,
-73.916869, 40.818314,
-73.989148, 40.743273,
]
| project geohash = geo_point_to_geohash(lng, lat, 5)
| project geohash_polygon = geo_geohash_to_polygon(geohash)
| summarize geohash_polygon_lst = make_list(geohash_polygon)
| project bag_pack(
"type", "Feature",
"geometry", bag_pack("type", "GeometryCollection", "geometries", geohash_polygon_lst),
"properties", bag_pack("name", "Geohash polygons collection"))
Output
Column1 |
---|
{ “type”: “Feature”, “geometry”: {“type”: “GeometryCollection”,“geometries”: [ {“type”: “Polygon”, “coordinates”: [[[-74.00390625, 40.78125], [-73.9599609375, 40.78125], [-73.9599609375, 40.8251953125],[ -74.00390625, 40.8251953125], [ -74.00390625, 40.78125]]]}, {“type”: “Polygon”, “coordinates”: [[[ -73.9599609375, 40.78125], [-73.916015625, 40.78125], [-73.916015625, 40.8251953125], [-73.9599609375, 40.8251953125], [-73.9599609375, 40.78125]]]}, {“type”: “Polygon”, “coordinates”: [[[-74.00390625, 40.7373046875], [-73.9599609375, 40.7373046875], [-73.9599609375, 40.78125], [-74.00390625, 40.78125], [-74.00390625, 40.7373046875]]]}] }, “properties”: {“name”: “Geohash polygons collection” }} |
The following example returns a null result because of the invalid geohash input.
print GeohashPolygon = geo_geohash_to_polygon("a");
Output
GeohashPolygon |
---|
6.9 - geo_h3cell_children()
Calculates the H3 cell children.
Read more about H3 Cell.
Syntax
geo_h3cell_children(
h3cell,
resolution)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
resolution | int | | Defines the requested children cells resolution. Supported values are in the range [1, 15]. If unspecified, the immediate children tokens are calculated. |
Returns
An array of H3 Cell children tokens. If the H3 Cell is invalid or the requested child resolution is lower than the given cell's resolution, the query will produce a null result.
Examples
print children = geo_h3cell_children('862a1072fffffff')
Output
children |
---|
[ “872a10728ffffff”, “872a10729ffffff”, “872a1072affffff”, “872a1072bffffff”, “872a1072cffffff”, “872a1072dffffff”, “872a1072effffff” ] |
The following example counts the children three levels below a given cell (each hexagonal cell has seven children, so the count is 7 × 7 × 7 = 343).
let h3_cell = '862a1072fffffff';
print children_count = array_length(geo_h3cell_children(h3_cell, geo_h3cell_level(h3_cell) + 3))
Output
children_count |
---|
343 |
The following example assembles a GeoJSON geometry collection of H3 Cell children polygons.
print children = geo_h3cell_children('862a1072fffffff')
| mv-expand children to typeof(string)
| project child = geo_h3cell_to_polygon(children)
| summarize h3_hash_polygon_lst = make_list(child)
| project geojson = bag_pack(
"type", "Feature",
"geometry", bag_pack("type", "GeometryCollection", "geometries", h3_hash_polygon_lst),
"properties", bag_pack("name", "H3 polygons collection"))
Output
geojson |
---|
{ “type”: “Feature”, “geometry”: { “type”: “GeometryCollection”, “geometries”: [ … … … ] }, “properties”: { “name”: “H3 polygons collection” }} |
The following example returns true because of the invalid cell.
print is_null = isnull(geo_h3cell_children('abc'))
Output
is_null |
---|
1 |
The following example returns true because the level difference between the cell and its children is more than 5.
print is_null = isnull(geo_h3cell_children(geo_point_to_h3cell(1, 1, 9), 15))
Output
is_null |
---|
1 |
6.10 - geo_h3cell_level()
Calculates the H3 cell resolution.
Read more about H3 Cell.
Syntax
geo_h3cell_level(
h3cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
Returns
An integer that represents H3 Cell level. Valid level is in range [0, 15]. If the H3 Cell is invalid, the query will produce a null result.
Examples
print cell_res = geo_h3cell_level('862a1072fffffff')
Output
cell_res |
---|
6 |
print cell_res = geo_h3cell_level(geo_point_to_h3cell(1,1,10))
Output
cell_res |
---|
10 |
The following example returns true because of the invalid H3 Cell token input.
print invalid_res = isnull(geo_h3cell_level('abc'))
Output
invalid_res |
---|
1 |
6.11 - geo_h3cell_neighbors()
Calculates the H3 cell neighbors.
Read more about H3 Cell.
Syntax
geo_h3cell_neighbors(
h3cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
Returns
An array of H3 cell neighbors. If the H3 Cell is invalid, the query will produce a null result.
Examples
The following example calculates H3 cell neighbors.
print neighbors = geo_h3cell_neighbors('862a1072fffffff')
Output
neighbors |
---|
[“862a10727ffffff”,“862a10707ffffff”,“862a1070fffffff”,“862a10777ffffff”,“862a100dfffffff”,“862a100d7ffffff”] |
The following example calculates an array containing the input H3 cell and its neighbors.
let h3cell = '862a1072fffffff';
print cells = array_concat(pack_array(h3cell), geo_h3cell_neighbors(h3cell))
Output
cells |
---|
[“862a1072fffffff”,“862a10727ffffff”,“862a10707ffffff”,“862a1070fffffff”,“862a10777ffffff”,“862a100dfffffff”,“862a100d7ffffff”] |
The following example calculates a GeoJSON geometry collection of H3 cell polygons.
let h3cell = '862a1072fffffff';
print cells = array_concat(pack_array(h3cell), geo_h3cell_neighbors(h3cell))
| mv-expand cells to typeof(string)
| project polygons = geo_h3cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", arr), "properties", bag_pack("name", "polygons"))
Output
geojson |
---|
{“type”: “Feature”,“geometry”: {“type”: “GeometryCollection”,“geometries”: [ {“type”:“Polygon”,“coordinates”:[[[-74.0022744646159,40.735376026215022],[-74.046908029686236,40.727986222489115],[-74.060610712223664,40.696775140349033],[-74.029724408156682,40.672970047595463],[-73.985140983708192,40.680349049267583],[-73.971393761028622,40.71154393543933],[-74.0022744646159,40.735376026215022]]]}, {“type”:“Polygon”,“coordinates”:[[[-74.019448383546617,40.790439140236963],[-74.064132193843633,40.783038509825],[-74.077839665342211,40.751803958414136],[-74.046908029686236,40.727986222489115],[-74.0022744646159,40.735376026215022],[-73.988522328408948,40.766594382212254],[-74.019448383546617,40.790439140236963]]]}, {“type”:“Polygon”,“coordinates”:[[[-74.077839665342211,40.751803958414136],[-74.1224794808745,40.744383587828388],[-74.1361375042681,40.713156370029125],[-74.1052004095288,40.689365648097258],[-74.060610712223664,40.696775140349033],[-74.046908029686236,40.727986222489115],[-74.077839665342211,40.751803958414136]]]}, {“type”:“Polygon”,“coordinates”:[[[-74.060610712223664,40.696775140349033],[-74.1052004095288,40.689365648097258],[-74.118853750491638,40.658161927046628],[-74.0879619670209,40.634383824229609],[-74.043422283844933,40.641782462872115],[-74.029724408156682,40.672970047595463],[-74.060610712223664,40.696775140349033]]]}, {“type”:“Polygon”,“coordinates”:[[[-73.985140983708192,40.680349049267583],[-74.029724408156682,40.672970047595463],[-74.043422283844933,40.641782462872115],[-74.012581189358343,40.617990065981623],[-73.968047801220749,40.625358290164748],[-73.954305509472675,40.656529678451555],[-73.985140983708192,40.680349049267583]]]}, {“type”:“Polygon”,“coordinates”:[[[-73.926766604813565,40.718903205013063],[-73.971393761028622,40.71154393543933],[-73.985140983708192,40.680349049267583],[-73.954305509472675,40.656529678451555],[-73.909728515658443,40.663878222244435],[-73.895936872069854,40.69505685239637],[-73.926766604813565,40.718903205013063]]]}, {“type”:“Polygon”,“coordinates”:[[[-73.943844904976629,40.773964402038523],[-73.988522328408948,40.766594382212254],[-74.0022744646159,40.735376026215022],[-73.971393761028622,40.71154393543933],[-73.926766604813565,40.718903205013063],[-73.912969923470314,40.750105305345329],[-73.943844904976629,40.773964402038523]]]}]}, “properties”: {“name”: “polygons”}} |
The following example calculates the polygon union of an H3 cell and its neighbors.
let h3cell = '862a1072fffffff';
print cells = array_concat(pack_array(h3cell), geo_h3cell_neighbors(h3cell))
| mv-expand cells to typeof(string)
| project polygons = geo_h3cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project polygon = geo_union_polygons_array(arr)
Output
polygon |
---|
{ “type”: “Polygon”, “coordinates”: [[[ -73.926766604813565, 40.718903205013063],[ -73.912969923470314, 40.750105305345329],[ -73.943844904976629, 40.773964402038523],[ -73.988522328408948, 40.766594382212254],[ -74.019448383546617, 40.79043914023697],[ -74.064132193843633, 40.783038509825005],[ -74.077839665342211, 40.751803958414136],[ -74.1224794808745, 40.744383587828388],[ -74.1361375042681, 40.713156370029125],[ -74.1052004095288, 40.689365648097251],[ -74.118853750491638, 40.658161927046628],[ -74.0879619670209, 40.6343838242296],[ -74.043422283844933, 40.641782462872115],[ -74.012581189358343, 40.617990065981623],[ -73.968047801220749, 40.625358290164755],[ -73.954305509472675, 40.656529678451555],[ -73.909728515658443, 40.663878222244442],[ -73.895936872069854, 40.695056852396377],[ -73.926766604813565, 40.718903205013063]]]} |
The following example returns true because of the invalid H3 Cell token input.
print invalid = isnull(geo_h3cell_neighbors('abc'))
Output
invalid |
---|
1 |
6.12 - geo_h3cell_parent()
Calculates the H3 cell parent.
Read more about H3 Cell.
Syntax
geo_h3cell_parent(
h3cell,
resolution)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
resolution | int | | Defines the requested parent cell resolution. Supported values are in the range [0, 14]. If unspecified, an immediate parent token is calculated. |
Returns
An H3 Cell parent token string. If the H3 Cell is invalid or the requested parent resolution is higher than the given cell's resolution, the query will produce an empty result.
Examples
print parent_cell = geo_h3cell_parent('862a1072fffffff')
Output
parent_cell |
---|
852a1073fffffff |
The following example calculates cell parent at level 1.
print parent_cell = geo_h3cell_parent('862a1072fffffff', 1)
Output
parent_cell |
---|
812a3ffffffffff |
print parent_res = geo_h3cell_level(geo_h3cell_parent((geo_point_to_h3cell(1,1,10))))
Output
parent_res |
---|
9 |
print parent_res = geo_h3cell_level(geo_h3cell_parent(geo_point_to_h3cell(1,1,10), 3))
Output
parent_res |
---|
3 |
The following example produces an empty result because of the invalid cell input.
print invalid = isempty(geo_h3cell_parent('123'))
Output
invalid |
---|
1 |
The following example produces an empty result because of the invalid parent resolution.
print invalid = isempty(geo_h3cell_parent('862a1072fffffff', 100))
Output
invalid |
---|
1 |
The following example produces an empty result because a parent can't have a higher resolution than its child.
print invalid = isempty(geo_h3cell_parent('862a1072fffffff', 15))
Output
invalid |
---|
1 |
6.13 - geo_h3cell_rings()
Calculates the H3 cell Rings.
Read more about H3 Cell.
Syntax
geo_h3cell_rings(
h3cell,
distance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
distance | int | ✔️ | Defines the maximum ring distance from the given cell. Valid distance is in the range [0, 142]. |
Returns
An ordered array of ring arrays, where the first ring contains the original cell, the second ring contains its neighboring cells, and so on. If either the H3 Cell or the distance is invalid, the query produces a null result.
Examples
The following example produces rings up to distance 2.
print rings = geo_h3cell_rings('861f8894fffffff', 2)
Output
rings |
---|
[ [“861f8894fffffff”], [“861f88947ffffff”,“861f8895fffffff”,“861f88867ffffff”,“861f8d497ffffff”,“861f8d4b7ffffff”,“861f8896fffffff”], [“861f88967ffffff”,“861f88977ffffff”,“861f88957ffffff”,“861f8882fffffff”,“861f88877ffffff”,“861f88847ffffff”,“861f8886fffffff”,“861f8d49fffffff”,“861f8d487ffffff”,“861f8d4a7ffffff”,“861f8d59fffffff”,“861f8d597ffffff”] ] |
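As a quick check of this structure, the following sketch counts the cells in each ring of the result above. Ring k (for k ≥ 1) generally contains 6 × k cells, so the counts here are 1, 6, and 12.
print rings = geo_h3cell_rings('861f8894fffffff', 2)
| mv-expand rings
| project cells_in_ring = array_length(rings)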
The following example produces all cells in ring 1 (the immediate neighbors).
print neighbors = geo_h3cell_rings('861f8894fffffff', 1)[1]
Output
neighbors |
---|
[“861f88947ffffff”, “861f8895fffffff”, “861f88867ffffff”, “861f8d497ffffff”, “861f8d4b7ffffff”,“861f8896fffffff”] |
The following example produces a flat list of cells from all rings.
print rings = geo_h3cell_rings('861f8894fffffff', 1)
| mv-apply rings on
(
summarize cells = make_list(rings)
)
Output
cells |
---|
[“861f8894fffffff”,“861f88947ffffff”,“861f8895fffffff”,“861f88867ffffff”,“861f8d497ffffff”,“861f8d4b7ffffff”,“861f8896fffffff”] |
The following example assembles a GeoJSON geometry collection of all cells.
print rings = geo_h3cell_rings('861f8894fffffff', 1)
| mv-apply rings on
(
summarize make_list(rings)
)
| mv-expand list_rings to typeof(string)
| project polygon = geo_h3cell_to_polygon(list_rings)
| summarize polygon_lst = make_list(polygon)
| project geojson = bag_pack(
"type", "Feature",
"geometry", bag_pack("type", "GeometryCollection", "geometries", polygon_lst),
"properties", bag_pack("name", "H3 polygons collection"))
Output
geojson |
---|
{ “type”: “Feature”, “geometry”: { “type”: “GeometryCollection”, “geometries”: [ … … … ]}, “properties”: { “name”: “H3 polygons collection” }} |
The following example returns true because of the invalid cell.
print is_null = isnull(geo_h3cell_rings('abc', 3))
Output
is_null |
---|
1 |
The following example returns true because of the invalid distance.
print is_null = isnull(geo_h3cell_rings('861f8894fffffff', 150))
Output
is_null |
---|
1 |
6.14 - geo_h3cell_to_central_point()
Calculates the geospatial coordinates that represent the center of an H3 Cell.
Read more about H3 Cell.
Syntax
geo_h3cell_to_central_point(
h3cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
Returns
The geospatial coordinate values in GeoJSON Format and of a dynamic data type. If the H3 cell token is invalid, the query will produce a null result.
Examples
print h3cell = geo_h3cell_to_central_point("862a1072fffffff")
Output
h3cell |
---|
{ “type”: “Point”, “coordinates”: [-74.016008479792447, 40.7041679083504] } |
The following example returns the longitude of the H3 Cell center point:
print longitude = geo_h3cell_to_central_point("862a1072fffffff").coordinates[0]
Output
longitude |
---|
-74.0160084797924 |
The following example returns a null result because of the invalid H3 cell token input.
print h3cell = geo_h3cell_to_central_point("1")
Output
h3cell |
---|
6.15 - geo_h3cell_to_polygon()
Calculates the polygon that represents the H3 Cell area.
Read more about H3 Cell.
Syntax
geo_h3cell_to_polygon(
h3cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell(). |
Returns
Polygon in GeoJSON Format and of a dynamic data type. If the H3 Cell is invalid, the query will produce a null result.
Examples
print geo_h3cell_to_polygon("862a1072fffffff")
Output
print_0 |
---|
{ “type”: “Polygon”, “coordinates”: [[[-74.0022744646159, 40.735376026215022], [-74.046908029686236, 40.727986222489115], [-74.060610712223664, 40.696775140349033],[ -74.029724408156682, 40.672970047595463], [-73.985140983708192, 40.680349049267583],[ -73.971393761028622, 40.71154393543933], [-74.0022744646159, 40.735376026215022]]] } |
The following example assembles a GeoJSON geometry collection of H3 Cell polygons.
// H3 cell GeoJSON collection
datatable(lng:real, lat:real)
[
-73.956683, 40.807907,
-73.916869, 40.818314,
-73.989148, 40.743273,
]
| project h3_hash = geo_point_to_h3cell(lng, lat, 6)
| project h3_hash_polygon = geo_h3cell_to_polygon(h3_hash)
| summarize h3_hash_polygon_lst = make_list(h3_hash_polygon)
| project bag_pack(
"type", "Feature",
"geometry", bag_pack("type", "GeometryCollection", "geometries", h3_hash_polygon_lst),
"properties", bag_pack("name", "H3 polygons collection"))
Output
Column1 |
---|
{ “type”: “Feature”, “geometry”: {“type”: “GeometryCollection”, “geometries”: [{“type”: “Polygon”,“coordinates”: [[[-73.9609635556213, 40.829061732419916], [-74.005691351383675, 40.821680937801922], [-74.019448383546617, 40.790439140236963], [-73.988522328408948, 40.766594382212254], [-73.943844904976629, 40.773964402038523], [-73.930043202964953, 40.805189944379514], [-73.9609635556213, 40.829061732419916]]]}, {“type”: “Polygon”, “coordinates”: [[[-73.902385078754875, 40.867671551513595], [-73.94715685019348, 40.860310688399885], [-73.9609635556213, 40.829061732419916], [-73.930043202964953, 40.805189944379514], [-73.885321931061725, 40.812540084842404 ], [-73.871470551071766, 40.843772725733125], [ -73.902385078754875, 40.867671551513595]]]}, {“type”: “Polygon”,“coordinates”: [[[-73.943844904976629, 40.773964402038523], [-73.988522328408948, 40.766594382212254], [-74.0022744646159, 40.735376026215022], [-73.971393761028622, 40.71154393543933], [-73.926766604813565, 40.718903205013063], [ -73.912969923470314, 40.750105305345329 ], [-73.943844904976629, 40.773964402038523]]]}] }, “properties”: {“name”: “H3 polygons collection”} } |
The following example returns a null result because of the invalid H3 Cell token input.
print geo_h3cell_to_polygon("@")
Output
print_0 |
---|
6.16 - geo_intersection_2lines()
Calculates the intersection of two lines or multilines.
Syntax
geo_intersection_2lines(
lineString1,
lineString2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString1 | dynamic | ✔️ | A line or multiline in the GeoJSON format. |
lineString2 | dynamic | ✔️ | A line or multiline in the GeoJSON format. |
Returns
Intersection in GeoJSON Format and of a dynamic data type. If a LineString or MultiLineString is invalid, the query will produce a null result.
LineString definition and constraints
dynamic({"type": "LineString","coordinates": [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({"type": "MultiLineString","coordinates": [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Examples
The following example calculates the intersection of two lines. In this case, the result is a point.
let lineString1 = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
print intersection = geo_intersection_2lines(lineString1, lineString2)
Output
intersection |
---|
{“type”: “Point”,“coordinates”: [-73.979837116670978,40.783989289772165]} |
The following example calculates the intersection of two lines. In this case, the result is a line.
let line = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
print intersection = geo_intersection_2lines(line, line)
Output
intersection |
---|
{“type”: “LineString”,“coordinates”: [[ -73.978929, 40.785155],[ -73.980903, 40.782621]]} |
The following two lines don’t intersect.
let lineString1 = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[3, 3],[4, 4]]});
print intersection = geo_intersection_2lines(lineString1, lineString2)
Output
intersection |
---|
{“type”: “GeometryCollection”, “geometries”: []} |
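Because non-intersecting inputs return an empty GeometryCollection rather than null, such pairs can be filtered out by checking the length of the geometries array. A minimal sketch using the two lines above:
let lineString1 = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[3, 3],[4, 4]]});
print intersection = geo_intersection_2lines(lineString1, lineString2)
| where array_length(intersection.geometries) != 0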
The following example will return a null result because one of the lines is invalid.
let lineString1 = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[3, 3]]});
print invalid = isnull(geo_intersection_2lines(lineString1, lineString2))
Output
invalid |
---|
1 |
6.17 - geo_intersection_2polygons()
Calculates the intersection of two polygons or multipolygons.
Syntax
geo_intersection_2polygons(
polygon1,
polygon2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon1 | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
polygon2 | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
Intersection in GeoJSON Format and of a dynamic data type. If a Polygon or MultiPolygon is invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({"type": "Polygon","coordinates": [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})
dynamic({"type": "MultiPolygon","coordinates": [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
- Polygon contains its vertices.
Examples
The following example calculates the intersection of two polygons. In this case, the result is a polygon.
let polygon1 = dynamic({"type":"Polygon","coordinates":[[[-73.9630937576294,40.77498840732385],[-73.963565826416,40.774383111780914],[-73.96205306053162,40.773745311181585],[-73.96160781383514,40.7743912365898],[-73.9630937576294,40.77498840732385]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)
Output
intersection |
---|
{“type”: “Polygon”, “coordinates”: [[[-73.962105776437156,40.774591360999679],[-73.962642403166868,40.774807020251778],[-73.9631313085556,40.774578106920352],[-73.962079882621765,40.774167803982927],[-73.962105776437156,40.774591360999679]]]} |
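The intersection result can be passed to other geospatial functions. For example, assuming the geo_polygon_area() function is available, the following sketch estimates the overlapping area of the two polygons above, in square meters:
let polygon1 = dynamic({"type":"Polygon","coordinates":[[[-73.9630937576294,40.77498840732385],[-73.963565826416,40.774383111780914],[-73.96205306053162,40.773745311181585],[-73.96160781383514,40.7743912365898],[-73.9630937576294,40.77498840732385]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
print overlap_area_m2 = geo_polygon_area(geo_intersection_2polygons(polygon1, polygon2))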
The following example calculates the intersection of two polygons. In this case, the result is a point.
let polygon1 = dynamic({"type":"Polygon","coordinates":[[[2,45],[0,45],[1,44],[2,45]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[3,44],[2,45],[2,43],[3,44]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)
Output
intersection |
---|
{“type”: “Point”,“coordinates”: [2,45]} |
The intersection of the following two polygons is a geometry collection.
let polygon1 = dynamic({"type":"Polygon","coordinates":[[[2,45],[0,45],[1,44],[2,45]]]});
let polygon2 = dynamic({"type":"MultiPolygon","coordinates":[[[[3,44],[2,45],[2,43],[3,44]]],[[[1.192,45.265],[1.005,44.943],[1.356,44.937],[1.192,45.265]]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)
Output
intersection |
---|
{“type”: “GeometryCollection”,“geometries”: [ { “type”: “Point”, “coordinates”: [2, 45]}, { “type”: “Polygon”, “coordinates”: [[[1.3227075526410679,45.003909145068739],[1.0404565374899824,45.004356403066552],[1.005,44.943],[1.356,44.937],[1.3227075526410679,45.003909145068739]]]}]} |
The following two polygons don’t intersect.
let polygon1 = dynamic({"type":"Polygon","coordinates":[[[2,45],[0,45],[1,44],[2,45]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[3,44],[3,45],[2,43],[3,44]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)
Output
intersection |
---|
{“type”: “GeometryCollection”, “geometries”: []} |
The following example finds all counties in the USA that intersect with the area of interest polygon.
let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| project name, intersection = geo_intersection_2polygons(county, area_of_interest)
| where array_length(intersection.geometries) != 0
Output
name | intersection |
---|---|
New York | {“type”: “Polygon”,“coordinates”: [[[-73.96213352680206, 40.775045280447145], [-73.9631313085556, 40.774578106920345], [-73.96207988262177,40.77416780398293],[-73.96213352680206, 40.775045280447145]]]} |
The following example will return a null result because one of the polygons is invalid.
let central_park_polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
let invalid_polygon = dynamic({"type":"Polygon"});
print isnull(geo_intersection_2polygons(invalid_polygon, central_park_polygon))
Output
print_0 |
---|
1 |
6.18 - geo_intersection_line_with_polygon()
Calculates the intersection of a line or a multiline with a polygon or a multipolygon.
Syntax
geo_intersection_line_with_polygon(
lineString,
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
polygon | dynamic | ✔️ | A Polygon or MultiPolygon in the GeoJSON format. |
Returns
Intersection in GeoJSON Format and of a dynamic data type. If the lineString, multiLineString, polygon, or multipolygon is invalid, the query will produce a null result.
LineString definition and constraints
dynamic({"type": "LineString","coordinates": [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({"type": "MultiLineString","coordinates": [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Polygon definition and constraints
dynamic({"type": "Polygon","coordinates": [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})
dynamic({"type": "MultiPolygon","coordinates": [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
- Polygon contains its vertices.
Examples
The following example calculates the intersection of a line and a polygon. In this case, the result is a line.
let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersection = geo_intersection_line_with_polygon(lineString, polygon)
Output
intersection |
---|
{“type”: “LineString”,“coordinates”: [[-73.975611956578192,40.78060906714618],[-73.974552,40.779761]]} |
The following example calculates the intersection of a line and a polygon. In this case, the result is a multiline.
let lineString = dynamic({"type":"LineString","coordinates":[[-110.522, 39.198],[-91.428, 40.880]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-90.263,36.738],[-102.041,45.274],[-109.335,36.527],[-90.263,36.738]],[[-100.393,41.705],[-103.139,38.925],[-97.558,39.113],[-100.393,41.705]]]});
print intersection = geo_intersection_line_with_polygon(lineString, polygon)
Output
intersection |
---|
{“type”: “MultiLineString”,“coordinates”: [[[ -106.89353655881905, 39.769226209776306],[ -101.74448553679453, 40.373506008712525]],[[-99.136499431328858, 40.589336512699994],[-95.284527737311791, 40.799060242246348]]]} |
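Similarly, assuming the geo_line_length() function is available, the following sketch measures how much of the line above falls inside the polygon, in meters:
let lineString = dynamic({"type":"LineString","coordinates":[[-110.522, 39.198],[-91.428, 40.880]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-90.263,36.738],[-102.041,45.274],[-109.335,36.527],[-90.263,36.738]],[[-100.393,41.705],[-103.139,38.925],[-97.558,39.113],[-100.393,41.705]]]});
print length_inside_meters = geo_line_length(geo_intersection_line_with_polygon(lineString, polygon))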
The following line and polygon don’t intersect.
let lineString = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersection = geo_intersection_line_with_polygon(lineString, polygon)
Output
intersection |
---|
{“type”: “GeometryCollection”,“geometries”: []} |
The following example finds all roads in the NYC GeoJSON roads table that intersect with the area of interest literal polygon.
let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
NY_Manhattan_Roads
| project name = features.properties.Label, road = features.geometry
| project name, intersection = geo_intersection_line_with_polygon(road, area_of_interest)
| where array_length(intersection.geometries) != 0
Output
name | intersection |
---|---|
CentralParkW | {“type”:“MultiLineString”,“coordinates”:[[[-73.958295846836933,40.800316027289647],[-73.9582724,40.8003415]],[[-73.958413422194482,40.80037239620097],[-73.9584093,40.8003797]]]} |
FrederickDouglassCir | {“type”:“LineString”,“coordinates”:[[-73.9579272943862,40.800751229494182],[-73.9579019,40.8007238],[-73.9578688,40.8006749],[-73.9578508,40.8006203],[-73.9578459,40.800570199999996],[-73.9578484,40.80053310000001],[-73.9578627,40.800486700000008],[-73.957913,40.800421100000008],[-73.9579668,40.8003923],[-73.9580189,40.80037260000001],[-73.9580543,40.8003616],[-73.9581237,40.8003395],[-73.9581778,40.8003365],[-73.9582724,40.8003415],[-73.958308,40.8003466],[-73.9583328,40.8003517],[-73.9583757,40.8003645],[-73.9584093,40.8003797],[-73.9584535,40.80041099999999],[-73.9584818,40.8004536],[-73.958507000000012,40.8004955],[-73.9585217,40.800562400000004],[-73.9585282,40.8006155],[-73.958416200000016,40.8007325],[-73.9583541,40.8007785],[-73.9582772,40.800811499999995],[-73.9582151,40.8008285],[-73.958145918999392,40.800839887820239]]} |
W110thSt | {“type”:“MultiLineString”,“coordinates”:[[[-73.957828446036331,40.800476476316327],[-73.9578627,40.800486700000008]],[[-73.9585282,40.8006155],[-73.958565492035873,40.800631133466972]],[[-73.958416200000016,40.8007325],[-73.958446850928084,40.800744577466617]]]} |
WestDr | {“type”:“LineString”,“coordinates”:[[-73.9580543,40.8003616],[-73.958009693938735,40.800250494588468]]} |
The following example finds all counties in the USA that intersect with the area of interest literal LineString.
let area_of_interest = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| project name, intersection = geo_intersection_line_with_polygon(area_of_interest, county)
| where array_length(intersection.geometries) != 0
Output
name | intersection |
---|---|
New York | {“type”: “LineString”,“coordinates”: [[-73.971590995788574, 40.794513338780895], [-73.967385292053223, 40.792758888618756],[-73.969788551330566, 40.789769718601512]]} |
The following example will return a null result because the LineString is invalid.
let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
print is_invalid = isnull(geo_intersection_line_with_polygon(lineString, polygon))
Output
is_invalid |
---|
1 |
The following example will return a null result because the polygon is invalid.
let lineString = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
let polygon = dynamic({"type":"Polygon","coordinates":[]});
print is_invalid = isnull(geo_intersection_line_with_polygon(lineString, polygon))
Output
is_invalid |
---|
1 |
6.19 - geo_intersects_2lines()
Calculates whether two lines or multilines intersect.
Syntax
geo_intersects_2lines(
lineString1,
lineString2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString1 | dynamic | ✔️ | A line or multiline in the GeoJSON format. |
lineString2 | dynamic | ✔️ | A line or multiline in the GeoJSON format. |
Returns
Indicates whether two lines or multilines intersect. If a lineString or multiLineString is invalid, the query will produce a null result.
LineString definition and constraints
dynamic({"type": "LineString","coordinates": [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({"type": "MultiLineString","coordinates": [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Examples
The following example checks whether two literal lines intersect.
let lineString1 = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
print intersects = geo_intersects_2lines(lineString1, lineString2)
Output
intersects |
---|
True |
The following example finds all roads in the NYC GeoJSON roads table that intersect with a line of interest.
let my_road = dynamic({"type":"LineString","coordinates":[[-73.97892951965332,40.78515573551921],[-73.98090362548828,40.78262115769851]]});
NY_Manhattan_Roads
| project name = features.properties.Label, road = features.geometry
| where geo_intersects_2lines(road, my_road)
| project name
Output
name |
---|
Broadway |
W 78th St |
W 79th St |
W 80th St |
W 81st St |
The following example will return a null result because one of the lines is invalid.
let lineString1 = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275]]});
print isnull(geo_intersects_2lines(lineString1, lineString2))
Output
print_0 |
---|
True |
6.20 - geo_intersects_2polygons()
Calculates whether two polygons or multipolygons intersect.
Syntax
geo_intersects_2polygons(
polygon1,
polygon2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon1 | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
polygon2 | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
Indicates whether two polygons or multipolygons intersect. If a Polygon or MultiPolygon is invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({"type": "Polygon","coordinates": [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})
dynamic({"type": "MultiPolygon","coordinates": [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
- Polygon contains its vertices.
Examples
The following example checks whether two literal polygons intersect.
let polygon1 = dynamic({"type":"Polygon","coordinates":[[[-73.9630937576294,40.77498840732385],[-73.963565826416,40.774383111780914],[-73.96205306053162,40.773745311181585],[-73.96160781383514,40.7743912365898],[-73.9630937576294,40.77498840732385]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
print geo_intersects_2polygons(polygon1, polygon2)
Output
print_0 |
---|
True |
The following example finds all counties in the USA that intersect with the area of interest literal polygon.
let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| where geo_intersects_2polygons(county, area_of_interest)
| project name
Output
name |
---|
New York |
The following example will return a null result because one of the polygons is invalid.
let central_park_polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
let invalid_polygon = dynamic({"type":"Polygon"});
print isnull(geo_intersects_2polygons(invalid_polygon, central_park_polygon))
Output
print_0 |
---|
True |
6.21 - geo_intersects_line_with_polygon()
Calculates whether a line or multiline intersects with a polygon or a multipolygon.
Syntax
geo_intersects_line_with_polygon(
lineString,
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
polygon | dynamic | ✔️ | A Polygon or MultiPolygon in the GeoJSON format. |
Returns
Indicates whether the line or multiline intersects with the polygon or multipolygon. If the lineString, multiLineString, polygon, or multipolygon is invalid, the query produces a null result.
LineString definition and constraints
dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1], …, [lng_i,lat_i], …, [lng_j,lat_j], …, [lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1], …, [lng_i,lat_i], …, [lng_j,lat_j], …, [lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
- Polygon doesn’t necessarily contain its vertices.
Examples
The following example checks whether a literal LineString intersects with a Polygon.
let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersects = geo_intersects_line_with_polygon(lineString, polygon)
Output
intersects |
---|
True |
The following example finds all roads in the NYC GeoJSON roads table that intersect with an area of interest defined by a literal polygon.
let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
NY_Manhattan_Roads
| project name = features.properties.Label, road = features.geometry
| where geo_intersects_line_with_polygon(road, area_of_interest)
| project name
Output
name |
---|
Central Park W |
Frederick Douglass Cir |
W 110th St |
West Dr |
The following example finds all counties in the USA that intersect with an area of interest defined by a literal LineString.
let area_of_interest = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| where geo_intersects_line_with_polygon(area_of_interest, county)
| project name
Output
name |
---|
New York |
The following example will return a null result because the LineString is invalid.
let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
print isnull(geo_intersects_line_with_polygon(lineString, polygon))
Output
print_0 |
---|
True |
The following example will return a null result because the polygon is invalid.
let lineString = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
let polygon = dynamic({"type":"Polygon","coordinates":[]});
print isnull(geo_intersects_line_with_polygon(lineString, polygon))
Output
print_0 |
---|
True |
6.22 - geo_line_buffer()
Calculates a polygon or multipolygon that contains all points within the given radius of the input line or multiline on Earth.
Syntax
geo_line_buffer(
lineString,
radius,
tolerance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
radius | real | ✔️ | Buffer radius in meters. Valid value must be positive. |
tolerance | real | | Defines the tolerance in meters that determines how much a polygon can deviate from the ideal radius. If unspecified, the default value 10 is used. Tolerance should be no lower than 0.0001% of the radius. Specifying a tolerance bigger than the radius lowers the tolerance to the biggest possible value below the radius. |
Returns
Polygon or MultiPolygon around the input LineString or MultiLineString. If the coordinates, radius, or tolerance is invalid, the query produces a null result.
LineString definition and constraints
dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Examples
The following query calculates a polygon around the line, with a radius of 4 meters and a tolerance of 0.1 meters.
let line = dynamic({"type":"LineString","coordinates":[[-80.66634997047466,24.894526340592122],[-80.67373241820246,24.890808090321286]]});
print buffer = geo_line_buffer(line, 4, 0.1)
buffer |
---|
{“type”: “Polygon”, “coordinates”: [ … ]} |
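Because the result is a regular GeoJSON polygon, it can be fed to other geo functions. The following minimal sketch (reusing the line above) checks that the line intersects its own buffer:
// The buffer polygon is built around the line, so the line is expected to intersect it.
let line = dynamic({"type":"LineString","coordinates":[[-80.66634997047466,24.894526340592122],[-80.67373241820246,24.890808090321286]]});
print line_in_buffer = geo_intersects_line_with_polygon(line, geo_line_buffer(line, 4, 0.1))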
The following query calculates a buffer around each line and unifies the results.
datatable(line:dynamic)
[
dynamic({"type":"LineString","coordinates":[[14.429214068940496,50.10043066548272],[14.431184174126173,50.10046525983731]]}),
dynamic({"type":"LineString","coordinates":[[14.43030222687753,50.100780677801936],[14.4303847111523,50.10020274910934]]})
]
| project buffer = geo_line_buffer(line, 2, 0.1)
| summarize polygons = make_list(buffer)
| project result = geo_union_polygons_array(polygons)
result |
---|
{“type”: “Polygon”,“coordinates”: [ … ]} |
The following example returns true because the line is invalid.
print buffer = isnull(geo_line_buffer(dynamic({"type":"LineString"}), 5))
buffer |
---|
True |
The following example returns true because the radius is invalid.
print buffer = isnull(geo_line_buffer(dynamic({"type":"LineString","coordinates":[[0,0],[1,1]]}), 0))
buffer |
---|
True |
6.23 - geo_line_centroid()
Calculates the centroid of a line or a multiline on Earth.
Syntax
geo_line_centroid(
lineString)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
Returns
The centroid coordinate values in GeoJSON Format and of a dynamic data type. If the line or the multiline is invalid, the query produces a null result.
LineString definition and constraints
dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
Examples
The following example calculates the centroid of a line.
let line = dynamic({"type":"LineString","coordinates":[[-73.95796, 40.80042], [-73.97317, 40.764486]]});
print centroid = geo_line_centroid(line);
Output
centroid |
---|
{“type”: “Point”, “coordinates”: [-73.965567057230942, 40.782453249627416]} |
The following example calculates the longitude of a line centroid.
let line = dynamic({"type":"LineString","coordinates":[[-73.95807266235352,40.800426144169315],[-73.94966125488281,40.79691751000055],[-73.97317886352539,40.764486356930334],[-73.98210525512695,40.76786669510221],[-73.96004676818848,40.7980870753293]]});
print centroid = geo_line_centroid(line)
| project lng = centroid.coordinates[0]
Output
lng |
---|
-73.9660675626837 |
The following example visualizes a multiline centroid on a map.
let line = dynamic({"type":"MultiLineString","coordinates":[[[-73.95798683166502,40.800556090021466],[-73.98193359375,40.76819171855746]],[[-73.94940376281738,40.79691751000055],[-73.97317886352539,40.76435634049001]]]});
print centroid = geo_line_centroid(line)
| render scatterchart with (kind = map)
The following example returns true because of the invalid line.
print is_bad_line = isnull(geo_line_centroid(dynamic({"type":"LineString","coordinates":[[1, 1]]})))
Output
is_bad_line |
---|
true |
6.24 - geo_line_densify()
Converts planar line or multiline edges to geodesics by adding intermediate points.
Syntax
geo_line_densify(
lineString,
tolerance,
[ preserve_crossing ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
tolerance | int, long, or real | | Defines maximum distance in meters between the original planar edge and the converted geodesic edge chain. Supported values are in the range [0.1, 10000]. If unspecified, the default value 10 is used. |
preserve_crossing | bool | | If true, preserves edge crossing over antimeridian. If unspecified, the default value false is used. |
Returns
Densified line in the GeoJSON format and of a dynamic data type. If either the line or tolerance is invalid, the query will produce a null result.
LineString definition
dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- The coordinates [longitude, latitude] must be valid. The longitude must be a real number in the range [-180, +180] and the latitude must be a real number in the range [-90, +90].
- The edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Constraints
- The maximum number of points in the densified line is limited to 10485760.
- Storing lines in dynamic format has size limits.
Motivation
- The GeoJSON format defines an edge between two points as a straight cartesian line, while geo_line_densify() uses geodesics.
- The decision to use geodesic or planar edges might depend on the dataset and is especially relevant for long edges.
Examples
The following example densifies a road in Manhattan island. The edge is short and the distance between the planar edge and its geodesic counterpart is less than the distance specified by tolerance. As such, the result remains unchanged.
print densified_line = tostring(geo_line_densify(dynamic({"type":"LineString","coordinates":[[-73.949247, 40.796860],[-73.973017, 40.764323]]})))
Output
densified_line |
---|
{“type”:“LineString”,“coordinates”:[[-73.949247, 40.796860], [-73.973017, 40.764323]]} |
The following example densifies an edge that is ~130 km long.
print densified_line = tostring(geo_line_densify(dynamic({"type":"LineString","coordinates":[[50, 50], [51, 51]]})))
Output
densified_line |
---|
{“type”:“LineString”,“coordinates”:[[50,50],[50.125,50.125],[50.25,50.25],[50.375,50.375],[50.5,50.5],[50.625,50.625],[50.75,50.75],[50.875,50.875],[51,51]]} |
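The number of intermediate points depends on the tolerance. The following minimal sketch (reusing the same edge, with tolerance values chosen only for illustration) compares the densified point counts:
// Larger tolerance values allow the planar edge to deviate more, so fewer intermediate points are added.
let long_edge = dynamic({"type":"LineString","coordinates":[[50, 50], [51, 51]]});
print dense_10m = geo_line_densify(long_edge, 10), dense_1000m = geo_line_densify(long_edge, 1000)
| project points_at_10m = array_length(dense_10m.coordinates), points_at_1000m = array_length(dense_1000m.coordinates)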
The following example returns a null result because of the invalid coordinate input.
print densified_line = geo_line_densify(dynamic({"type":"LineString","coordinates":[[300,1],[1,1]]}))
Output
densified_line |
---|
The following example returns a null result because of the invalid tolerance input.
print densified_line = geo_line_densify(dynamic({"type":"LineString","coordinates":[[1,1],[2,2]]}), 0)
Output
densified_line |
---|
6.25 - geo_line_length()
Calculates the total length of a line or a multiline on Earth.
Syntax
geo_line_length(
lineString)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
Returns
The total length of a line or a multiline, in meters, on Earth. If the line or multiline is invalid, the query will produce a null result.
LineString definition and constraints
dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Examples
The following example calculates the total line length, in meters.
let line = dynamic({"type":"LineString","coordinates":[[-73.95807266235352,40.800426144169315],[-73.94966125488281,40.79691751000055],[-73.97317886352539,40.764486356930334]]});
print length = geo_line_length(line)
Output
length |
---|
4922.48016992081 |
The following example calculates total multiline length, in meters.
let line = dynamic({"type":"MultiLineString","coordinates":[[[-73.95798683166502,40.800556090021466],[-73.98193359375,40.76819171855746]],[[-73.94940376281738,40.79691751000055],[-73.97317886352539,40.76435634049001]]]});
print length = geo_line_length(line)
Output
length |
---|
8262.24339753741 |
The following example returns True because of the invalid line.
print is_bad_line = isnull(geo_line_length(dynamic({"type":"LineString","coordinates":[[1, 1]]})))
Output
is_bad_line |
---|
True |
6.26 - geo_line_simplify()
Simplifies a line or a multiline by replacing nearly straight chains of short edges with a single long edge on Earth.
Syntax
geo_line_simplify(
lineString,
tolerance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format. |
tolerance | int, long, or real | | Defines minimum distance in meters between any two vertices. Supported values are in the range [0, ~7,800,000 meters]. If unspecified, the default value 10 is used. |
Returns
A simplified line or multiline in the GeoJSON format and of a dynamic data type, such that no two vertices are closer than the tolerance. If either the line or the tolerance is invalid, the query produces a null result.
LineString definition and constraints
dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Examples
The following example simplifies the line by removing vertices that are within a 10-meter distance from each other.
let line = dynamic({"type":"LineString","coordinates":[[-73.97033169865608,40.789063020152824],[-73.97039607167244,40.78897975920816],[-73.9704617857933,40.78888837512432],[-73.97052884101868,40.7887949601531],[-73.9706052839756,40.788698498903564],[-73.97065222263336,40.78862640672032],[-73.97072866559029,40.78852791445617],[-73.97079303860664,40.788434498977836]]});
print simplified = geo_line_simplify(line, 10)
Output
simplified |
---|
{“type”: “LineString”, “coordinates”: [[-73.97033169865608, 40.789063020152824], [-73.97079303860664, 40.788434498977836]]} |
The following example simplifies lines and combines the results into a GeoJSON geometry collection.
NY_Manhattan_Roads
| project road = features.geometry
| project road_simplified = geo_line_simplify(road, 100)
| summarize roads_lst = make_list(road_simplified)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", roads_lst), "properties", bag_pack("name", "roads"))
Output
geojson |
---|
{“type”: “Feature”, “geometry”: {“type”: “GeometryCollection”, “geometries”: [ … ]}, “properties”: {“name”: “roads”}} |
The following example simplifies lines and unifies the results.
NY_Manhattan_Roads
| project road = features.geometry
| project road_simplified = geo_line_simplify(road, 100)
| summarize roads_lst = make_list(road_simplified)
| project roads = geo_union_lines_array(roads_lst)
Output
roads |
---|
{“type”: “MultiLineString”, “coordinates”: [ … ]} |
The following example returns True because of the invalid line.
print is_invalid_line = isnull(geo_line_simplify(dynamic({"type":"LineString","coordinates":[[1, 1]]})))
Output
is_invalid_line |
---|
True |
The following example returns True because of the invalid tolerance.
print is_invalid_line = isnull(geo_line_simplify(dynamic({"type":"LineString","coordinates":[[1, 1],[2,2]]}), -1))
Output
is_invalid_line |
---|
True |
The following example returns True because the high tolerance causes the small line to disappear.
print is_invalid_line = isnull(geo_line_simplify(dynamic({"type":"LineString","coordinates":[[1.1, 1.1],[1.2,1.2]]}), 100000))
Output
is_invalid_line |
---|
True |
6.27 - geo_line_to_s2cells()
Calculates S2 cell tokens that cover a line or multiline on Earth. This function is a useful geospatial join tool.
Read more about S2 cell hierarchy.
Syntax
geo_line_to_s2cells(
lineString [,
level[ ,
radius]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineString | dynamic | ✔️ | Line or multiline in the GeoJSON format. |
level | int | | Defines the requested cell level. Supported values are in the range [0, 30]. If unspecified, the default value 11 is used. |
radius | real | | Buffer radius in meters. If unspecified, the default value 0 is used. |
Returns
An array of S2 cell token strings that cover a line or multiline. If the radius is set to a positive value, the covering includes both the input shape and all points within the radius of the input geometry.
If the line, level, or radius is invalid, or if the cell count exceeds the limit, the query produces a null result.
Choosing the S2 cell level
- Ideally, each line should be covered by one or just a few unique cells, such that no two lines share the same cell.
- In practice, try covering with just a few cells, no more than a dozen. Covering with more than 10,000 cells might not yield good performance.
- Query run time and memory consumption might differ greatly because of different S2 cell level values; the sketch after this list shows one way to compare candidate levels.
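For example, the following minimal sketch (with an assumed literal line) compares how many covering cells two candidate levels produce:
// Finer levels produce smaller cells, so the covering of the same line usually contains more tokens.
let line = dynamic({"type":"LineString","coordinates":[[-0.13997,51.50191],[-0.14088,51.50012]]});
print cells_at_level_12 = array_length(geo_line_to_s2cells(line, 12)),
      cells_at_level_16 = array_length(geo_line_to_s2cells(line, 16))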
Performance improvement suggestions
- If possible, reduce the line count based on the nature of the data or business needs. Filter out unnecessary lines before the join, scope to the area of interest, or unify lines.
- For very big lines, reduce their size using geo_line_simplify() (see the sketch after this list).
- Changing the S2 cell level may improve performance and memory consumption.
- Changing the join kind and hint may improve performance and memory consumption.
- If a positive radius is set, buffering the shape with geo_line_buffer() first and then reverting to radius 0 may improve performance.
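As a minimal sketch of the simplification suggestion (table and column names follow the NY_Manhattan_Roads examples used elsewhere in this section), the covering can be computed on simplified roads:
// Simplify each road before computing its S2 covering to shrink the average covering size.
NY_Manhattan_Roads
| project road = features.geometry
| project covering = geo_line_to_s2cells(geo_line_simplify(road, 100), 14)
| summarize avg_cells_per_road = avg(array_length(covering))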
Examples
The following query finds all tube stations within 500 meters of streets and aggregates the tube station count by street name.
let radius = 500;
let tube_stations = datatable(tube_station_name:string, lng:real, lat: real)
[
"St. James' Park", -0.13451078568013486, 51.49919145858172,
"London Bridge station", -0.08492752160134387, 51.504876316440914,
// more points
];
let streets = datatable(street_name:string, line:dynamic)
[
"Buckingham Palace", dynamic({"type":"LineString","coordinates":[[-0.1399656708283601,51.50190802248855],[-0.14088438832752104,51.50012082761452]]}),
"London Bridge", dynamic({"type":"LineString","coordinates":[[-0.087152,51.509596],[-0.088340,51.506110]]}),
// more lines
];
let join_level = 14;
let lines = materialize(streets | extend id = new_guid());
let res =
lines
| project id, covering = geo_line_to_s2cells(line, join_level, radius)
| mv-expand covering to typeof(string)
| join kind=inner hint.strategy=broadcast
(
tube_stations
| extend covering = geo_point_to_s2cell(lng, lat, join_level)
) on covering;
res | lookup lines on id
| where geo_distance_point_to_line(lng, lat, line) <= radius
| summarize count = count() by name = street_name
name | count |
---|---|
Buckingham Palace | 1 |
London Bridge | 1 |
If the line is invalid, a null result is returned.
let line = dynamic({"type":"LineString","coordinates":[[[0,0],[0,0]]]});
print isnull(geo_line_to_s2cells(line))
print_0 |
---|
True |
6.28 - geo_point_buffer()
Calculates a polygon that contains all points within the given radius of the point on Earth.
Syntax
geo_point_buffer(
longitude,
latitude,
radius,
tolerance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | Geospatial coordinate longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
latitude | real | ✔️ | Geospatial coordinate latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
radius | real | ✔️ | Buffer radius in meters. Valid value must be positive. |
tolerance | real | | Defines the tolerance in meters that determines how much a polygon can deviate from the ideal radius. If unspecified, the default value 10 is used. Tolerance should be no lower than 0.0001% of the radius. Specifying a tolerance bigger than the radius lowers the tolerance to the biggest possible value below the radius. |
Returns
Polygon around the input point. If the coordinates, radius, or tolerance is invalid, the query produces a null result.
Examples
The following query calculates a polygon around the [-115.1745008278, 36.1497251277] coordinates, with a 20-km radius.
print buffer = geo_point_buffer(-115.1745008278, 36.1497251277, 20000)
buffer |
---|
{“type”: “Polygon”,“coordinates”: [ … ]} |
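The returned polygon composes with other geo functions. The following minimal sketch (using the same coordinates with an assumed smaller radius) verifies that the center point falls inside its own buffer:
// The buffer is built around the point, so the point is inside the resulting polygon.
let lng = -115.1745008278;
let lat = 36.1497251277;
print center_in_buffer = geo_point_in_polygon(lng, lat, geo_point_buffer(lng, lat, 100))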
The following query calculates a buffer around each point and unifies the results.
datatable(longitude:real, latitude:real, radius:real)
[
real(-80.3212217992616), 25.268683367546604, 5000,
real(-80.81717403605833), 24.82658441221962, 3000
]
| project buffer = geo_point_buffer(longitude, latitude, radius)
| summarize polygons = make_list(buffer)
| project result = geo_union_polygons_array(polygons)
result |
---|
{“type”: “MultiPolygon”,“coordinates”: [ … ]} |
The following example returns true because the point is invalid.
print result = isnull(geo_point_buffer(200, 1,0.1))
result |
---|
True |
The following example returns true because the radius is invalid.
print result = isnull(geo_point_buffer(10, 10, -1))
result |
---|
True |
6.29 - geo_point_in_circle()
Calculates whether the geospatial coordinates are inside a circle on Earth.
Syntax
geo_point_in_circle(
p_longitude,
p_latitude,
pc_longitude,
pc_latitude,
c_radius)
Parameters
Name | Type | Required | Description |
---|---|---|---|
p_longitude | real | ✔️ | Geospatial coordinate longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
p_latitude | real | ✔️ | Geospatial coordinate latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
pc_longitude | real | ✔️ | Circle center geospatial coordinate longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
pc_latitude | real | ✔️ | Circle center geospatial coordinate latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
c_radius | real | ✔️ | Circle radius in meters. Valid value must be positive. |
Returns
Indicates whether the geospatial coordinates are inside a circle. If the coordinates or circle is invalid, the query produces a null result.
Examples
The following example finds all the places in the area defined by the following circle: Radius of 18 km, center at [-122.317404, 47.609119] coordinates.
datatable(longitude:real, latitude:real, place:string)
[
real(-122.317404), 47.609119, 'Seattle', // In circle
real(-123.497688), 47.458098, 'Olympic National Forest', // In exterior of circle
real(-122.201741), 47.677084, 'Kirkland', // In circle
real(-122.443663), 47.247092, 'Tacoma', // In exterior of circle
real(-122.121975), 47.671345, 'Redmond', // In circle
]
| where geo_point_in_circle(longitude, latitude, -122.317404, 47.609119, 18000)
| project place
Output
place |
---|
Seattle |
Kirkland |
Redmond |
The following example finds storm events near Orlando. The events are filtered to within 100 km of the Orlando coordinates and aggregated by event type and hash.
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat, real(-81.3891), 28.5346, 1000 * 100)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat)
| project geo_s2cell_to_central_point(hash), EventType, count_
| render piechart with (kind=map) // map pie rendering available in Kusto Explorer desktop
Output
The following example shows New York City taxi pickups within 10 meters of a particular location. Relevant pickups are aggregated by hash.
nyc_taxi
| project pickup_longitude, pickup_latitude
| where geo_point_in_circle( pickup_longitude, pickup_latitude, real(-73.9928), 40.7429, 10)
| summarize by hash = geo_point_to_s2cell(pickup_longitude, pickup_latitude, 22)
| project geo_s2cell_to_central_point(hash)
| render scatterchart with (kind = map)
Output
The following example returns true.
print in_circle = geo_point_in_circle(-122.143564, 47.535677, -122.100896, 47.527351, 3500)
Output
in_circle |
---|
true |
The following example returns false.
print in_circle = geo_point_in_circle(-122.137575, 47.630683, -122.100896, 47.527351, 3500)
Output
in_circle |
---|
false |
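The same check can be expressed by comparing the geodesic distance to the circle center against the radius using geo_distance_2points(). A minimal sketch reusing the coordinates above; both expressions are expected to agree:
// A point is in the circle when its geodesic distance to the center does not exceed the radius.
print in_circle = geo_point_in_circle(-122.143564, 47.535677, -122.100896, 47.527351, 3500),
      within_radius = geo_distance_2points(-122.143564, 47.535677, -122.100896, 47.527351) <= 3500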
The following example returns a null result because of the invalid coordinate input.
print in_circle = geo_point_in_circle(200, 1, 1, 1, 1)
Output
in_circle |
---|
The following example returns a null result because of the invalid circle radius input.
print in_circle = geo_point_in_circle(1, 1, 1, 1, -1)
Output
in_circle |
---|
6.30 - geo_point_in_polygon()
Calculates whether the geospatial coordinates are inside a polygon or a multipolygon on Earth.
Syntax
geo_point_in_polygon(
longitude,
latitude,
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
Indicates whether the geospatial coordinates are inside a polygon. If the coordinates or polygon is invalid, the query produces a null result.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions is chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
- LinearRings must not cross and must not share edges. LinearRings might share vertices.
- Polygon doesn’t necessarily contain its vertices. Point containment in polygon is defined so that if the Earth is subdivided into polygons, every point is contained by exactly one polygon.
Examples
The following example finds locations that fall within Manhattan Island, excluding the area of Central Park.
datatable(longitude:real, latitude:real, description:string)
[
real(-73.985654), 40.748487, 'Empire State Building', // In Polygon
real(-73.963249), 40.779525, 'The Metropolitan Museum of Art', // In exterior of polygon
real(-73.874367), 40.777356, 'LaGuardia Airport', // In exterior of polygon
]
| where geo_point_in_polygon(longitude, latitude, dynamic({"type":"Polygon","coordinates":[[[-73.92597198486328,40.87821814104651],[-73.94691467285156,40.85069618625578],[-73.94691467285156,40.841865966890786],[-74.01008605957031,40.7519385984599],[-74.01866912841797,40.704586878965245],[-74.01214599609375,40.699901911003046],[-73.99772644042969,40.70875101828792],[-73.97747039794922,40.71083299030839],[-73.97026062011719,40.7290474687069],[-73.97506713867186,40.734510840309376],[-73.970947265625,40.74543623770158],[-73.94210815429688,40.77586181063573],[-73.9434814453125,40.78080140115127],[-73.92974853515625,40.79691751000055],[-73.93077850341797,40.804454347291006],[-73.93489837646484,40.80965166748853],[-73.93524169921875,40.837190668541105],[-73.92288208007812,40.85770758108904],[-73.9101791381836,40.871728144624974],[-73.92597198486328,40.87821814104651]],[[-73.95824432373047,40.80071852197889],[-73.98206233978271,40.76815921628347],[-73.97309303283691,40.76422632379533],[-73.94914627075195,40.796949998204596],[-73.95824432373047,40.80071852197889]]]}))
Output
longitude | latitude | description |
---|---|---|
-73.985654 | 40.748487 | Empire State Building |
The following example searches for coordinates in a multipolygon.
let multipolygon = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]],[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.7796396837
51753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.779878898532999],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]],[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.942930
43781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]]});
let coordinates =
datatable(longitude:real, latitude:real, description:string)
[
real(-73.9741), 40.7914, 'Upper West Side', // In MultiPolygon
real(-73.9950), 40.7340, 'Greenwich Village', // In MultiPolygon
real(-73.8743), 40.7773, 'LaGuardia Airport', // In exterior of MultiPolygon
];
coordinates
| where geo_point_in_polygon(longitude, latitude, multipolygon)
Output
longitude | latitude | description |
---|---|---|
-73.9741 | 40.7914 | Upper West Side |
-73.995 | 40.734 | Greenwich Village |
The following example finds storm events in California. The events are filtered by a California state polygon and aggregated by event type and hash.
let california = dynamic({"type":"Polygon","coordinates":[[[-123.233256,42.006186],[-122.378853,42.011663],[-121.037003,41.995232],[-120.001861,41.995232],[-119.996384,40.264519],[-120.001861,38.999346],[-118.71478,38.101128],[-117.498899,37.21934],[-116.540435,36.501861],[-115.85034,35.970598],[-114.634459,35.00118],[-114.634459,34.87521],[-114.470151,34.710902],[-114.333228,34.448009],[-114.136058,34.305608],[-114.256551,34.174162],[-114.415382,34.108438],[-114.535874,33.933176],[-114.497536,33.697668],[-114.524921,33.54979],[-114.727567,33.40739],[-114.661844,33.034958],[-114.524921,33.029481],[-114.470151,32.843265],[-114.524921,32.755634],[-114.72209,32.717295],[-116.04751,32.624187],[-117.126467,32.536556],[-117.24696,32.668003],[-117.252437,32.876127],[-117.329114,33.122589],[-117.471515,33.297851],[-117.7837,33.538836],[-118.183517,33.763391],[-118.260194,33.703145],[-118.413548,33.741483],[-118.391641,33.840068],[-118.566903,34.042715],[-118.802411,33.998899],[-119.218659,34.146777],[-119.278905,34.26727],[-119.558229,34.415147],[-119.875891,34.40967],[-120.138784,34.475393],[-120.472878,34.448009],[-120.64814,34.579455],[-120.609801,34.858779],[-120.670048,34.902595],[-120.631709,35.099764],[-120.894602,35.247642],[-120.905556,35.450289],[-121.004141,35.461243],[-121.168449,35.636505],[-121.283465,35.674843],[-121.332757,35.784382],[-121.716143,36.195153],[-121.896882,36.315645],[-121.935221,36.638785],[-121.858544,36.6114],[-121.787344,36.803093],[-121.929744,36.978355],[-122.105006,36.956447],[-122.335038,37.115279],[-122.417192,37.241248],[-122.400761,37.361741],[-122.515777,37.520572],[-122.515777,37.783465],[-122.329561,37.783465],[-122.406238,38.15042],[-122.488392,38.112082],[-122.504823,37.931343],[-122.701993,37.893004],[-122.937501,38.029928],[-122.97584,38.265436],[-123.129194,38.451652],[-123.331841,38.566668],[-123.44138,38.698114],[-123.737134,38.95553],[-123.687842,39.032208],[-123.824765,39.366301],[-123.764519,39.552517],[-123.85215,39.831841],[-124.109566,40.105688],[-124.361506,40.259042],[-124.410798,40.439781],[-124.158859,40.877937],[-124.109566,41.025814],[-124.158859,41.14083],[-124.065751,41.442061],[-124.147905,41.715908],[-124.257444,41.781632],[-124.213628,42.000709],[-123.233256,42.006186]]]});
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_polygon(BeginLon, BeginLat, california)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat, 7)
| project geo_s2cell_to_central_point(hash), EventType, count_
| render piechart with (kind=map) // map rendering available in Kusto Explorer desktop
Output
The following example shows how to classify coordinates to polygons using the partition operator.
let Polygons = datatable(description:string, polygon:dynamic)
[
"New York city area", dynamic({"type":"Polygon","coordinates":[[[-73.85009765625,40.85744791303121],[-74.16046142578125,40.84290487729676],[-74.190673828125,40.59935608796518],[-73.83087158203125,40.61812224225511],[-73.85009765625,40.85744791303121]]]}),
"Seattle area", dynamic({"type":"Polygon","coordinates":[[[-122.200927734375,47.68573021131587],[-122.4591064453125,47.68573021131587],[-122.4755859375,47.468949677672484],[-122.17620849609374,47.47266286861342],[-122.200927734375,47.68573021131587]]]}),
"Las Vegas", dynamic({"type":"Polygon","coordinates":[[[-114.9,36.36],[-115.4498291015625,36.33282808737917],[-115.4498291015625,35.84453450421662],[-114.949951171875,35.902399875143615],[-114.9,36.36]]]}),
];
let Locations = datatable(longitude:real, latitude:real)
[
real(-73.95), real(40.75), // Somewhere in New York
real(-122.3), real(47.6), // Somewhere in Seattle
real(-115.18), real(36.16) // Somewhere in Las Vegas
];
Polygons
| project polygonPartition = tostring(pack("description", description, "polygon", polygon))
| partition hint.materialized=true hint.strategy=native by polygonPartition
{
Locations
| extend description = parse_json(toscalar(polygonPartition)).description
| extend polygon = parse_json(toscalar(polygonPartition)).polygon
| where geo_point_in_polygon(longitude, latitude, polygon)
| project-away polygon
}
Output
longitude | latitude | description |
---|---|---|
-73.95 | 40.75 | New York city area |
-122.3 | 47.6 | Seattle area |
-115.18 | 36.16 | Las Vegas |
See also geo_polygon_to_s2cells().
The following example folds several polygons into one multipolygon and checks locations that fall within the multipolygon.
let Polygons =
datatable(polygon:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.779639683751753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.779878898532999],[-73
.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.94293043781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]}),
];
let Coordinates =
datatable(longitude:real, latitude:real, description:string)
[
real(-73.9741), 40.7914, 'Upper West Side',
real(-73.9950), 40.7340, 'Greenwich Village',
real(-73.8743), 40.7773, 'LaGuardia Airport',
];
let multipolygon = toscalar(
Polygons
| project individual_polygon = pack_array(polygon.coordinates)
| summarize multipolygon_coordinates = make_list(individual_polygon)
| project multipolygon = bag_pack("type","MultiPolygon", "coordinates", multipolygon_coordinates));
Coordinates
| where geo_point_in_polygon(longitude, latitude, multipolygon)
Output
longitude | latitude | description |
---|---|---|
-73.9741 | 40.7914 | Upper West Side |
-73.995 | 40.734 | Greenwich Village |
The following example returns a null result because of the invalid coordinate input.
print in_polygon = geo_point_in_polygon(200,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,1],[0,0]]]}))
Output
in_polygon |
---|
The following example returns a null result because of the invalid polygon input.
print in_polygon = geo_point_in_polygon(1,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]}))
Output
in_polygon |
---|
6.31 - geo_point_to_geohash()
Calculates the geohash string value of a geographic location.
Read more about geohash.
Syntax
geo_point_to_geohash(
longitude,
latitude,
[ accuracy ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
accuracy | int | | Defines the requested accuracy. Supported values are in the range [1, 18]. If unspecified, the default value 5 is used. |
Returns
The geohash string value of a given geographic location with requested accuracy length. If the coordinate or accuracy is invalid, the query produces an empty result.
Geohash rectangular area coverage per accuracy value:
Accuracy | Width | Height |
---|---|---|
1 | 5000 km | 5000 km |
2 | 1250 km | 625 km |
3 | 156.25 km | 156.25 km |
4 | 39.06 km | 19.53 km |
5 | 4.88 km | 4.88 km |
6 | 1.22 km | 0.61 km |
7 | 152.59 m | 152.59 m |
8 | 38.15 m | 19.07 m |
9 | 4.77 m | 4.77 m |
10 | 1.19 m | 0.59 m |
11 | 149.01 mm | 149.01 mm |
12 | 37.25 mm | 18.63 mm |
13 | 4.66 mm | 4.66 mm |
14 | 1.16 mm | 0.58 mm |
15 | 145.52 μ | 145.52 μ |
16 | 36.28 μ | 18.19 μ |
17 | 4.55 μ | 4.55 μ |
18 | 1.14 μ | 0.57 μ |
See also geo_point_to_s2cell(), geo_point_to_h3cell().
Examples
The following example finds US storm events aggregated by geohash.
StormEvents
| project BeginLon, BeginLat
| summarize by hash=geo_point_to_geohash(BeginLon, BeginLat, 3)
| project geo_geohash_to_central_point(hash)
| render scatterchart with (kind=map)
Output
The following example calculates and returns the geohash string value.
print geohash = geo_point_to_geohash(-80.195829, 25.802215, 8)
Output
geohash |
---|
dhwfz15h |
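Geohash strings are hierarchical: a shorter hash for the same point is a prefix of a longer one and identifies the containing rectangle. A minimal sketch reusing the coordinates above with accuracies chosen for illustration:
// Each added character refines the cell, so the accuracy-3 and accuracy-5 strings are prefixes of the accuracy-8 string.
print geohash_3 = geo_point_to_geohash(-80.195829, 25.802215, 3),
      geohash_5 = geo_point_to_geohash(-80.195829, 25.802215, 5),
      geohash_8 = geo_point_to_geohash(-80.195829, 25.802215, 8)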
The following example finds groups of coordinates. Every pair of coordinates in the group resides in a rectangular area of 4.88 km by 4.88 km.
datatable(location_id:string, longitude:real, latitude:real)
[
"A", double(-122.303404), 47.570482,
"B", double(-122.304745), 47.567052,
"C", double(-122.278156), 47.566936,
]
| summarize count = count(), // items per group count
locations = make_list(location_id) // items in the group
by geohash = geo_point_to_geohash(longitude, latitude) // geohash of the group
Output
geohash | count | locations |
---|---|---|
c23n8 | 2 | [“A”, “B”] |
c23n9 | 1 | [“C”] |
The following example produces an empty result because of the invalid coordinate input.
print geohash = geo_point_to_geohash(200,1,8)
Output
geohash |
---|
The following example produces an empty result because of the invalid accuracy input.
print geohash = geo_point_to_geohash(1,1,int(null))
Output
geohash |
---|
6.32 - geo_point_to_h3cell()
Calculates the H3 Cell token string value of a geographic location.
Read more about H3 Cell.
Syntax
geo_point_to_h3cell(
longitude,
latitude,
[ resolution ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
resolution | int | | Defines the requested cell resolution. Supported values are in the range [0, 15]. If unspecified, the default value 6 is used. |
Returns
The H3 Cell token string value of a given geographic location. If the coordinates or resolution are invalid, the query produces an empty result.
H3 Cell approximate area coverage per resolution value
Level | Average Hexagon Edge Length |
---|---|
0 | 1108 km |
1 | 419 km |
2 | 158 km |
3 | 60 km |
4 | 23 km |
5 | 8 km |
6 | 3 km |
7 | 1 km |
8 | 460 m |
9 | 174 m |
10 | 66 m |
11 | 25 m |
12 | 9 m |
13 | 3 m |
14 | 1 m |
15 | 0.5 m |
The table source can be found in this H3 Cell statistical resource.
See also geo_point_to_s2cell(), geo_point_to_geohash().
Examples
print h3cell = geo_point_to_h3cell(-74.04450446039874, 40.689250859314974, 6)
Output
h3cell |
---|
862a1072fffffff |
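The resolution argument is optional; as a sketch, the following query leaves it out and relies on the documented default of 6, so it is expected to return the same token as the example above.
print h3cell = geo_point_to_h3cell(-74.04450446039874, 40.689250859314974)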
The following example finds groups of coordinates. Every pair of coordinates in the group resides in the H3 Cell with average hexagon area of 253 km².
datatable(location_id:string, longitude:real, latitude:real)
[
"A", -73.956683, 40.807907,
"B", -73.916869, 40.818314,
"C", -73.989148, 40.743273,
]
| summarize count = count(), // Items per group count
locations = make_list(location_id) // Items in the group
by h3cell = geo_point_to_h3cell(longitude, latitude, 5) // H3 Cell of the group
Output
h3cell | count | locations |
---|---|---|
852a100bfffffff | 2 | [ “A”, “B” ] |
852a1073fffffff | 1 | [ “C” ] |
The following example produces an empty result because of the invalid coordinate input.
print h3cell = geo_point_to_h3cell(300,1,8)
Output
h3cell |
---|
The following example produces an empty result because of the invalid resolution input.
print h3cell = geo_point_to_h3cell(1,1,16)
Output
h3cell |
---|
The following example produces an empty result because of the invalid resolution input.
print h3cell = geo_point_to_h3cell(1,1,int(null))
Output
h3cell |
---|
6.33 - geo_point_to_s2cell()
Calculates the S2 cell token string value of a geographic location.
Read more about S2 cell hierarchy. S2 cell can be a useful geospatial clustering tool. An S2 cell is a cell on a spherical surface and it has geodesic edges. S2 cells are part of a hierarchy dividing up the Earth’s surface. They have a maximum of 31 levels, ranging from zero to 30, which define the number of times a cell is subdivided. Levels range from the largest coverage on level zero with area coverage of 85,011,012.19km², to the lowest coverage of 0.44 cm² at level 30. As S2 cells are subdivided at higher levels, the cell center is preserved well. Two geographic locations can be very close to each other but they have different S2 cell tokens.
Syntax
geo_point_to_s2cell(
longitude,
latitude,
[ level ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
level | int | | Defines the requested cell level. Supported values are in the range [0, 30]. If unspecified, the default value 11 is used. |
Returns
The S2 cell token string value of a given geographic location. If the coordinates or levels are invalid, the query produces an empty result.
S2 cell approximate area coverage per level value
For every level, the size of the S2 cell is similar but not exactly equal. Nearby cell sizes tend to be more equal.
Level | Minimum random cell edge length (UK) | Maximum random cell edge length (US) |
---|---|---|
0 | 7842 km | 7842 km |
1 | 3921 km | 5004 km |
2 | 1825 km | 2489 km |
3 | 840 km | 1310 km |
4 | 432 km | 636 km |
5 | 210 km | 315 km |
6 | 108 km | 156 km |
7 | 54 km | 78 km |
8 | 27 km | 39 km |
9 | 14 km | 20 km |
10 | 7 km | 10 km |
11 | 3 km | 5 km |
12 | 1699 m | 2 km |
13 | 850 m | 1225 m |
14 | 425 m | 613 m |
15 | 212 m | 306 m |
16 | 106 m | 153 m |
17 | 53 m | 77 m |
18 | 27 m | 38 m |
19 | 13 m | 19 m |
20 | 7 m | 10 m |
21 | 3 m | 5 m |
22 | 166 cm | 2 m |
23 | 83 cm | 120 cm |
24 | 41 cm | 60 cm |
25 | 21 cm | 30 cm |
26 | 10 cm | 15 cm |
27 | 5 cm | 7 cm |
28 | 2 cm | 4 cm |
29 | 12 mm | 18 mm |
30 | 6 mm | 9 mm |
The table source can be found in this S2 Cell statistical resource.
Examples
US storm events aggregated by S2 cell
The following example finds US storm events aggregated by S2 cells.
StormEvents
| project BeginLon, BeginLat
| summarize by hash=geo_point_to_s2cell(BeginLon, BeginLat, 5)
| project geo_s2cell_to_central_point(hash)
| render scatterchart with (kind=map)
Output
The following example calculates the S2 cell ID.
print s2cell = geo_point_to_s2cell(-80.195829, 25.802215, 8)
Output
s2cell |
---|
88d9b |
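The level argument is optional; as a sketch, the following query omits it and relies on the documented default level 11, producing a finer-grained token for the same coordinates (the token value isn't reproduced here).
print s2cell = geo_point_to_s2cell(-80.195829, 25.802215)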
Find a group of coordinates
The following example finds groups of coordinates. Every pair of coordinates in the group resides in the S2 cell with a maximum area of 1632.45 km².
datatable(location_id:string, longitude:real, latitude:real)
[
"A", 10.1234, 53,
"B", 10.3579, 53,
"C", 10.6842, 53,
]
| summarize count = count(), // items per group count
locations = make_list(location_id) // items in the group
by s2cell = geo_point_to_s2cell(longitude, latitude, 8) // s2 cell of the group
Output
s2cell | count | locations |
---|---|---|
47b1d | 2 | [“A”,“B”] |
47ae3 | 1 | [“C”] |
Empty results
The following example produces an empty result because of the invalid coordinate input.
print s2cell = geo_point_to_s2cell(300,1,8)
Output
s2cell |
---|
The following example produces an empty result because of the invalid level input.
print s2cell = geo_point_to_s2cell(1,1,35)
Output
s2cell |
---|
The following example produces an empty result because of the invalid level input.
print s2cell = geo_point_to_s2cell(1,1,int(null))
Output
s2cell |
---|
6.34 - geo_polygon_area()
Calculates the area of a polygon or a multipolygon on Earth.
Syntax
geo_polygon_area(
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
The area of a polygon or a multipolygon, in square meters, on Earth. If the polygon or the multipolygon is invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
Examples
The following example calculates NYC Central Park area.
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print area = geo_polygon_area(central_park)
Output
area |
---|
3475207.28346606 |
The following example performs a union of the polygons in a multipolygon and calculates the area of the unified polygon.
let polygons = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]],[[[-73.94262313842773,40.775991804565585],[-73.98107528686523,40.791849155467695],[-73.99600982666016,40.77092185281977],[-73.96150588989258,40.75609977566361],[-73.94262313842773,40.775991804565585]]]]});
print polygons_union_area = geo_polygon_area(polygons)
Output
polygons_union_area |
---|
10889971.5343487 |
The following example finds the top five US states by area.
US_States
| project name = features.properties.NAME, polygon = geo_polygon_densify(features.geometry)
| project name, area = geo_polygon_area(polygon)
| top 5 by area desc
Output
name | area |
---|---|
Alaska | 1550934810070.61 |
Texas | 693231378868.483 |
California | 410339536449.521 |
Montana | 379583933973.436 |
New Mexico | 314979912310.579 |
The following example returns True because of the invalid polygon.
print isnull(geo_polygon_area(dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]})))
Output
print_0 |
---|
True |
6.35 - geo_polygon_buffer()
Calculates polygon or multipolygon that contains all points within the given radius of the input polygon or multipolygon on Earth.
Syntax
geo_polygon_buffer(
polygon,
radius,
tolerance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
radius | real | ✔️ | Buffer radius in meters. Valid value must be positive. |
tolerance | real | | Defines the tolerance in meters that determines how much a polygon can deviate from the ideal radius. If unspecified, the default value 10 is used. The tolerance should be no lower than 0.0001% of the radius. Specifying a tolerance greater than the radius lowers the tolerance to the largest possible value below the radius. |
Returns
Polygon or multipolygon around the input polygon or multipolygon. If the coordinates, radius, or tolerance is invalid, the query produces a null result.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
- Polygon contains its vertices.
Examples
The following query calculates the polygon around the input polygon, with a radius of 10 km.
let polygon = dynamic({"type":"Polygon","coordinates":[[[139.813757,35.719666],[139.72558,35.71813],[139.727471,35.653231],[139.818721,35.657264],[139.813757,35.719666]]]});
print buffer = geo_polygon_buffer(polygon, 10000)
buffer |
---|
{“type”: “Polygon”,“coordinates”: [ … ]} |
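The optional tolerance argument can be supplied as well; as a sketch, the following query uses a 100-meter tolerance for the same 10-km buffer, which allows the result to deviate further from the ideal radius.
let polygon = dynamic({"type":"Polygon","coordinates":[[[139.813757,35.719666],[139.72558,35.71813],[139.727471,35.653231],[139.818721,35.657264],[139.813757,35.719666]]]});
print buffer = geo_polygon_buffer(polygon, 10000, 100)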
The following query calculates a buffer around each polygon and unifies the results.
datatable(polygon:dynamic, radius:real )
[
dynamic({"type":"Polygon","coordinates":[[[12.451218693639277,41.906457003556625],[12.445753852969375,41.90160968881543],[12.453514425793855,41.90361551885886],[12.451218693639277,41.906457003556625]]]}), 100,
dynamic({"type":"Polygon","coordinates":[[[12.4566086734784,41.905119850039995],[12.453913683559591,41.903652663265234],[12.455485761012113,41.90146110630562],[12.4566086734784,41.905119850039995]]]}), 20
]
| project buffer = geo_polygon_buffer(polygon, radius)
| summarize polygons = make_list(buffer)
| project result = geo_union_polygons_array(polygons)
result |
---|
{“type”: “Polygon”,“coordinates”: [ … ]} |
The following example returns true because of the invalid polygon.
print buffer = isnull(geo_polygon_buffer(dynamic({"type":"p"}), 1))
buffer |
---|
True |
The following example returns true because of the invalid radius.
print buffer = isnull(geo_polygon_buffer(dynamic({"type":"Polygon","coordinates":[[[10,10],[0,10],[0,0],[10,10]]]}), 0))
buffer |
---|
True |
6.36 - geo_polygon_centroid()
Calculates the centroid of a polygon or a multipolygon on Earth.
Syntax
geo_polygon_centroid(
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
The centroid coordinate values in GeoJSON Format and of a dynamic data type. If polygon or multipolygon are invalid, the query produces a null result.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions and chooses the smaller of the two regions.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
- LinearRings must not cross and must not share edges. LinearRings might share vertices.
Examples
The following example calculates the Central Park centroid in New York City.
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print centroid = geo_polygon_centroid(central_park)
Output
centroid |
---|
{“type”: “Point”, “coordinates”: [-73.965735689907618, 40.782550538057812]} |
The following example calculates the Central Park centroid longitude.
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print
centroid = geo_polygon_centroid(central_park)
| project lng = centroid.coordinates[0]
Output
lng |
---|
-73.9657356899076 |
The following example performs a union of the polygons in a multipolygon and calculates the centroid of the unified polygon.
let polygons = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]],[[[-73.94262313842773,40.775991804565585],[-73.98107528686523,40.791849155467695],[-73.99600982666016,40.77092185281977],[-73.96150588989258,40.75609977566361],[-73.94262313842773,40.775991804565585]]]]});
print polygons_union_centroid = geo_polygon_centroid(polygons)
Output
polygons_union_centroid |
---|
{“type”: “Point”, “coordinates”: [-73.968569587829577, 40.776310752555119]} |
The following example visualizes the Central Park centroid on a map.
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print
centroid = geo_polygon_centroid(central_park)
| render scatterchart with (kind = map)
Output
The following example returns true because of the invalid polygon.
print isnull(geo_polygon_centroid(dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]})))
Output
print_0 |
---|
true |
6.37 - geo_polygon_densify()
Converts polygon or multipolygon planar edges to geodesics by adding intermediate points.
Syntax
geo_polygon_densify(
polygon,
tolerance,
[ preserve_crossing ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
tolerance | int, long, or real | | Defines maximum distance in meters between the original planar edge and the converted geodesic edge chain. Supported values are in the range [0.1, 10000]. If unspecified, the default value is 10. |
preserve_crossing | bool | | If true, preserves edge crossing over antimeridian. If unspecified, the default value false is used. |
Polygon definition
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Constraints
- The maximum number of points in the densified polygon is limited to 10485760.
- Storing polygons in dynamic format has size limits.
- Densifying a valid polygon may invalidate the polygon. The algorithm adds points in a non-uniform manner, and as such may cause edges to intertwine with each other. One way to check the result is shown in the sketch after this list.
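One rough way to verify that densification kept a polygon valid is to pass the result to another polygon function, such as geo_polygon_area() from this section, and test for null; the following query is only a sketch of that idea.
let polygon = dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]});
print still_valid = isnotnull(geo_polygon_area(geo_polygon_densify(polygon, 100)))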
Motivation
- GeoJSON format defines an edge between two points as a straight cartesian line, while geo_polygon_densify() uses geodesics.
- The decision to use geodesic or planar edges might depend on the dataset and is especially relevant for long edges.
Returns
Densified polygon in the GeoJSON format and of a dynamic data type. If either the polygon or tolerance is invalid, the query produces a null result.
Examples
The following example densifies the Manhattan Central Park polygon. The edges are short, and the distance between the planar edges and their geodesic counterparts is less than the specified tolerance, so the result remains unchanged.
print densified_polygon = tostring(geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[-73.958244,40.800719],[-73.949146,40.79695],[-73.973093,40.764226],[-73.982062,40.768159],[-73.958244,40.800719]]]})))
Output
densified_polygon |
---|
{“type”:“Polygon”,“coordinates”:[[[-73.958244,40.800719],[-73.949146,40.79695],[-73.973093,40.764226],[-73.982062,40.768159],[-73.958244,40.800719]]]} |
The following example densifies two edges of the polygon. Each densified edge is approximately 110 km long.
print densified_polygon = tostring(geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]})))
Output
densified_polygon |
---|
{“type”:“Polygon”,“coordinates”:[[[10,10],[10.25,10],[10.5,10],[10.75,10],[11,10],[11,11],[10.75,11],[10.5,11],[10.25,11],[10,11],[10,10]]]} |
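The optional preserve_crossing flag is passed as the third argument; the following sketch only illustrates the call shape on the same polygon, since the flag matters only for edges that cross the antimeridian.
print densified_polygon = tostring(geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]}), 10, true))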
The following example returns a null result because of the invalid coordinate input.
print densified_polygon = geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,900],[11,10],[11,11],[10,11],[10,10]]]}))
Output
densified_polygon |
---|
The following example returns a null result because of the invalid tolerance input.
print densified_polygon = geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]}), 0)
Output
densified_polygon |
---|
6.38 - geo_polygon_perimeter()
Calculates the length of the boundary of a polygon or a multipolygon on Earth.
Syntax
geo_polygon_perimeter(
polygon)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
Returns
The length of the boundary of polygon or a multipolygon, in meters, on Earth. If polygon or multipolygon are invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
Examples
The following example calculates the NYC Central Park perimeter, in meters.
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print perimeter = geo_polygon_perimeter(central_park)
Output
perimeter |
---|
9930.30149604938 |
The following example performs a union of the polygons in a multipolygon and calculates the perimeter of the unified polygon.
let polygons = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]],[[[-73.94262313842773,40.775991804565585],[-73.98107528686523,40.791849155467695],[-73.99600982666016,40.77092185281977],[-73.96150588989258,40.75609977566361],[-73.94262313842773,40.775991804565585]]]]});
print perimeter = geo_polygon_perimeter(polygons)
Output
perimeter |
---|
15943.5384578745 |
The following example returns True because of the invalid polygon.
print is_invalid = isnull(geo_polygon_perimeter(dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]})))
Output
is_invalid |
---|
True |
6.39 - geo_polygon_simplify()
Simplifies a polygon or a multipolygon by replacing nearly straight chains of short edges with a single long edge on Earth.
Syntax
geo_polygon_simplify(
polygon,
tolerance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
tolerance | int, long, or real | | Defines minimum distance in meters between any two vertices. Supported values are in the range [0, ~7,800,000 meters]. If unspecified, the default value 10 is used. |
Returns
Simplified polygon or a multipolygon in the GeoJSON format and of a dynamic data type, with no two vertices closer than the tolerance. If either the polygon or the tolerance is invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
Examples
The following example simplifies polygons by removing vertices that are within a 10-meter distance from each other.
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.94885122776031,40.79673476355657],[-73.94885927438736,40.79692258628347],[-73.94887939095497,40.79692055577034],[-73.9488673210144,40.79693476936093],[-73.94888743758202,40.79693476936093],[-73.9488834142685,40.796959135509105],[-73.94890084862709,40.79695304397289],[-73.94906312227248,40.79710736271788],[-73.94923612475395,40.7968708081794],[-73.94885122776031,40.79673476355657]]]});
print simplified = geo_polygon_simplify(polygon)
Output
simplified |
---|
{“type”: “Polygon”, “coordinates”: [[[-73.948851227760315, 40.796734763556572],[-73.949063122272477, 40.797107362717881],[-73.949236124753952, 40.7968708081794],[-73.948851227760315, 40.796734763556572]]]} |
The following example simplifies polygons and combines the results into a GeoJSON geometry collection.
Polygons
| project polygon = features.geometry
| project simplified = geo_polygon_simplify(polygon, 1000)
| summarize lst = make_list(simplified)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", lst), "properties", bag_pack("name", "polygons"))
Output
geojson |
---|
{“type”: “Feature”, “geometry”: {“type”: “GeometryCollection”, “geometries”: [ … ]}, “properties”: {“name”: “polygons”}} |
The following example simplifies polygons and unifies the results.
US_States
| project polygon = features.geometry
| project simplified = geo_polygon_simplify(polygon, 1000)
| summarize lst = make_list(simplified)
| project polygons = geo_union_polygons_array(lst)
Output
polygons |
---|
{“type”: “MultiPolygon”, “coordinates”: [ … ]} |
The following example returns True because of the invalid polygon.
let polygon = dynamic({"type":"Polygon","coordinates":[[[5,48],[5,48]]]});
print is_invalid_polygon = isnull(geo_polygon_simplify(polygon))
Output
is_invalid_polygon |
---|
1 |
The following example returns True because of the invalid tolerance.
let polygon = dynamic({"type":"Polygon","coordinates":[[[5,48],[0,50],[0,47],[4,47],[5,48]]]});
print is_invalid_polygon = isnull(geo_polygon_simplify(polygon, -0.1))
Output
is_invalid_polygon |
---|
1 |
The following example returns True because high tolerance causes polygon to disappear.
let polygon = dynamic({"type":"Polygon","coordinates":[[[5,48],[0,50],[0,47],[4,47],[5,48]]]});
print is_invalid_polygon = isnull(geo_polygon_simplify(polygon, 1000000))
Output
is_invalid_polygon |
---|
1 |
6.40 - geo_polygon_to_h3cells()
Converts a polygon to H3 cells. This function is a useful geospatial join and visualization tool.
Syntax
geo_polygon_to_h3cells(
polygon [,
resolution[,
radius]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
resolution | int | | Defines the requested cell resolution. Supported values are in the range [0, 15]. If unspecified, the default value 6 is used. |
radius | real | | Buffer radius in meters. If unspecified, the default value 0 is used. |
Returns
Array of H3 cell token strings of the same resolution that represent a polygon or a multipolygon. If radius is set to a positive value, the polygon is enlarged so that all points within the given radius of the input polygon or multipolygon are contained inside it, and this newly calculated polygon is then converted to H3 cells. If the polygon, resolution, or radius is invalid, or if the cell count exceeds the limit, the query produces a null result.
See also geo_polygon_to_s2cells().
Examples
The following example calculates H3 cells that approximate the polygon.
let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print h3_cells = geo_polygon_to_h3cells(polygon)
Output
h3_cells |
---|
[“86390cb57ffffff”,“86390cb0fffffff”,“86390ca27ffffff”,“86390cb87ffffff”,“86390cb07ffffff”,“86390ca2fffffff”,“86390ca37ffffff”,“86390cb17ffffff”,“86390cb1fffffff”,“86390cb8fffffff”,“86390cba7ffffff”,“86390ca07ffffff”,“86390cbafffffff”] |
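The optional radius argument buffers the polygon before conversion; as a sketch, the following query enlarges the same polygon by roughly 1 km and then converts it to H3 cells (the resulting tokens aren't reproduced here).
let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print h3_cells = geo_polygon_to_h3cells(polygon, 6, 1000)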
The following example demonstrates a multipolygon that consists of the H3 cells approximating the polygon above. Specifying a higher resolution improves the polygon approximation.
let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print h3_cells = geo_polygon_to_h3cells(polygon)
| mv-expand cell = h3_cells to typeof(string) // extract cell to a separate row
| project polygon_cell = geo_h3cell_to_polygon(cell) // convert each cell to a polygon
| project individual_polygon_coordinates = pack_array(polygon_cell.coordinates)
| summarize multipolygon_coordinates = make_list(individual_polygon_coordinates)
| project multipolygon = bag_pack("type","MultiPolygon", "coordinates", multipolygon_coordinates)
Output
multipolygon |
---|
{“type”: “MultiPolygon”, “coordinates”: [ … ]} |
The following example returns true because the polygon is invalid.
let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[1,1]]]});
print is_null = isnull(geo_polygon_to_h3cells(polygon))
Output
is_null |
---|
True |
6.41 - geo_polygon_to_s2cells()
Calculates S2 cell tokens that cover a polygon or multipolygon on Earth. This function is a useful geospatial join tool.
Read more about S2 cell hierarchy.
Syntax
geo_polygon_to_s2cells(
polygon [,
level[,
radius]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
level | int | | Defines the requested cell level. Supported values are in the range [0, 30]. If unspecified, the default value 11 is used. |
radius | real | | Buffer radius in meters. If unspecified, the default value 0 is used. |
Returns
Array of S2 cell token strings that cover a polygon or a multipolygon. If radius is set to a positive value, the covering also includes all points within the given radius of the input geometry. If the polygon, level, or radius is invalid, or if the cell count exceeds the limit, the query produces a null result.
Motivation for covering polygons with S2 cell tokens
Without this function, here’s one approach we could take to classify coordinates into the polygons that contain them.
let Polygons =
datatable(description:string, polygon:dynamic)
[
"New York", dynamic({"type":"Polygon","coordinates":[[[-73.85009765625,40.85744791303121],[-74.16046142578125,40.84290487729676],[-74.190673828125,40.59935608796518],[-73.83087158203125,40.61812224225511],[-73.85009765625,40.85744791303121]]]}),
"Seattle", dynamic({"type":"Polygon","coordinates":[[[-122.200927734375,47.68573021131587],[-122.4591064453125,47.68573021131587],[-122.4755859375,47.468949677672484],[-122.17620849609374,47.47266286861342],[-122.200927734375,47.68573021131587]]]}),
"Las Vegas", dynamic({"type":"Polygon","coordinates":[[[-114.9,36.36],[-115.4498291015625,36.33282808737917],[-115.4498291015625,35.84453450421662],[-114.949951171875,35.902399875143615],[-114.9,36.36]]]}),
];
let Coordinates =
datatable(longitude:real, latitude:real)
[
real(-73.95), real(40.75), // New York
real(-122.3), real(47.6), // Seattle
real(-115.18), real(36.16) // Las Vegas
];
Polygons | extend dummy=1
| join kind=inner (Coordinates | extend dummy=1) on dummy
| where geo_point_in_polygon(longitude, latitude, polygon)
| project longitude, latitude, description
Output
longitude | latitude | description |
---|---|---|
-73.95 | 40.75 | New York |
-122.3 | 47.6 | Seattle |
-115.18 | 36.16 | Las Vegas |
While this method works in some cases, it’s inefficient. This method does a cross-join, meaning that it tries to match every polygon to every point. This process consumes a large amount of memory and compute resources. Instead, we would like to match every polygon to a point with a high probability of containment success, and filter out other points.
This match can be achieved by the following process:
- Converting polygons to S2 cells of level k,
- Converting points to the same S2 cells level k,
- Joining on S2 cells,
- Filtering by geo_point_in_polygon(). This phase can be omitted if some amount of false positives is acceptable. The maximum error is the area of the level-k S2 cells that extend beyond the polygon boundary.
Choosing the S2 cell level
- Ideally we would want to cover every polygon with one or just a few unique cells such that no two polygons share the same cell.
- If the polygons are close to each other, choose the S2 cell level such that its cell edge will be smaller (4, 8, 12 times smaller) than the edge of the average polygon.
- If the polygons are far from each other, choose the S2 cell level such that its cell edge will be similar or bigger than the edge of the average polygon.
- In practice, covering a polygon with more than 10,000 cells might not yield good performance; the sketch after this list shows how to check the covering size for a candidate level.
- Sample use cases:
- S2 cell level 5 might prove to be good for covering countries/regions.
- S2 cell level 16 can cover dense and relatively small Manhattan (New York) neighborhoods.
- S2 cell level 11 can be used for covering suburbs of Australia.
- Query run time and memory consumption might differ greatly because of different S2 cell level values.
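A quick way to sanity-check a candidate level is to count the covering cells before using them in a join; the following sketch compares two arbitrarily chosen levels for a small polygon.
let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print cells_at_level_5 = array_length(geo_polygon_to_s2cells(polygon, 5)), cells_at_level_11 = array_length(geo_polygon_to_s2cells(polygon, 11))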
Examples
The following example classifies coordinates into polygons.
let Polygons =
datatable(description:string, polygon:dynamic)
[
'Greenwich Village', dynamic({"type":"Polygon","coordinates":[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]]}),
'Upper West Side', dynamic({"type":"Polygon","coordinates":[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.779639683751753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.77
9878898532999],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]]}),
'Upper East Side', dynamic({"type":"Polygon","coordinates":[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.94293043781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]}),
];
let Coordinates =
datatable(longitude:real, latitude:real)
[
real(-73.9741), 40.7914, // Upper West Side
real(-73.9950), 40.7340, // Greenwich Village
real(-73.9584), 40.7688, // Upper East Side
];
let Level = 16;
Polygons
| extend covering = geo_polygon_to_s2cells(polygon, Level) // cover every polygon with s2 cell token array
| mv-expand covering to typeof(string) // expand cells array such that every row will have one cell mapped to its polygon
| join kind=inner hint.strategy=broadcast // assume that Polygons count is small (In some specific case)
(
Coordinates
| extend covering = geo_point_to_s2cell(longitude, latitude, Level) // cover point with cell
) on covering // join on the cell, this filters out rows of point and polygons where the point definitely does not belong to the polygon
| where geo_point_in_polygon(longitude, latitude, polygon) // final filtering for exact result
| project longitude, latitude, description
Output
longitude | latitude | description |
---|---|---|
-73.9741 | 40.7914 | Upper West Side |
-73.995 | 40.734 | Greenwich Village |
-73.9584 | 40.7688 | Upper East Side |
The following query improves on the previous one. It counts storm events per US state and performs a more efficient join because it doesn’t carry the polygons through the join and instead uses the lookup operator.
let Level = 6;
let polygons = materialize(
US_States
| project StateName = tostring(features.properties.NAME), polygon = features.geometry, id = new_guid());
let tmp =
polygons
| project id, covering = geo_polygon_to_s2cells(polygon, Level)
| mv-expand covering to typeof(string)
| join kind=inner hint.strategy=broadcast
(
StormEvents
| project lng = BeginLon, lat = BeginLat
| project lng, lat, covering = geo_point_to_s2cell(lng, lat, Level)
) on covering
| project-away covering, covering1;
tmp | lookup polygons on id
| project-away id
| where geo_point_in_polygon(lng, lat, polygon)
| summarize StormEventsCountByState = count() by StateName
Output
StateName | StormEventsCountByState |
---|---|
Florida | 960 |
Georgia | 1085 |
… | … |
The following example filters out polygons that don’t intersect with the area of interest. The maximum error is the diagonal length of an S2 cell. This example is based on a polygonized earth-at-night raster file.
let intersection_level_hint = 7;
let area_of_interest = dynamic({"type": "Polygon","coordinates": [[[-73.94966125488281,40.79698248639272],[-73.95841598510742,40.800426144169315],[-73.98124694824219,40.76806170936614],[-73.97283554077148,40.7645513650551],[-73.94966125488281,40.79698248639272]]]});
let area_of_interest_covering = geo_polygon_to_s2cells(area_of_interest, intersection_level_hint);
EarthAtNight
| project value = features.properties.DN, polygon = features.geometry
| extend covering = geo_polygon_to_s2cells(polygon, intersection_level_hint)
| mv-apply c = covering to typeof(string) on
(
summarize is_intersects = take_anyif(1, array_index_of(area_of_interest_covering, c) != -1)
)
| where is_intersects == 1
| count
Output
Count |
---|
83 |
The following example counts the cells needed to cover a polygon with S2 cells of level 5.
let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print s2_cell_token_count = array_length(geo_polygon_to_s2cells(polygon, 5));
Output
s2_cell_token_count |
---|
286 |
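The optional radius argument extends the covering to all points within the given distance of the polygon; as a sketch, the following query adds a 10-km buffer to the same polygon, which should yield a covering at least as large as before.
let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print s2_cell_token_count_with_buffer = array_length(geo_polygon_to_s2cells(polygon, 5, 10000));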
Covering a large-area polygon with small-area cells returns null.
let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print geo_polygon_to_s2cells(polygon, 30);
Output
print_0 |
---|
The following example verifies the same null result by using isnull().
let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print isnull(geo_polygon_to_s2cells(polygon, 30));
Output
print_0 |
---|
1 |
6.42 - geo_s2cell_neighbors()
Calculates S2 cell neighbors.
Read more about S2 cell hierarchy.
Syntax
geo_s2cell_neighbors(
s2cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
s2cell | string | ✔️ | S2 cell token value as it was calculated by geo_point_to_s2cell(). The S2 cell token maximum string length is 16 characters. |
Returns
An array of S2 cell neighbors. If the S2 Cell is invalid, the query produces a null result.
Examples
The following example calculates S2 cell neighbors.
print neighbors = geo_s2cell_neighbors('89c259')
Output
neighbors |
---|
[“89c25d”,“89c2f9”,“89c251”,“89c257”,“89c25f”,“89c25b”,“89c2f7”,“89c2f5”] |
The following example calculates an array of input S2 cell with its neighbors.
let s2cell = '89c259';
print cells = array_concat(pack_array(s2cell), geo_s2cell_neighbors(s2cell))
Output
cells |
---|
[“89c259”,“89c25d”,“89c2f9”,“89c251”,“89c257”,“89c25f”,“89c25b”,“89c2f7”,“89c2f5”] |
The following example calculates S2 cells polygons GeoJSON geometry collection.
let s2cell = '89c259';
print cells = array_concat(pack_array(s2cell), geo_s2cell_neighbors(s2cell))
| mv-expand cells to typeof(string)
| project polygons = geo_s2cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", arr), "properties", bag_pack("name", "polygons"))
Output
geojson |
---|
{“type”: “Feature”,“geometry”: {“type”: “GeometryCollection”,“geometries”: [ {“type”: “Polygon”,“coordinates”: [[[ -74.030012249838478, 40.8012684339439],[ -74.030012249838478, 40.7222262918358],[ -73.935982114337421, 40.708880489804564],[ -73.935982114337421, 40.787917134506841],[ -74.030012249838478, 40.8012684339439]]]}, {“type”: “Polygon”,“coordinates”: [[[ -73.935982114337421, 40.708880489804564],[ -73.935982114337421, 40.629736433321796],[ -73.841906340776248, 40.616308079144915],[ -73.841906340776248, 40.695446474556284],[ -73.935982114337421, 40.708880489804564]]]}, {“type”: “Polygon”,“coordinates”: [[[ -74.1239959854733, 40.893471289549765],[ -74.1239959854733, 40.814531536204242],[ -74.030012249838478, 40.8012684339439],[ -74.030012249838478, 40.880202851376716],[ -74.1239959854733, 40.893471289549765]]]}, {“type”: “Polygon”,“coordinates”: [[[ -74.1239959854733, 40.735483949993387],[ -74.1239959854733, 40.656328734184143],[ -74.030012249838478, 40.643076628676461],[ -74.030012249838478, 40.7222262918358],[ -74.1239959854733, 40.735483949993387]]]}, {“type”: “Polygon”,“coordinates”: [[[ -74.1239959854733, 40.814531536204242],[ -74.1239959854733, 40.735483949993387],[ -74.030012249838478, 40.7222262918358],[ -74.030012249838478, 40.8012684339439],[ -74.1239959854733, 40.814531536204242]]]}, {“type”: “Polygon”,“coordinates”: [[[ -73.935982114337421, 40.787917134506841],[ -73.935982114337421, 40.708880489804564],[ -73.841906340776248, 40.695446474556284],[ -73.841906340776248, 40.774477568182071],[ -73.935982114337421, 40.787917134506841]]]}, {“type”: “Polygon”,“coordinates”: [[[ -74.030012249838478, 40.7222262918358],[ -74.030012249838478, 40.643076628676461],[ -73.935982114337421, 40.629736433321796],[ -73.935982114337421, 40.708880489804564],[ -74.030012249838478, 40.7222262918358]]]}, {“type”: “Polygon”,“coordinates”: [[[ -74.030012249838478, 40.880202851376716],[ -74.030012249838478, 40.8012684339439],[ -73.935982114337421, 40.787917134506841],[ -73.935982114337421, 40.866846163445771],[ -74.030012249838478, 40.880202851376716]]]}, {“type”: “Polygon”,“coordinates”: [[[ -73.935982114337421, 40.866846163445771],[ -73.935982114337421, 40.787917134506841],[ -73.841906340776248, 40.774477568182071],[ -73.841906340776248, 40.853401155678846],[ -73.935982114337421, 40.866846163445771]]]}]}, “properties”: {“name”: “polygons”}} |
The following example calculates polygon unions that represent S2 cell and its neighbors.
let s2cell = '89c259';
print cells = array_concat(pack_array(s2cell), geo_s2cell_neighbors(s2cell))
| mv-expand cells to typeof(string)
| project polygons = geo_s2cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project polygon = geo_union_polygons_array(arr)
Output
polygon |
---|
{“type”: “Polygon”,“coordinates”: [[[-73.841906340776248,40.695446474556284],[-73.841906340776248,40.774477568182071],[-73.841906340776248,40.853401155678846],[-73.935982114337421,40.866846163445771],[-74.030012249838478,40.880202851376716],[-74.1239959854733,40.893471289549758],[-74.1239959854733,40.814531536204242],[-74.1239959854733,40.735483949993387],[-74.1239959854733,40.656328734184143],[-74.030012249838478,40.643076628676461],[-73.935982114337421,40.629736433321796],[-73.841906340776248,40.616308079144915],[-73.841906340776248,40.695446474556284]]]} |
The following example returns true because of the invalid S2 Cell token input.
print invalid = isnull(geo_s2cell_neighbors('a'))
Output
invalid |
---|
1 |
6.43 - geo_s2cell_to_central_point()
Calculates the geospatial coordinates that represent the center of an S2 cell.
Read more about S2 cell hierarchy.
Syntax
geo_s2cell_to_central_point(
s2cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
s2cell | string | ✔️ | S2 cell token value as it was calculated by geo_point_to_s2cell(). The S2 cell token maximum string length is 16 characters. |
Returns
The geospatial coordinate values in GeoJSON Format and of a dynamic data type. If the S2 cell token is invalid, the query will produce a null result.
Examples
print point = geo_s2cell_to_central_point("1234567")
| extend coordinates = point.coordinates
| extend longitude = coordinates[0], latitude = coordinates[1]
Output
point | coordinates | longitude | latitude |
---|---|---|---|
{ “type”: “Point”, “coordinates”: [ 9.86830731850408, 27.468392925827604 ] } | [ 9.86830731850408, 27.468392925827604 ] | 9.86830731850408 | 27.4683929258276 |
The following example returns a null result because of the invalid S2 cell token input.
print point = geo_s2cell_to_central_point("a")
Output
point |
---|
6.44 - geo_s2cell_to_polygon()
Calculates the polygon that represents the S2 Cell rectangular area.
Read more about S2 Cells.
Syntax
geo_s2cell_to_polygon(
s2cell)
Parameters
Name | Type | Required | Description |
---|---|---|---|
s2cell | string | ✔️ | S2 cell token value as it was calculated by geo_point_to_s2cell(). The S2 cell token maximum string length is 16 characters. |
Returns
Polygon in GeoJSON Format and of a dynamic data type. If the s2cell is invalid, the query produces a null result.
Examples
print s2cellPolygon = geo_s2cell_to_polygon("89c259")
Output
s2cellPolygon |
---|
{ “type”: “Polygon”, “coordinates”: [[[-74.030012249838478, 40.8012684339439], [-74.030012249838478, 40.7222262918358], [-73.935982114337421, 40.708880489804564], [-73.935982114337421, 40.787917134506841], [-74.030012249838478, 40.8012684339439]]] } |
The following example assembles GeoJSON geometry collection of S2 Cell polygons.
datatable(lng:real, lat:real)
[
-73.956683, 40.807907,
-73.916869, 40.818314,
-73.989148, 40.743273,
]
| project s2_hash = geo_point_to_s2cell(lng, lat, 10)
| project s2_hash_polygon = geo_s2cell_to_polygon(s2_hash)
| summarize s2_hash_polygon_lst = make_list(s2_hash_polygon)
| project bag_pack(
"type", "Feature",
"geometry", bag_pack("type", "GeometryCollection", "geometries", s2_hash_polygon_lst),
"properties", bag_pack("name", "S2 Cell polygons collection"))
Output
Column1 |
---|
{ “type”: “Feature”, “geometry”: {“type”: “GeometryCollection”, “geometries”: [ {“type”: “Polygon”, “coordinates”: [[[-74.030012249838478, 40.880202851376716], [-74.030012249838478, 40.8012684339439], [-73.935982114337421, 40.787917134506841], [-73.935982114337421, 40.866846163445771], [-74.030012249838478, 40.880202851376716]]]}, {“type”: “Polygon”, “coordinates”: [[[-73.935982114337421, 40.866846163445771], [-73.935982114337421, 40.787917134506841], [-73.841906340776248, 40.774477568182071], [-73.841906340776248, 40.853401155678846], [-73.935982114337421, 40.866846163445771]]]}, {“type”: “Polygon”, “coordinates”: [[[-74.030012249838478, 40.8012684339439], [-74.030012249838478, 40.7222262918358], [-73.935982114337421, 40.708880489804564], [-73.935982114337421, 40.787917134506841], [-74.030012249838478, 40.8012684339439]]]}] }, “properties”: {“name”: “S2 Cell polygons collection”} } |
The following example returns a null result because of the invalid s2cell token input.
print s2cellPolygon = geo_s2cell_to_polygon("a")
Output
s2cellPolygon |
---|
6.45 - geo_simplify_polygons_array()
Simplifies polygons by replacing nearly straight chains of short edges with a single long edge on Earth.
Syntax
geo_simplify_polygons_array(
polygons,
tolerance)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygons | dynamic | ✔️ | An array of polygons or multipolygons in the GeoJSON format. |
tolerance | int, long, or real | | Defines minimum distance in meters between any two vertices. Supported values are in the range [0, ~7,800,000 meters]. If unspecified, the default value 10 is used. |
Returns
Simplified polygon or a multipolygon in the GeoJSON format and of a dynamic data type, with no two vertices with distance less than tolerance. If either the polygon or tolerance is invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({"type": "Polygon","coordinates": [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({"type": "MultiPolygon","coordinates": [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
Examples
The following example simplifies polygons with mutual borders (USA states), by removing vertices that are within a 100-meter distance from each other.
US_States
| project polygon = features.geometry
| summarize lst = make_list(polygon)
| project polygons = geo_simplify_polygons_array(lst, 100)
Output
polygons |
---|
{ "type": "MultiPolygon", "coordinates": [ … ] } |
The following example returns True because one of the polygons is invalid.
datatable(polygons:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize arr = make_list(polygons)
| project is_invalid_polygon = isnull(geo_simplify_polygons_array(arr))
Output
is_invalid_polygon |
---|
1 |
The following example returns True because of the invalid tolerance.
datatable(polygons:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize arr = make_list(polygons)
| project is_null = isnull(geo_simplify_polygons_array(arr, -1))
Output
is_null |
---|
1 |
The following example returns True because the high tolerance causes the polygons to disappear.
datatable(polygons:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize arr = make_list(polygons)
| project is_null = isnull(geo_simplify_polygons_array(arr, 10000))
Output
is_null |
---|
1 |
6.46 - geo_union_lines_array()
Calculates the union of lines or multilines on Earth.
Syntax
geo_union_lines_array(lineStrings)
Parameters
Name | Type | Required | Description |
---|---|---|---|
lineStrings | dynamic | ✔️ | An array of lines or multilines in the GeoJSON format. |
Returns
A line or a multiline in GeoJSON Format and of a dynamic data type. If any of the provided lines or multilines is invalid, the query will produce a null result.
LineString definition and constraints
dynamic({"type": "LineString","coordinates": [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})
dynamic({"type": "MultiLineString","coordinates": [ line_1, line_2, …, line_N ]})
- LineString coordinates array must contain at least two entries.
- Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
- Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
Examples
The following example performs geospatial union on line rows.
datatable(lines:dynamic)
[
dynamic({"type":"LineString","coordinates":[[-73.95683884620665,40.80502891480884],[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]}),
dynamic({"type":"LineString","coordinates":[[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311],[-73.95537436008453,40.804413741624515]]}),
dynamic({"type":"LineString","coordinates":[[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]})
]
| summarize lines_arr = make_list(lines)
| project lines_union = geo_union_lines_array(lines_arr)
Output
lines_union |
---|
{“type”: “LineString”, “coordinates”: [[-73.956838846206651, 40.805028914808844], [-73.95633727312088, 40.8057171711177], [ -73.954891562461853, 40.80510200431312], [-73.955374360084534, 40.804413741624522]]} |
The following example performs geospatial union on line columns.
datatable(line1:dynamic, line2:dynamic)
[
dynamic({"type":"LineString","coordinates":[[-73.95683884620665,40.80502891480884],[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]}), dynamic({"type":"LineString","coordinates":[[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311],[-73.95537436008453,40.804413741624515]]})
]
| project lines_arr = pack_array(line1, line2)
| project lines_union = geo_union_lines_array(lines_arr)
Output
lines_union |
---|
{“type”: “LineString”, “coordinates”:[[-73.956838846206651, 40.805028914808844], [-73.95633727312088, 40.8057171711177], [-73.954891562461853, 40.80510200431312], [-73.955374360084534, 40.804413741624522]]} |
The following example returns True because one of the lines is invalid.
datatable(lines:dynamic)
[
dynamic({"type":"LineString","coordinates":[[-73.95683884620665,40.80502891480884],[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]}),
dynamic({"type":"LineString","coordinates":[[1, 1]]})
]
| summarize lines_arr = make_list(lines)
| project invalid_union = isnull(geo_union_lines_array(lines_arr))
Output
invalid_union |
---|
True |
6.47 - geo_union_polygons_array()
Calculates the union of polygons or multipolygons on Earth.
Syntax
geo_union_polygons_array(polygons)
Parameters
Name | Type | Required | Description |
---|---|---|---|
polygons | dynamic | ✔️ | An array of polygons or multipolygons in the GeoJSON format. |
Returns
A polygon or a multipolygon in GeoJSON Format and of a dynamic data type. If any of the provided polygons or multipolygons is invalid, the query will produce a null result.
Polygon definition and constraints
dynamic({"type": "Polygon","coordinates": [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})
dynamic({"type": "MultiPolygon","coordinates": [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})
- LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
- LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
- LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
- Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
- LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
- LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
- LinearRings must not cross and must not share edges. LinearRings may share vertices.
Examples
The following example performs geospatial union on polygon rows.
datatable(polygons:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize polygons_arr = make_list(polygons)
| project polygons_union = geo_union_polygons_array(polygons_arr)
Output
polygons_union |
---|
{“type”:“Polygon”,“coordinates”:[[[-73.972599326729608,40.765330371902991],[-73.960302383706178,40.782140794645024],[-73.9577,40.7789],[-73.94622,40.79249],[-73.9526593223173,40.792584227716468],[-73.9495,40.7969],[-73.95807,40.80068],[-73.9639277517478,40.792748258673875],[-73.96888,40.792819999999992],[-73.9662719791645,40.7895734224338],[-73.9803360309571,40.770518810606404],[-73.9936,40.7663],[-73.97171,40.756550000000004],[-73.972599326729608,40.765330371902991]]]} |
The following example performs geospatial union on polygon columns.
datatable(polygon1:dynamic, polygon2:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}), dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]})
]
| project polygons_arr = pack_array(polygon1, polygon2)
| project polygons_union = geo_union_polygons_array(polygons_arr)
Output
polygons_union |
---|
{“type”:“Polygon”,“coordinates”:[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.9639277517478,40.792748258673875],[-73.96888,40.792819999999992],[-73.9662719791645,40.7895734224338],[-73.98201,40.76825],[-73.97317,40.76455],[-73.960302383706178,40.782140794645024],[-73.9577,40.7789],[-73.94622,40.79249],[-73.9526593223173,40.792584227716468],[-73.9495,40.7969]]]} |
The following example returns True because one of the polygons is invalid.
datatable(polygons:dynamic)
[
dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249]]]})
]
| summarize polygons_arr = make_list(polygons)
| project invalid_union = isnull(geo_union_polygons_array(polygons_arr))
Output
invalid_union |
---|
True |
6.48 - Geospatial data visualizations
Geospatial data can be visualized as part of your query using the render operator as points, pies, or bubbles on a map.
Visualize points on a map
You can visualize points using either [Longitude, Latitude] columns or a GeoJSON column. Using a series column is optional. The [Longitude, Latitude] pair defines each point, in that order.
Example: Visualize points on a map
The following example takes 100 storm events and visualizes them as points on a map.
StormEvents
| take 100
| project BeginLon, BeginLat
| render scatterchart with (kind = map)
Example: Visualize multiple series of points on a map
The following example visualizes multiple series of points, where the [Longitude, Latitude] pair defines each point, and a third column defines the series. In this example, the series is EventType
.
StormEvents
| take 100
| project BeginLon, BeginLat, EventType
| render scatterchart with (kind = map)
Example: Visualize series of points on data with multiple columns
The following example visualizes a series of points on a map. If you have multiple columns in the result, you must specify the columns to be used for xcolumn (Longitude), ycolumn (Latitude), and series.
StormEvents
| take 100
| render scatterchart with (kind = map, xcolumn = BeginLon, ycolumns = BeginLat, series = EventType)
Example: Visualize points on a map defined by GeoJSON dynamic values
The following example visualizes points on the map using GeoJSON dynamic values to define the points.
StormEvents
| project BeginLon, BeginLat
| summarize by hash=geo_point_to_s2cell(BeginLon, BeginLat, 5)
| project geo_s2cell_to_central_point(hash)
| render scatterchart with (kind = map)
Visualization of pies or bubbles on a map
You can visualize pies or bubbles using either [Longitude, Latitude] columns or a GeoJSON column. These visualizations can be created with color or numeric axes.
Example: Visualize bubbles by location
The following example shows storm events aggregated by S2 cells. The chart aggregates events by location into single-color bubbles.
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat, real(-81.3891), 28.5346, 1000 * 100)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat)
| project geo_s2cell_to_central_point(hash), count_
| extend Events = "count"
| render piechart with (kind = map)
Example: Visualize pie charts by location
The following example shows storm events aggregated by S2 cells. The chart aggregates events by event type into pie charts by location.
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat, real(-81.3891), 28.5346, 1000 * 100)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat)
| project geo_s2cell_to_central_point(hash), EventType, count_
| render piechart with (kind = map)
Related content
- Render operator
- Data analytics for automotive test fleets (geospatial clustering use case)
- Learn about Azure architecture for geospatial data processing and analytics
6.49 - Geospatial grid system
Geospatial data can be analyzed efficiently using grid systems to create geospatial clusters. You can use geospatial tools to aggregate, cluster, partition, reduce, join, and index geospatial data. These tools improve query runtime performance, reduce stored data size, and visualize aggregated geospatial data.
The following methods of geospatial clustering are supported: Geohash, S2 Cell, and H3 Cell.
The core functionalities of these methods are:
- Calculate the hash/index/cell token of a geospatial coordinate. Different geospatial coordinates that belong to the same cell have the same cell token value.
- Calculate the center point of a hash/index/cell token. This point is useful because it can represent all the values in the cell.
- Calculate the cell polygon. Calculating cell polygons is useful for cell visualization or other calculations, for example, distance or point-in-polygon checks (see the sketch after this list).
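For example, the following is a minimal sketch of these three core functionalities using the S2 Cell system; the coordinate values are illustrative:
print lng = -73.9857, lat = 40.7484
| extend s2_token = geo_point_to_s2cell(lng, lat, 10)         // cell token of the coordinate
| extend cell_center = geo_s2cell_to_central_point(s2_token)  // center point of the cell
| extend cell_polygon = geo_s2cell_to_polygon(s2_token)       // polygon that covers the cell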
Compare methods
Criteria | Geohash | S2 Cell | H3 Cell |
---|---|---|---|
Levels of hierarchy | 18 | 31 | 16 |
Cell shape | Rectangle | Rectangle | Hexagon |
Cell edges | straight | geodesic | straight |
Projection system | None. Encodes latitude and longitude. | Cube face centered quadratic transform. | Icosahedron face centered gnomonic. |
Neighbors count | 8 | 8 | 6 |
Noticeable feature | Common prefixes indicate point proximity. | 31 hierarchy levels. | Cell shape is hexagonal. |
Performance | Superb | Superb | Fast |
Cover polygon with cells | Not supported | Supported | Not supported |
Cell parent | Not supported | Not Supported | Supported |
Cell children | Not supported | Not Supported | Supported |
Cell rings | Not supported | Not Supported | Supported |
Geohash functions
Function Name |
---|
geo_point_to_geohash() |
geo_geohash_to_central_point() |
geo_geohash_neighbors() |
geo_geohash_to_polygon() |
S2 Cell functions
Function Name |
---|
geo_point_to_s2cell() |
geo_s2cell_to_central_point() |
geo_s2cell_neighbors() |
geo_s2cell_to_polygon() |
geo_polygon_to_s2cells() |
H3 Cell functions
Function Name |
---|
geo_point_to_h3cell() |
geo_h3cell_to_central_point() |
geo_h3cell_neighbors() |
geo_h3cell_to_polygon() |
geo_h3cell_parent() |
geo_h3cell_children() |
geo_h3cell_rings() |
Related content
- See a use case for geospatial clustering: Data analytics for automotive test fleets
- Learn about Azure architecture for geospatial data processing and analytics
7 - Graph operators
7.1 - Best practices for Kusto Query Language (KQL) graph semantics
This article explains how to use the graph semantics feature in KQL effectively and efficiently for different use cases and scenarios. It shows how to create and query graphs with the syntax and operators, and how to integrate them with other KQL features and functions. It also helps users avoid common pitfalls or errors. For instance, creating graphs that exceed memory or performance limits, or applying unsuitable or incompatible filters, projections, or aggregations.
Size of graph
The make-graph operator creates an in-memory representation of a graph. It consists of the graph structure itself and its properties. When making a graph, use appropriate filters, projections, and aggregations to select only the relevant nodes and edges and their properties.
The following example shows how to reduce the number of nodes and edges and their properties. In this scenario, Bob changed manager from Alice to Eve and the user only wants to see the latest state of the graph for their organization. To reduce the size of the graph, the nodes are first filtered by the organization property and then the property is removed from the graph using the project-away operator. The same happens for edges. Then the summarize operator together with the arg_max aggregation function is used to get the last known state of the graph.
let allEmployees = datatable(organization: string, name:string, age:long)
[
"R&D", "Alice", 32,
"R&D","Bob", 31,
"R&D","Eve", 27,
"R&D","Mallory", 29,
"Marketing", "Alex", 35
];
let allReports = datatable(employee:string, manager:string, modificationDate: datetime)
[
"Bob", "Alice", datetime(2022-05-23),
"Bob", "Eve", datetime(2023-01-01),
"Eve", "Mallory", datetime(2022-05-23),
"Alice", "Dave", datetime(2022-05-23)
];
let filteredEmployees =
allEmployees
| where organization == "R&D"
| project-away age, organization;
let filteredReports =
allReports
| summarize arg_max(modificationDate, *) by employee
| project-away modificationDate;
filteredReports
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
where employee.name == "Bob"
project employee = employee.name, topManager = manager.name
Output
employee | topManager |
---|---|
Bob | Mallory |
Last known state of the graph
The Size of graph example demonstrated how to get the last known state of the edges of a graph by using the summarize operator and the arg_max aggregation function. Obtaining the last known state is a compute-intensive operation.
Consider creating a materialized view to improve the query performance, as follows:
Create tables that have some notion of version as part of their model. We recommend using a datetime column that you can later use to create a graph time series.
.create table employees (organization: string, name:string, stateOfEmployment:string, properties:dynamic, modificationDate:datetime)
.create table reportsTo (employee:string, manager:string, modificationDate: datetime)
Create a materialized view for each table and use the arg_max aggregation function to determine the last known state of employees and the reportsTo relation.
.create materialized-view employees_MV on table employees
{
    employees
    | summarize arg_max(modificationDate, *) by name
}
.create materialized-view reportsTo_MV on table reportsTo
{
    reportsTo
    | summarize arg_max(modificationDate, *) by employee
}
Create two functions that ensure that only the materialized component of the materialized view is used and other filters and projections are applied.
.create function currentEmployees () {
    materialized_view('employees_MV')
    | where stateOfEmployment == "employed"
}
.create function reportsTo_lastKnownState () {
    materialized_view('reportsTo_MV')
    | project-away modificationDate
}
The resulting query, which uses the materialized views, is faster and more efficient for larger graphs. It also enables higher concurrency and lower latency queries for the latest state of the graph. If needed, the user can still query the graph history based on the employees and reportsTo tables.
let filteredEmployees =
currentEmployees
| where organization == "R&D"
| project-away organization;
reportsTo_lastKnownState
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
where employee.name == "Bob"
project employee = employee.name, reportingPath = map(hasManager, manager)
Graph time travel
Some scenarios require you to analyze data based on the state of a graph at a specific point in time. Graph time travel combines time filters with the summarize operator and the arg_max aggregation function.
The following KQL statement creates a function with a parameter that defines the interesting point in time for the graph. It returns a ready-made graph.
.create function graph_time_travel (interestingPointInTime:datetime ) {
let filteredEmployees =
employees
| where modificationDate < interestingPointInTime
| summarize arg_max(modificationDate, *) by name;
let filteredReports =
reportsTo
| where modificationDate < interestingPointInTime
| summarize arg_max(modificationDate, *) by employee
| project-away modificationDate;
filteredReports
| make-graph employee --> manager with filteredEmployees on name
}
With the function in place, the user can craft a query to get the top manager of Bob based on the graph in June 2022.
graph_time_travel(datetime(2022-06-01))
| graph-match (employee)-[hasManager*2..5]-(manager)
where employee.name == "Bob"
project employee = employee.name, topManager = manager.name
Output
employee | topManager |
---|---|
Bob | Dave |
Dealing with multiple node and edge types
Occasionally, you might need to contextualize time series data with a graph that has multiple node types. You could approach the problem by creating a general-purpose property graph that is based on a canonical model, such as the following.
- nodes
- nodeId (string)
- label (string)
- properties (dynamic)
- edges
- source (string)
- destination (string)
- label (string)
- properties (dynamic)
The following example shows how to transform the data into a canonical model and how to query it. The base tables for the nodes and edges of the graph have different schemas.
This scenario involves a factory manager who wants to find out why equipment isn’t working well and who is responsible for fixing it. The manager decides to use a graph that combines the asset graph of the production floor and the maintenance staff hierarchy which changes every day.
The following graph shows the relations between assets and their time series, such as speed, temperature, and pressure. The operators and the assets, such as pump, are connected via the operates edge. The operators themselves report up to management.
The data for those entities can be stored directly in your cluster or acquired using query federation to a different service, such as Azure Cosmos DB, Azure SQL, or Azure Digital Twin. To illustrate the example, the following tabular data is created as part of the query:
let sensors = datatable(sensorId:string, tagName:string, unitOfMeasure:string)
[
"1", "temperature", "°C",
"2", "pressure", "Pa",
"3", "speed", "m/s"
];
let timeseriesData = datatable(sensorId:string, timestamp:datetime, value:double, anomaly: bool )
[
"1", datetime(2023-01-23 10:00:00), 32, false,
"1", datetime(2023-01-24 10:00:00), 400, true,
"3", datetime(2023-01-24 09:00:00), 9, false
];
let employees = datatable(name:string, age:long)
[
"Alice", 32,
"Bob", 31,
"Eve", 27,
"Mallory", 29,
"Alex", 35,
"Dave", 45
];
let allReports = datatable(employee:string, manager:string)
[
"Bob", "Alice",
"Alice", "Dave",
"Eve", "Mallory",
"Alex", "Dave"
];
let operates = datatable(employee:string, machine:string, timestamp:datetime)
[
"Bob", "Pump", datetime(2023-01-23),
"Eve", "Pump", datetime(2023-01-24),
"Mallory", "Press", datetime(2023-01-24),
"Alex", "Conveyor belt", datetime(2023-01-24),
];
let assetHierarchy = datatable(source:string, destination:string)
[
"1", "Pump",
"2", "Pump",
"Pump", "Press",
"3", "Conveyor belt"
];
The employees, sensors, and other entities and relationships don’t share a canonical data model. You can use the union operator to combine and canonize the data.
The following query joins the sensor data with the time series data to find the sensors that have abnormal readings. Then, it uses a projection to create a common model for the graph nodes.
let nodes =
union
(
sensors
| join kind=leftouter
(
timeseriesData
| summarize hasAnomaly=max(anomaly) by sensorId
) on sensorId
| project nodeId = sensorId, label = "tag", properties = pack_all(true)
),
( employees | project nodeId = name, label = "employee", properties = pack_all(true));
The edges are transformed in a similar way.
let edges =
union
( assetHierarchy | extend label = "hasParent" ),
( allReports | project source = employee, destination = manager, label = "reportsTo" ),
( operates | project source = employee, destination = machine, properties = pack_all(true), label = "operates" );
With the canonized nodes and edges data, you can create a graph using the make-graph operator, as follows:
let graph = edges
| make-graph source --> destination with nodes on nodeId;
Once created, define the path pattern and project the information required. The pattern starts at a tag node followed by a variable length edge to an asset. That asset is operated by an operator that reports to a top manager via a variable length edge, called reportsTo. The constraints section of the graph-match operator (the where clause in this instance) reduces the tags to the ones that have an anomaly and were operated on a specific day.
graph
| graph-match (tag)-[hasParent*1..5]->(asset)<-[operates]-(operator)-[reportsTo*1..5]->(topManager)
where tag.label=="tag" and tobool(tag.properties.hasAnomaly) and
startofday(todatetime(operates.properties.timestamp)) == datetime(2023-01-24)
and topManager.label=="employee"
project
tagWithAnomaly = tostring(tag.properties.tagName),
impactedAsset = asset.nodeId,
operatorName = operator.nodeId,
responsibleManager = tostring(topManager.nodeId)
Output
tagWithAnomaly | impactedAsset | operatorName | responsibleManager |
---|---|---|---|
temperature | Pump | Eve | Mallory |
The projection in graph-match outputs the information that the temperature sensor showed an anomaly on the specified day. It was operated by Eve who ultimately reports to Mallory. With this information, the factory manager can reach out to Eve and potentially Mallory to get a better understanding of the anomaly.
Related content
7.2 - Graph operators
Kusto Query Language (KQL) graph operators enable graph analysis of data by representing tabular data as a graph with nodes and edges. This setup lets us use graph operations to study the connections and relationships between different data points.
Graph analysis is typically comprised of the following steps:
- Prepare and preprocess the data using tabular operators
- Build a graph from the prepared tabular data using make-graph
- Perform graph analysis using graph-match
- Transform the results of the graph analysis back into tabular form using graph-to-table
- Continue the query with tabular operators (see the sketch after this list)
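The following is a minimal end-to-end sketch of these steps; the Relations datatable and its values are illustrative. Note that graph-match already returns a tabular result, so graph-to-table is only needed when you want to export the graph’s nodes or edges themselves.
let Relations = datatable(source:string, destination:string, relation:string)
[
    "Alice", "Bob", "manages",
    "Bob", "Carol", "manages",
    "Carol", "Dave", "manages"
];
Relations
| where relation == "manages"                           // prepare the tabular data
| make-graph source --> destination with_node_id=name   // build the graph
| graph-match (top)-[manages*1..3]->(report)            // perform graph analysis
    where top.name == "Alice"
    project employee = report.name
| summarize employeeCount = count()                     // continue with tabular operators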
Supported graph operators
The following table describes the supported graph operators.
Operator | Description |
---|---|
make-graph | Builds a graph from tabular data. |
graph-match | Searches for patterns in a graph. |
graph-to-table | Builds nodes or edges tables from a graph. |
graph-shortest-paths | Finds the shortest paths from a given set of source nodes to a set of target nodes. |
graph-mark-components | Finds and marks all connected components. |
Graph model
A graph is modeled as a directed property graph that represents the data as a network of vertices, or nodes, connected by edges. Both nodes and edges can have properties that store more information about them, and a node in the graph must have a unique identifier. A pair of nodes can have multiple edges between them that have different properties or direction. There’s no special distinction of labels in the graph, and any property can act as a label.
Graph lifetime
A graph is a transient object. It’s built in each query that contains graph operators and ceases to exist once the query is completed. To persist a graph, it has to first be transformed back into tabular form and then stored as edges or nodes tables.
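For example, the following is a minimal sketch of persisting the nodes of a graph; the source table Edges (with source and destination columns) and the destination table name PersistedNodes are assumptions for illustration. The .set-or-append command appends the query result to the destination table, creating it if it doesn’t exist.
.set-or-append PersistedNodes <|
    Edges
    | make-graph source --> destination with_node_id=name   // build the transient graph
    | graph-to-table nodes with_node_id=NodeId               // convert back to tabular form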
Limitations and recommendations
The graph object is built in memory on the fly for each graph query. As such, there’s a performance cost for building the graph and a limit to the size of the graph that can be built.
Although it isn’t strictly enforced, we recommend building graphs with at most 10 million elements (nodes and edges). The actual memory limit for the graph is determined by query operators memory limit.
Related content
7.3 - graph-mark-components operator (Preview)
The graph-mark-components
operator finds all connected components of a graph and marks each node with a component identifier.
Syntax
G | graph-mark-components [kind = Kind] [with_component_id = ComponentId]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The graph source. |
Kind | string | The connected component kind, either weak (default) or strong . A weak component is a set of nodes connected by a path, ignoring the direction of edges. A strong component is a set of nodes connected in both directions, considering the edges’ directions. | |
ComponentId | string | The property name that denotes the component identifier. The default property name is ComponentId . |
Returns
The graph-mark-components
operator returns a graph result, where each node has a component identifier in the ComponentId property. The identifier is a zero-based consecutive index of the components. Each component index is chosen arbitrarily and might not be consistent across runs.
Examples
The examples in this section show how to use the syntax to help you get started.
Find families by their relationships
The following example creates a graph from a set of child-parent pairs and identifies connected components using a family
identifier.
let ChildOf = datatable(child:string, parent:string)
[
"Alice", "Bob",
"Carol", "Alice",
"Carol", "Dave",
"Greg", "Alice",
"Greg", "Dave",
"Howard", "Alice",
"Howard", "Dave",
"Eve", "Frank",
"Frank", "Mallory",
"Eve", "Kirk",
];
ChildOf
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-to-table nodes
Output
name | family |
---|---|
Alice | 0 |
Bob | 0 |
Carol | 0 |
Dave | 0 |
Greg | 0 |
Howard | 0 |
Eve | 1 |
Frank | 1 |
Mallory | 1 |
Kirk | 1 |
Find a greatest common ancestor for each family
The following example uses the connected component family
identifier and the graph-match
operator to identify the greatest ancestor of each family in a set of child-parent data.
let ChildOf = datatable(child:string, parent:string)
[
"Alice", "Bob",
"Carol", "Alice",
"Carol", "Dave",
"Greg", "Alice",
"Greg", "Dave",
"Howard", "Alice",
"Howard", "Dave",
"Eve", "Frank",
"Frank", "Mallory",
"Eve", "Kirk",
];
ChildOf
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-match (descendant)-[childOf*1..5]->(ancestor)
project name = ancestor.name, lineage = map(childOf, child), family = ancestor.family
| summarize (generations, name) = arg_max(array_length(lineage),name) by family
Output
family | generations | name |
---|---|---|
1 | 2 | Mallory |
0 | 2 | Bob |
Related content
7.4 - graph-match operator
The graph-match
operator searches for all occurrences of a graph pattern in an input graph source.
Syntax
G | graph-match [cycles = CyclesOption] Pattern [where Constraints] project [ColumnName =] Expression [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The input graph source. |
Pattern | string | ✔️ | One or more comma delimited sequences of graph node elements connected by graph edge elements using graph notations. See Graph pattern notation. |
Constraints | string | A Boolean expression composed of properties of named variables in the Pattern. Each graph element (node/edge) has a set of properties that were attached to it during the graph construction. The constraints define which elements (nodes and edges) are matched by the pattern. A property is referenced by the variable name followed by a dot (. ) and the property name. | |
Expression | string | ✔️ | The project clause converts each pattern to a row in a tabular result. The project expressions must be scalar and reference properties of named variables defined in the Pattern. A property is referenced by the variable name followed by a dot (. ) and the attribute name. |
CyclesOption | string | Controls whether cycles are matched in the Pattern. Allowed values: all, none, unique_edges. If all is specified, all cycles are matched; if none is specified, cycles aren’t matched; if unique_edges (default) is specified, cycles are matched only if they don’t include the same edge more than once. See the sketch after this table. | |
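For example, a minimal sketch of the cycles option based on the syntax above; the Edges datatable and its values are illustrative:
let Edges = datatable(source:string, destination:string)
[
    "A", "B",
    "B", "C",
    "C", "A"
];
Edges
| make-graph source --> destination with_node_id=name
| graph-match cycles=none (x)-[e*1..3]->(y)   // don't match cyclic paths
    where x.name == "A"
    project startNode = x.name, endNode = y.name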
Graph pattern notation
The following table shows the supported graph notation:
Element | Named variable | Anonymous |
---|---|---|
Node | ( n) | () |
Directed edge: left to right | -[ e]-> | --> |
Directed edge: right to left | <-[ e]- | <-- |
Any direction edge | -[ e]- | -- |
Variable length edge | -[ e*3..5]- | -[*3..5]- |
Variable length edge
A variable length edge allows a specific pattern to be repeated multiple times within defined limits. This type of edge is denoted by an asterisk (*), followed by the minimum and maximum occurrence values in the format min..max. Both the minimum and maximum values must be integer scalars. Any sequence of edges falling within this occurrence range can match the variable edge of the pattern, if all the edges in the sequence satisfy the constraints outlined in the where clause.
Multiple sequences
Multiple comma delimited sequences are used to express nonlinear patterns. To describe the connection between different sequences, they have to share one or more node variable names. For example, to represent a star pattern with node n at the center connected to nodes a, b, c, and d, the following pattern could be used:
(a)--(n)--(b), (c)--(n)--(d)
Only single connected component patterns are supported.
Returns
The graph-match
operator returns a tabular result, where each record corresponds to a match of the pattern in the graph.
The returned columns are defined in the operator’s project
clause using properties of edges and/or nodes defined in the pattern. Properties and functions of properties of variable length edges are returned as a dynamic array, each value in the array corresponds to an occurrence of the variable length edge.
Examples
The examples in this section show how to use the syntax to help you get started.
All employees in a manager’s organization
The following example represents an organizational hierarchy. It demonstrates how a variable length edge could be used to find employees of different levels of the hierarchy in a single query. The nodes in the graph represent employees and the edges are from an employee to their manager. After we build the graph using make-graph
, we search for employees in Alice
’s organization that are younger than 30
.
let employees = datatable(name:string, age:long)
[
"Alice", 32,
"Bob", 31,
"Eve", 27,
"Joe", 29,
"Chris", 45,
"Alex", 35,
"Ben", 23,
"Richard", 39,
];
let reports = datatable(employee:string, manager:string)
[
"Bob", "Alice",
"Chris", "Alice",
"Eve", "Bob",
"Ben", "Chris",
"Joe", "Alice",
"Richard", "Bob"
];
reports
| make-graph employee --> manager with employees on name
| graph-match (alice)<-[reports*1..5]-(employee)
where alice.name == "Alice" and employee.age < 30
project employee = employee.name, age = employee.age, reportingPath = map(reports, manager)
Output
employee | age | reportingPath |
---|---|---|
Joe | 29 | [ “Alice” ] |
Eve | 27 | [ “Alice”, “Bob” ] |
Ben | 23 | [ “Alice”, “Chris” ] |
Attack path
The following example builds a graph from the Actions
and Entities
tables. The entities are people and systems, and the actions describe different relations between entities. Following the make-graph
operator that builds the graph is a call to graph-match
with a graph pattern that searches for attack paths to the "Apollo"
system.
let Entities = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Apollo", "System", 99
];
let Actions = datatable(source:string, destination:string, action_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Apollo", "trusts",
"Bob", "Apollo", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
Actions
| make-graph source --> destination with Entities on name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo)
where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission"
project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Apollo |
Star pattern
The following example is similar to the previous attack path example, but with an extra constraint: we want the compromised entity to also communicate with Alice. The graph-match
pattern prefix is the same as the previous example and we add another sequence with the compromised as a link between the sequences.
let Entities = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Apollo", "System", 99
];
let Actions = datatable(source:string, destination:string, action_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Apollo", "trusts",
"Bob", "Apollo", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
Actions
| make-graph source --> destination with Entities on name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo), (compromised)-[communicates]-(alice)
where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission" and alice.name == "Alice"
project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Apollo |
Related content
7.5 - graph-shortest-paths Operator (Preview)
The graph-shortest-paths
operator finds the shortest paths between a set of source nodes and a set of target nodes in a graph and returns a table with the results.
Syntax
G | graph-shortest-paths [output = OutputOption] Pattern where Predicate project [ColumnName =] Expression [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The graph source, typically the output from a make-graph operation. |
Pattern | string | ✔️ | A path pattern that describes the path to find. Patterns must include at least one variable length edge and can’t contain multiple sequences. |
Predicate | expression | A boolean expression that consists of properties of named variables in the pattern and constants. | |
Expression | expression | ✔️ | A scalar expression that defines the output row for each found path, using constants and references to properties of named variables in the pattern. |
OutputOption | string | Specifies the search output as any (default) or all . Output is specified as any for a single shortest path per source/target pair and all for all shortest paths of equal minimum length. |
Path pattern notation
The following table shows the supported path pattern notations.
Element | Named variable | Anonymous element |
---|---|---|
Node | ( n) | () |
Directed edge from left to right | -[ e]-> | --> |
Directed edge from right to left | <-[ e]- | <-- |
Any direction edge | -[ e]- | -- |
Variable length edge | -[ e*3..5]- | -[*3..5]- |
Variable length edge
A variable length edge allows a specific pattern to repeat multiple times within defined limits. An asterisk (*) denotes this type of edge, followed by the minimum and maximum occurrence values in the format min..max. These values must be integer scalars. Any sequence of edges within this range can match the variable edge of the pattern, provided all the edges in the sequence meet the where clause constraints.
Returns
The graph-shortest-paths
operator returns a tabular result, where each record corresponds to a path found in the graph. The returned columns are defined in the operator’s project
clause using properties of nodes and edges defined in the pattern. Properties and functions of properties of variable length edges, are returned as a dynamic array. Each value in the array corresponds to an occurrence of the variable length edge.
Examples
This section provides practical examples demonstrating how to use the graph-shortest-paths
operator in different scenarios.
Find any shortest path between two train stations
The following example demonstrates how to use the graph-shortest-paths
operator to find the shortest path between two stations in a transportation network. The query constructs a graph from the data in connections
and finds the shortest path from the "South-West"
to the "North"
station, considering paths up to five connections long. Since the default output is any
, it finds any shortest path.
let connections = datatable(from_station:string, to_station:string, line:string)
[
"Central", "North", "red",
"North", "Central", "red",
"Central", "South", "red",
"South", "Central", "red",
"South", "South-West", "red",
"South-West", "South", "red",
"South-West", "West", "red",
"West", "South-West", "red",
"Central", "East", "blue",
"East", "Central", "blue",
"Central", "West", "blue",
"West", "Central", "blue",
];
connections
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths (start)-[connections*1..5]->(destination)
where start.station == "South-West" and destination.station == "North"
project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station
Output
from | path | line | to |
---|---|---|---|
South-West | [ “South”, “Central”, “North” ] | [ “red”, “red”, “red” ] | North |
Find all shortest paths between two train stations
The following example, like the previous example, finds the shortest paths in a transportation network. However, it uses output=all
, so returns all shortest paths.
let connections = datatable(from_station:string, to_station:string, line:string)
[
"Central", "North", "red",
"North", "Central", "red",
"Central", "South", "red",
"South", "Central", "red",
"South", "South-West", "red",
"South-West", "South", "red",
"South-West", "West", "red",
"West", "South-West", "red",
"Central", "East", "blue",
"East", "Central", "blue",
"Central", "West", "blue",
"West", "Central", "blue",
];
connections
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths output=all (start)-[connections*1..5]->(destination)
where start.station == "South-West" and destination.station == "North"
project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station
Output
from | path | line | to |
---|---|---|---|
South-West | [ “South”, “Central”, “North” ] | [ “red”, “red”, “red” ] | North |
South-West | [ “West”, “Central”, “North” ] | [ “red”, “blue”, “red” ] | North |
Related content
7.6 - graph-to-table operator
The graph-to-table
operator exports nodes or edges from a graph to tables.
Syntax
Nodes
G | graph-to-table nodes [with_node_id=ColumnName]
Edges
G | graph-to-table edges [with_source_id=ColumnName] [with_target_id=ColumnName] [as TableName]
Nodes and edges
G | graph-to-table nodes as NodesTableName [with_node_id=ColumnName], edges as EdgesTableName [with_source_id=ColumnName] [with_target_id=ColumnName]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The input graph source. |
NodesTableName | string | The name of the exported nodes table. | |
EdgesTableName | string | The name of the exported edges table. | |
ColumnName | string | Export the node hash ID, source node hash ID, or target node hash ID with the given column name. |
Returns
Nodes
The graph-to-table
operator returns a tabular result, in which each row corresponds to a node in the source graph. The returned columns are the node’s properties. When with_node_id
is provided, the node hash column is of long
type.
Edges
The graph-to-table
operator returns a tabular result, in which each row corresponds to an edge in the source graph. The returned columns are the edge’s properties. When with_source_id
or with_target_id
are provided, the node hash column is of long
type.
Nodes and edges
The graph-to-table
operator returns two tabular results, matching the previous descriptions.
Examples
The following examples use the make-graph
operator to build a graph from edges and nodes tables. The nodes represent people and systems, and the edges are different relations between nodes. Then, each example shows a different usage of graph-to-table
.
Get edges
In this example, the graph-to-table
operator exports the edges from a graph to a table. The with_source_id
and with_target_id
parameters export the node hash for source and target nodes of each edge.
let nodes = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Trent", "System", 99
];
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with nodes on name
| graph-to-table edges with_source_id=SourceId with_target_id=TargetId
Output
SourceId | TargetId | source | destination | edge_type |
---|---|---|---|---|
-3122868243544336885 | -7133945255344544237 | Alice | Bob | communicatesWith |
-3122868243544336885 | 2533909231875758225 | Alice | Trent | trusts |
-7133945255344544237 | 2533909231875758225 | Bob | Trent | hasPermission |
4363395278938690453 | -3122868243544336885 | Eve | Alice | attacks |
3855580634910899594 | -3122868243544336885 | Mallory | Alice | attacks |
3855580634910899594 | -7133945255344544237 | Mallory | Bob | attacks |
Get nodes
In this example, the graph-to-table
operator exports the nodes from a graph to a table. The with_node_id
parameter exports the node hash.
let nodes = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Trent", "System", 99
];
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with nodes on name
| graph-to-table nodes with_node_id=NodeId
Output
NodeId | name | type | age |
---|---|---|---|
-3122868243544336885 | Alice | Person | 23 |
-7133945255344544237 | Bob | Person | 31 |
4363395278938690453 | Eve | Person | 17 |
2533909231875758225 | Trent | System | 99 |
3855580634910899594 | Mallory | | |
Get nodes and edges
In this example, the graph-to-table
operator exports the nodes and edges from a graph to a table.
let nodes = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Trent", "System", 99
];
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with nodes on name
| graph-to-table nodes as N with_node_id=NodeId, edges as E with_source_id=SourceId;
N;
E
Output table 1
NodeId | name | type | age |
---|---|---|---|
-3122868243544336885 | Alice | Person | 23 |
-7133945255344544237 | Bob | Person | 31 |
4363395278938690453 | Eve | Person | 17 |
2533909231875758225 | Trent | System | 99 |
3855580634910899594 | Mallory | | |
Output table 2
SourceId | source | destination | edge_type |
---|---|---|---|
-3122868243544336885 | Alice | Bob | communicatesWith |
-3122868243544336885 | Alice | Trent | trusts |
-7133945255344544237 | Bob | Trent | hasPermission |
4363395278938690453 | Eve | Alice | attacks |
3855580634910899594 | Mallory | Alice | attacks |
3855580634910899594 | Mallory | Bob | attacks |
Related content
7.7 - Kusto Query Language (KQL) graph semantics overview
Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. The structure of a graph comprises nodes and edges that connect them. Both nodes and edges can have properties that describe them.
Graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs. For example, the following graph illustrates a social network that consists of four nodes and three edges. Each node has a property for its name, such as Bob, and each edge has a property for its type, such as reportsTo.
Graphs store data differently from relational databases, which use tables and need indexes and joins to connect related data. In graphs, each node has a direct pointer to its neighbors (adjacency), so there’s no need to index or join anything, making it easy and fast to traverse the graph. Graph queries can use the graph structure and meaning to do complex and powerful operations, such as finding paths, patterns, shortest distances, communities, or centrality measures.
You can create and query graphs using KQL graph semantics, which has a simple and intuitive syntax that works well with the existing KQL features. You can also mix graph queries with other KQL features, such as time-based, location-based, and machine-learning queries, to do more advanced and powerful data analysis. By using KQL with graph semantics, you get the speed and scale of KQL queries with the flexibility and expressiveness of graphs.
For example, you can use:
- Time-based queries to analyze the evolution of a graph over time, such as how the network structure or the node properties change
- Geospatial queries to analyze the spatial distribution or proximity of nodes and edges, such as how the location or distance affects the relationship
- Machine learning queries to apply various algorithms or models to graph data, such as clustering, classification, or anomaly detection
How does it work?
Every query of the graph semantics in Kusto requires creating a new graph representation. You use a graph operator that converts tabular expressions for edges and optionally nodes into a graph representation of the data. Once the graph is created, you can apply different operations to further enhance or examine the graph data.
The graph semantics extension uses an in-memory graph engine that works on the data in the memory of your cluster, making graph analysis interactive and fast. The memory consumption of a graph representation is affected by the number of nodes and edges and their respective properties. The graph engine uses a property graph model that supports arbitrary properties for nodes and edges. It also integrates with all the existing scalar operators of KQL, which gives users the ability to write expressive and complex graph queries that can use the full power and functionality of KQL.
Why use graph semantics in KQL?
There are several reasons to use graph semantics in KQL, such as the following examples:
KQL doesn’t support recursive joins, so you have to explicitly define the traversals you want to run (see Scenario: Friends of a friend). You can use the make-graph operator to define hops of variable length, which is useful when the relationship distance or depth isn’t fixed. For example, you can use this operator to find all the resources that are connected in a graph or all the places you can reach from a source in a transportation network.
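For example, a minimal sketch of a variable length traversal over a small transportation network; the Connections datatable and its values are illustrative:
let Connections = datatable(from_station:string, to_station:string)
[
    "Central", "North",
    "Central", "South",
    "South", "South-West"
];
Connections
| make-graph from_station --> to_station with_node_id=station
| graph-match (start)-[hops*1..3]->(reachable)   // hops of variable length
    where start.station == "Central"
    project reachable = reachable.station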
Time-aware graphs are a unique feature of graph semantics in KQL that allow users to model graph data as a series of graph manipulation events over time. Users can examine how the graph evolves over time, such as how the graph’s network structure or the node properties change, or how the graph events or anomalies happen. For example, users can use time series queries to discover trends, patterns, or outliers in the graph data, such as how the network density, centrality, or modularity change over time
The intellisense feature of the KQL query editor assists users in writing and executing queries in the query language. It provides syntax highlighting, autocompletion, error checking, and suggestions. It also helps users with the graph semantics extension by offering graph-specific keywords, operators, functions, and examples to guide users through the graph creation and querying process.
Limits
The following are some of the main limits of the graph semantics feature in KQL:
- You can only create or query graphs that fit into the memory of one cluster node.
- Graph data isn’t persisted or distributed across cluster nodes, and is discarded after the query execution.
Therefore, when using the graph semantics feature in KQL, you should consider the memory consumption and performance implications of creating and querying large or dense graphs. Where possible, use filters, projections, and aggregations to reduce the graph size and complexity.
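For example, a minimal sketch of reducing the graph size before building it; the Edges table and its timestamp, source, and destination columns are assumptions for illustration:
Edges
| where timestamp > ago(1d)        // keep only recent edges
| project source, destination      // drop properties the analysis doesn't need
| make-graph source --> destination with_node_id=name
| graph-match (a)-->(b)
    project from = a.name, to = b.name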
Related content
7.8 - make-graph operator
The make-graph
operator builds a graph structure from tabular inputs of edges and nodes.
Syntax
Edges | make-graph SourceNodeId --> TargetNodeId [with Nodes1 on NodeId1 [, Nodes2 on NodeId2]]
Edges | make-graph SourceNodeId --> TargetNodeId [with_node_id=DefaultNodeId]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Edges | string | ✔️ | The tabular source containing the edges of the graph, each row represents an edge in the graph. |
SourceNodeId | string | ✔️ | The column in Edges with the source node IDs of the edges. |
TargetNodeId | string | ✔️ | The column in Edges with the target node IDs of the edges. |
Nodes | string | The tabular expressions containing the properties of the nodes in the graph. | |
NodeId | string | The columns with the node IDs in Nodes. | |
DefaultNodeId | string | The name of the column for the default node ID. |
Returns
The make-graph
operator returns a graph expression and must be followed by a graph operator. Each row in the source Edges expression becomes an edge in the graph with properties that are the column values of the row. Each row in the Nodes tabular expression becomes a node in the graph with properties that are the column values of the row. Nodes that appear in the Edges table but don’t have a corresponding row in the Nodes table are created as nodes with the corresponding node ID and empty properties.
Users can handle node information in the following ways:
- No node information required: make-graph completes with source and target.
- Explicit node properties: use up to two tabular expressions using "with Nodes1 on NodeId1 [, Nodes2 on NodeId2]".
- Default node identifier: use "with_node_id=DefaultNodeId".
Example
Edges and nodes graph
The following example builds a graph from edges and nodes tables. The nodes represent people and systems, and the edges represent different relationships between nodes. The make-graph
operator builds the graph. Then, the graph-match
operator is used with a graph pattern to search for attack paths leading to the "Trent"
system node.
let nodes = datatable(name:string, type:string, age:int)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Trent", "System", 99
];
let edges = datatable(Source:string, Destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph Source --> Destination with nodes on name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent)
where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission"
project Attacker = mallory.name, Compromised = compromised.name, System = trent.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Trent |
Default node identifier
The following example builds a graph using only edges, with the name
property as the default node identifier. This approach is useful when creating a graph from a tabular expression of edges, ensuring that the node identifier is available for the constraints section of the subsequent graph-match
operator.
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with_node_id=name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent)
where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission"
project Attacker = mallory.name, Compromised = compromised.name, System = trent.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Trent |
Related content
7.9 - Scenarios for using Kusto Query Language (KQL) graph semantics
What are common scenarios for using Kusto Query Language (KQL) graph semantics?
Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. There are many scenarios where graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs.
In this article, you learn about the following common scenarios for using KQL graph semantics:
Friends of a friend
One common use case for graphs is to model and query social networks, where nodes are users and edges are friendships or interactions. For example, imagine a table called Users that has data about users, such as their name and organization, and a table called Knows that has data about the friendships between users.
Without using graph semantics in KQL, you could create a graph to find friends of a friend by using multiple joins, as follows:
let Users = datatable (UserId: string, name: string, org: string)[]; // nodes
let Knows = datatable (FirstUser: string, SecondUser: string)[]; // edges
Users
| where org == "Contoso"
| join kind=inner (Knows) on $left.UserId == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser == $right.UserId
| join kind=inner (Knows) on $left.SecondUser == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser1 == $right.UserId
| where UserId != UserId1
| project name, name1, name2
You can use graph semantics in KQL to perform the same query in a more intuitive and efficient way. The following query uses the make-graph operator to create a directed graph from FirstUser to SecondUser and enriches the properties on the nodes with the columns provided by the Users table. Once the graph is instantiated, the graph-match operator provides the friend-of-a-friend pattern including filters and a projection that results in a tabular output.
let Users = datatable (UserId:string , name:string , org:string)[]; // nodes
let Knows = datatable (FirstUser:string , SecondUser:string)[]; // edges
Knows
| make-graph FirstUser --> SecondUser with Users on UserId
| graph-match (user)-->(middle_man)-->(friendOfAFriend)
where user.org == "Contoso" and user.UserId != friendOfAFriend.UserId
project contoso_person = user.name, middle_man = middle_man.name, kusto_friend_of_friend = friendOfAFriend.name
Insights from log data
In some use cases, you want to gain insights from a simple flat table containing time series information, such as log data. The data in each row is a string that contains raw data. To create a graph from this data, you must first identify the entities and relationships that are relevant to the graph analysis. For example, suppose you have a table called rawLogs from a web server that contains information about requests, such as the timestamp, the source IP address, the destination resource, and much more.
The following table shows an example of the raw data:
let rawLogs = datatable (rawLog: string) [
"31.56.96.51 - - [2019-01-22 03:54:16 +0330] \"GET /product/27 HTTP/1.1\" 200 5379 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
"31.56.96.51 - - [2019-01-22 03:55:17 +0330] \"GET /product/42 HTTP/1.1\" 200 5667 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
"54.36.149.41 - - [2019-01-22 03:56:14 +0330] \"GET /product/27 HTTP/1.1\" 200 30577 \"-\" \"some client\" \"-\""
];
One possible way to model a graph from this table is to treat the source IP addresses as nodes and the web requests to resources as edges. You can use the parse operator to extract the columns you need for the graph and then you can create a graph that represents the network traffic and interactions between different sources and destinations. To create the graph, you can use the make-graph operator specifying the source and destination columns as the edge endpoints, and optionally providing additional columns as edge or node properties.
The following query creates a graph from the raw logs:
let parsedLogs = rawLogs
| parse rawLog with ipAddress: string " - - [" timestamp: datetime "] \"" httpVerb: string " " resource: string " " *
| project-away rawLog;
let edges = parsedLogs;
let nodes =
union
(parsedLogs
| distinct ipAddress
| project nodeId = ipAddress, label = "IP address"),
(parsedLogs | distinct resource | project nodeId = resource, label = "resource");
let graph = edges
| make-graph ipAddress --> resource with nodes on nodeId;
This query parses the raw logs and creates a directed graph where the nodes are either IP addresses or resources and each edge is a request from the source to the destination, with the timestamp and HTTP verb as edge properties.
Once the graph is created, you can use the graph-match operator to query the graph data using patterns, filters, and projections. For example, you can create a pattern that makes a simple recommendation based on the resources that other IP addresses requested within the last five minutes, as follows:
graph
| graph-match (startIp)-[request]->(resource)<--(otherIP)-[otherRequest]->(otherResource)
where startIp.label == "IP address" and //start with an IP address
resource.nodeId != otherResource.nodeId and //recommending a different resource
startIp.nodeId != otherIP.nodeId and //only other IP addresses are interesting
(request.timestamp - otherRequest.timestamp < 5m) //filter on recommendations based on the last 5 minutes
project Recommendation=otherResource.nodeId
Output
Recommendation |
---|
/product/42 |
The query returns “/product/42” as a recommendation based on a raw text-based log.
Related content
8 - Limits and Errors
8.1 - Query consistency
Query consistency refers to how queries and updates are synchronized. There are two supported modes of query consistency:
Strong consistency: Strong consistency ensures immediate access to the most recent updates, such as data appends, deletions, and schema modifications. Strong consistency is the default consistency mode. Due to synchronization, this consistency mode performs slightly less well than weak consistency mode in terms of concurrency.
Weak consistency: With weak consistency, there may be a delay before query results reflect the latest database updates. Typically, this delay ranges from 1 to 2 minutes. Weak consistency can support higher query concurrency rates than strong consistency.
For example, if 1000 records are ingested each minute into a table in the database, queries over that table running with strong consistency will have access to the most recently ingested records, whereas queries over that table running with weak consistency may not have access to some of the records from the last few minutes.
Use cases for strong consistency
If you have a strong dependency on updates that occurred in the database in the last few minutes, use strong consistency.
For example, the following query counts the number of error records in the last five minutes and triggers an alert if that count is larger than 0. This use case is best handled with strong consistency, since your insights may be altered if you don’t have access to records ingested in the past few minutes, as may be the case with weak consistency.
my_table
| where timestamp between(ago(5m)..now())
| where level == "error"
| count
In addition, strong consistency should be used when database metadata is large. For instance, if there are millions of data extents in the database, using weak consistency would result in query heads downloading and deserializing extensive metadata artifacts from persistent storage, which may increase the likelihood of transient failures in downloads and related operations.
Use cases for weak consistency
If you don’t have a strong dependency on updates that occurred in the database in the last few minutes, and you need high query concurrency, use weak consistency.
For example, the following query counts the number of error records per week in the last 90 days. Weak consistency is appropriate in this case, since your insights are unlikely to be impacted if records ingested in the past few minutes are omitted.
my_table
| where timestamp between(ago(90d) .. now())
| where level == "error"
| summarize count() by level, startofweek(timestamp)
Weak consistency modes
The following table summarizes the four modes of weak query consistency.
Mode | Description |
---|---|
Random | Queries are routed randomly to one of the nodes in the cluster that can serve as a weakly consistent query head. |
Affinity by database | Queries within the same database are routed to the same weakly consistent query head, ensuring consistent execution for that database. |
Affinity by query text | Queries with the same query text hash are routed to the same weakly consistent query head, which is beneficial for leveraging query caching. |
Affinity by session ID | Queries with the same session ID hash are routed to the same weakly consistent query head, ensuring consistent execution within a session. |
Affinity by database
The affinity by database mode ensures that queries running against the same database are executed against the same version of the database, although not necessarily the most recent version of the database. This mode is useful when ensuring consistent execution within a specific database is important. However, if there’s an imbalance in the number of queries across databases, this mode may result in uneven load distribution.
Affinity by query text
The affinity by query text mode is beneficial when queries leverage the Query results cache. This mode routes repeating queries frequently executed by the same identity to the same query head, allowing them to benefit from cached results and reducing the load on the cluster.
Affinity by session ID
The affinity by session ID mode ensures that queries belonging to the same user activity or session are executed against the same version of the database, although not necessarily the most recent one. To use this mode, the session ID needs to be explicitly specified in each query’s client request properties. This mode is helpful in scenarios where consistent execution within a session is essential.
How to specify query consistency
You can specify the query consistency mode by the client sending the request or by using a server-side policy. If it isn’t specified by either, the default mode of strong consistency applies.
- Client sending the request: Use the queryconsistency client request property. This method sets the query consistency mode for a specific query and doesn’t affect the overall effective consistency mode, which is determined by the default or the server-side policy. For more information, see client request properties.
- Server-side policy: Use the QueryConsistency property of the Query consistency policy. This method sets the query consistency mode at the workload group level, which eliminates the need for users to specify the consistency mode in their client request properties and allows for enforcing desired consistency modes. For more information, see Query consistency policy.
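As a sketch, and assuming your client forwards set options as client request properties, a query-level override of the consistency mode might look like the following (MyTable is a hypothetical table):
set queryconsistency = 'weakconsistency';   // assumption: the client passes this option through as a client request property
MyTable
| where timestamp > ago(1h)
| count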
Related content
- To customize parameters for queries running with weak consistency, use the Query weak consistency policy.
8.2 - Query limits
Kusto is an ad-hoc query engine that hosts large datasets and attempts to satisfy queries by holding all relevant data in-memory. There’s an inherent risk that queries will monopolize the service resources without bounds. Kusto provides several built-in protections in the form of default query limits. If you’re considering removing these limits, first determine whether you actually gain any value by doing so.
Limit on request concurrency
Request concurrency is a limit imposed on the number of requests running at the same time.
- The default value of the limit depends on the SKU the database is running on, and is calculated as: Cores-Per-Node x 10.
  - For example, for a database that’s set up on the D14v2 SKU, where each machine has 16 vCores, the default limit is 16 cores x 10 = 160.
- The default value can be changed by configuring the request rate limit policy of the default workload group.
  - The actual number of requests that can run concurrently on a database depends on various factors. The most dominant factors are database SKU, the database’s available resources, and usage patterns. The policy can be configured based on load tests performed on production-like usage patterns.
For more information, see Optimize for high concurrency with Azure Data Explorer.
Limit on result set size (result truncation)
Result truncation is a limit set by default on the result set returned by the query. Kusto limits the number of records returned to the client to 500,000, and the overall data size for those records to 64 MB. When either of these limits is exceeded, the query fails with a “partial query failure”. Exceeding overall data size will generate an exception with the message:
The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal data size limit 67108864 (E_QUERY_RESULT_SET_TOO_LARGE).'
Exceeding the number of records will fail with an exception that says:
The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal record count limit 500000 (E_QUERY_RESULT_SET_TOO_LARGE).'
There are several strategies for dealing with this error.
- Reduce the result set size by modifying the query to only return interesting data. This strategy is useful when the initial failing query is too “wide”. For example, the query doesn’t project away data columns that aren’t needed.
- Reduce the result set size by shifting post-query processing, such as aggregations, into the query itself. This strategy is useful in scenarios where the output of the query is fed to another processing system, which then performs other aggregations.
- Switch from queries to using data export when you want to export large sets of data from the service.
- Instruct the service to suppress this query limit by using the set statements listed below or flags in client request properties.
Methods for reducing the result set size produced by the query include:
- Use the summarize operator group and aggregate over similar records in the query output. Potentially sample some columns by using the take_any aggregation function.
- Use a take operator to sample the query output.
- Use the substring function to trim wide free-text columns.
- Use the project operator to drop any uninteresting column from the result set.
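For example, a single query over a hypothetical MyTable (with Timestamp, Level, and Message columns) can combine several of these techniques:
MyTable
| where Timestamp > ago(1h)
| project Timestamp, Level, ShortMessage = substring(Message, 0, 256)   // drop unneeded columns and trim wide free-text
| summarize Count = count() by Level, bin(Timestamp, 5m)                // aggregate instead of returning raw rows
| take 1000                                                             // cap the number of returned records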
You can disable result truncation by using the notruncation
request option.
We recommend that some form of limitation is still put in place.
For example:
set notruncation;
MyTable | take 1000000
It’s also possible to have more refined control over result truncation
by setting the value of truncationmaxsize
(maximum data size in bytes,
defaults to 64 MB) and truncationmaxrecords
(maximum number of records,
defaults to 500,000). For example, the following query sets result truncation
to happen at either 1,105 records or 1 MB, whichever is exceeded.
set truncationmaxsize=1048576;
set truncationmaxrecords=1105;
MyTable | where User=="UserId1"
Removing the result truncation limit means that you intend to move bulk data out of Kusto.
You can remove the result truncation limit either for export purposes by using the .export
command or for later aggregation. If you choose later aggregation, consider aggregating by using Kusto.
Kusto provides a number of client libraries that can handle “infinitely large” results by streaming them to the caller. Use one of these libraries, and configure it to streaming mode. For example, use the .NET Framework client (Microsoft.Azure.Kusto.Data) and either set the streaming property of the connection string to true, or use the ExecuteQueryV2Async() call that always streams results. For an example of how to use ExecuteQueryV2Async(), see the HelloKustoV2 application.
You may also find the C# streaming ingestion sample application helpful.
Result truncation is applied by default, not just to the result stream returned to the client. It’s also applied by default to any subquery that one cluster issues to another cluster in a cross-cluster query, with similar effects.
It’s also applied by default to any subquery that one Eventhouse issues to another Eventhouse in a cross-Eventhouse query, with similar effects.
Setting multiple result truncation properties
The following apply when using set statements, and/or when specifying flags in client request properties.
- If notruncation is set, and any of truncationmaxsize, truncationmaxrecords, or query_take_max_records are also set, notruncation is ignored.
- If truncationmaxsize, truncationmaxrecords, and/or query_take_max_records are set multiple times, the lower value for each property applies.
Limit on memory consumed by query operators (E_RUNAWAY_QUERY)
Kusto limits the memory that each query operator can consume to protect against “runaway” queries.
This limit might be reached by some query operators, such as join
and summarize
, that operate by
holding significant data in memory. By default the limit is 5GB (per node), and it can be increased by setting the request option
maxmemoryconsumptionperiterator
:
set maxmemoryconsumptionperiterator=16106127360;
MyTable | summarize count() by Use
When this limit is reached, a partial query failure is emitted with a message that includes the text E_RUNAWAY_QUERY
.
The ClusterBy operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The DemultiplexedResultSetCache operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The ExecuteAndCache operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The HashJoin operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The Sort operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The Summarize operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The TopNestedAggregator operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
The TopNested operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).
If maxmemoryconsumptionperiterator
is set multiple times, for example in both client request properties and using a set
statement, the lower value applies.
The maximum supported value for this request option is 32212254720 (30 GB).
An additional limit that might trigger an E_RUNAWAY_QUERY
partial query failure is a limit on the max accumulated size of
strings held by a single operator. This limit cannot be overridden by the request option above:
Runaway query (E_RUNAWAY_QUERY). Aggregation over string column exceeded the memory budget of 8GB during evaluation.
When this limit is exceeded, most likely the relevant query operator is a join
, summarize
, or make-series
.
To work around the limit, modify the query to use the shuffle query strategy, as in the sketch below. (This is also likely to improve the performance of the query.)
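The following sketch shows the shuffle hints on an aggregation and on a join; the table and column names (MyTable, OtherTable, UserId) are hypothetical:
MyTable
| summarize hint.shufflekey = UserId count() by UserId   // shuffle the aggregation across nodes by the grouping key
// For joins, the equivalent hint is:
// MyTable
// | join hint.strategy = shuffle (OtherTable) on UserId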
In all cases of E_RUNAWAY_QUERY
, an additional option (beyond increasing the limit by setting the request option and changing the
query to use a shuffle strategy) is to switch to sampling.
The two queries below show how to do the sampling. The first query is a statistical sampling, using a random number generator. The second query is deterministic sampling, done by hashing some column from the dataset, usually some ID.
T | where rand() < 0.1 | ...
T | where hash(UserId, 10) == 1 | ...
Limit on memory per node
Max memory per query per node is another limit used to protect against “runaway” queries. This limit, represented by the request option max_memory_consumption_per_query_per_node
, sets an upper bound
on the amount of memory that can be used on a single node for a specific query.
set max_memory_consumption_per_query_per_node=68719476736;
MyTable | ...
If max_memory_consumption_per_query_per_node
is set multiple times, for example in both client request properties and using a set
statement, the lower value applies.
If the query uses summarize
, join
, or make-series
operators, you can use the shuffle query strategy to reduce memory pressure on a single machine.
Limit execution timeout
Server timeout is a service-side timeout that is applied to all requests. Timeout on running requests (queries and management commands) is enforced at multiple points in Kusto:
- client library (if used)
- service endpoint that accepts the request
- service engine that processes the request
By default, timeout is set to four minutes for queries, and 10 minutes for management commands. This value can be increased if needed (capped at one hour).
- Various client tools support changing the timeout as part of their global or per-connection settings. For example, in Kusto.Explorer, use Tools > Options > Connections > Query Server Timeout.
- Programmatically, SDKs support setting the timeout through the
servertimeout
property. For example, in .NET SDK this is done through a client request property, by setting a value of typeSystem.TimeSpan
.
Notes about timeouts
- On the client side, the timeout is applied from the request being created until the time that the response starts arriving to the client. The time it takes to read the payload back at the client isn’t treated as part of the timeout. It depends on how quickly the caller pulls the data from the stream.
- Also on the client side, the actual timeout value used is slightly higher than the server timeout value requested by the user. This difference is to allow for network latencies.
- To automatically use the maximum allowed request timeout, set the client request property
norequesttimeout
totrue
.
Limit on query CPU resource usage
Kusto lets you run queries and use all the available CPU resources that the database has. It attempts to do a fair round-robin between queries if more than one is running. This method yields the best performance for query-defined functions. At other times, you may want to limit the CPU resources used for a particular query. If you run a “background job”, for example, the system might tolerate higher latencies to give concurrent inline queries high priority.
Kusto supports specifying two request properties when running a query. The properties are query_fanout_threads_percent and query_fanout_nodes_percent. Both properties are integers that default to the maximum value (100), but may be reduced for a specific query to some other value.
The first, query_fanout_threads_percent, controls the fanout factor for thread use. When this property is set to 100%, all CPUs on each node are assigned (for example, 16 CPUs on nodes deployed on Azure D14 SKUs). When this property is set to 50%, half of the CPUs are used, and so on. The numbers are rounded up to a whole CPU, so it’s safe to set the property value to 0.
The second, query_fanout_nodes_percent, controls how many of the query nodes to use per subquery distribution operation. It functions in a similar manner.
If query_fanout_nodes_percent
or query_fanout_threads_percent
are set multiple times, for example, in both client request properties and using a set
statement - the lower value for each property applies.
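As an illustration, and assuming your client forwards set options as client request properties, a reduced-fanout query might look like the following (MyTable and Level are hypothetical):
set query_fanout_threads_percent = 50;   // use about half of the CPUs on each node
set query_fanout_nodes_percent = 50;     // use about half of the nodes per subquery distribution
MyTable
| summarize count() by Level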
Limit on query complexity
During query execution, the query text is transformed into a tree of relational operators representing the query. If the tree depth exceeds an internal threshold, the query is considered too complex for processing, and will fail with an error code. The failure indicates that the relational operators tree exceeds its limits.
The following examples show common query patterns that can cause the query to exceed this limit and fail:
- a long list of binary operators that are chained together. For example:
T
| where Column == "value1" or
Column == "value2" or
.... or
Column == "valueN"
For this specific case, rewrite the query using the in()
operator.
T
| where Column in ("value1", "value2".... "valueN")
- a query that uses the union operator over a schema that is too wide, especially because the default flavor of union returns an “outer” union schema (meaning the output includes all columns of the underlying tables).
The suggestion in this case is to review the query and reduce the columns it uses.
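As a sketch (Table1, Table2, and the projected columns are hypothetical), using inner union semantics together with an explicit projection keeps the analyzed schema narrow:
union kind=inner Table1, Table2      // inner union keeps only the columns common to all inputs
| project Timestamp, Level, Message  // further narrow the schema to the columns actually needed
| where Level == "error"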
Related content
8.3 - Partial query failures
8.3.1 - Kusto query result set exceeds internal limit
A query result set has exceeded the internal … limit is a kind of partial query failure that happens when the query’s result has exceeded one of two limits:
- A limit on the number of records (record count limit, set by default to 500,000)
- A limit on the total amount of data (data size limit, set by default to 67,108,864 bytes (64 MB))
There are several possible courses of action:
- Change the query to consume fewer resources. For example, you can:
- Limit the number of records returned by the query using the take operator or adding additional where clauses.
- Try to reduce the number of columns returned by the query. Use the project operator, the project-away operator, or the project-keep operator.
- Use the summarize operator to get aggregated data
- Increase the relevant query limit temporarily for that query. For more information, see Result truncation under query limits.
[!NOTE] We don’t recommend that you increase the query limit, since the limits exist to protect the database. The limits make sure that a single query doesn’t disrupt concurrent queries running on the database.
8.3.2 - Overflows
An overflow occurs when the result of a computation is too large for the destination type. The overflow usually leads to a partial query failure.
For example, the following query will result in an overflow.
let Weight = 92233720368547758;
range x from 1 to 3 step 1
| summarize percentilesw(x, Weight * 100, 50)
Kusto’s percentilesw()
implementation accumulates the Weight
expression for values that are “close enough”.
In this case, the accumulation triggers an overflow because it doesn’t fit into a signed 64-bit integer.
Usually, overflows are a result of a “bug” in the query, since Kusto uses 64-bit types for arithmetic computations. The best course of action is to look at the error message, and identify the function or aggregation that triggered the overflow. Make sure the input arguments evaluate to values that make sense.
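For instance, a hedged variant of the query above avoids the overflow by scaling the weight down so that the accumulated sum stays within a signed 64-bit integer:
let Weight = 92233720368547758;
range x from 1 to 3 step 1
| summarize percentilesw(x, Weight / 1000000, 50)   // smaller weights keep the accumulation within the long range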
8.3.3 - Runaway queries
A runaway query is a kind of partial query failure that happens when some internal query limit was exceeded during query execution.
For example, the following error may be reported:
HashJoin operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete.
There are several possible courses of action.
- Change the query to consume fewer resources. For example, if the error indicates that the query result set is too large, you can:
  - Limit the number of records returned by the query by:
    - Using the take operator
    - Adding additional where clauses
  - Reduce the number of columns returned by the query by:
    - Using the project operator
    - Using the project-away operator
    - Using the project-keep operator
  - Use the summarize operator to get aggregated data.
- Increase the relevant query limit temporarily for that query. For more information, see query limits - limit on memory per iterator. This method, however, isn’t recommended. The limits exist to protect the cluster or Eventhouse and to make sure that a single query doesn’t disrupt concurrent queries running on it.
9 - Plugins
9.1 - Data reshaping plugins
9.1.1 - bag_unpack plugin
The bag_unpack
plugin unpacks a single column of type dynamic
, by treating each property bag top-level slot as a column. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate bag_unpack( Column [, OutputColumnPrefix ] [, columnsConflict ] [, ignoredProperties ] ) [: OutputSchema ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose column Column is to be unpacked. |
Column | dynamic | ✔️ | The column of T to unpack. |
OutputColumnPrefix | string | | A common prefix to add to all columns produced by the plugin. |
columnsConflict | string | | The direction for column conflict resolution. Valid values: error - Query produces an error (default), replace_source - Source column is replaced, keep_source - Source column is kept. |
ignoredProperties | dynamic | | An optional set of bag properties to be ignored. |
OutputSchema | | | The names and types for the expected columns of the bag_unpack plugin output. Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. For syntax information, see Output schema syntax. |
Output schema syntax
( ColumnName : ColumnType [, …] )
To add all columns of the input table to the plugin output, use a wildcard * as the first parameter, as follows:
( * , ColumnName : ColumnType [, …] )
Returns
The bag_unpack
plugin returns a table with as many records as its tabular input (T). The schema of the table is the same as the schema of its tabular input with the following modifications:
- The specified input column (Column) is removed.
- The schema is extended with as many columns as there are distinct slots in
the top-level property bag values of T. The name of each column corresponds
to the name of each slot, optionally prefixed by OutputColumnPrefix. Its
type is either the type of the slot, if all values of the same slot have the
same type, or
dynamic
, if the values differ in type.
Examples
Expand a bag
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)
Output
Age | Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
Expand a bag with OutputColumnPrefix
Expand a bag and use the OutputColumnPrefix
option to produce column names that begin with the prefix ‘Property_’.
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, 'Property_')
Output
Property_Age | Property_Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
Expand a bag with columnsConflict
Expand a bag and use the columnsConflict
option to resolve conflicts between existing columns and columns produced by the bag_unpack()
operator.
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
'Old_name', dynamic({"Name": "Dave", "Age":40}),
'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='replace_source') // Use new name
Output
Age | Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
'Old_name', dynamic({"Name": "Dave", "Age":40}),
'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='keep_source') // Keep old name
Output
Age | Name |
---|---|
20 | Old_name |
40 | Old_name |
30 | Old_name |
Expand a bag with ignoredProperties
Expand a bag and use the ignoredProperties
option to ignore certain properties in the property bag.
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20, "Address": "Address-1" }),
dynamic({"Name": "Dave", "Age":40, "Address": "Address-2"}),
dynamic({"Name": "Jasmine", "Age":30, "Address": "Address-3"}),
]
// Ignore 'Age' and 'Address' properties
| evaluate bag_unpack(d, ignoredProperties=dynamic(['Address', 'Age']))
Output
Name |
---|
John |
Dave |
Jasmine |
Expand a bag with a query-defined OutputSchema
Expand a bag and use the OutputSchema
option to allow various optimizations to be evaluated before running the actual query.
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d) : (Name:string, Age:long)
Output
Name | Age |
---|---|
John | 20 |
Dave | 40 |
Jasmine | 30 |
Expand a bag and use the OutputSchema
option to allow various optimizations to be evaluated before running the actual query. Use a wildcard *
to return all columns of the input table.
datatable(d:dynamic, Description: string)
[
dynamic({"Name": "John", "Age":20}), "Student",
dynamic({"Name": "Dave", "Age":40}), "Teacher",
dynamic({"Name": "Jasmine", "Age":30}), "Student",
]
| evaluate bag_unpack(d) : (*, Name:string, Age:long)
Output
Description | Name | Age |
---|---|---|
Student | John | 20 |
Teacher | Dave | 40 |
Student | Jasmine | 30 |
9.1.2 - narrow plugin
The narrow plugin “unpivots” a wide table into a table with three columns:
- Row number
- Column type
- Column value (as string)
The narrow
plugin is designed mainly for display purposes, as it allows wide
tables to be displayed comfortably without the need of horizontal scrolling.
The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate narrow()
Examples
The following example shows an easy way to read the output of the Kusto
.show diagnostics
management command.
.show diagnostics
| evaluate narrow()
The results of .show diagnostics
itself is a table with a single row and
33 columns. By using the narrow
plugin we “rotate” the output to something
like this:
Row | Column | Value |
---|---|---|
0 | IsHealthy | True |
0 | IsRebalanceRequired | False |
0 | IsScaleOutRequired | False |
0 | MachinesTotal | 2 |
0 | MachinesOffline | 0 |
0 | NodeLastRestartedOn | 2017-03-14 10:59:18.9263023 |
0 | AdminLastElectedOn | 2017-03-14 10:58:41.6741934 |
0 | ClusterWarmDataCapacityFactor | 0.130552847673333 |
0 | ExtentsTotal | 136 |
0 | DiskColdAllocationPercentage | 5 |
0 | InstancesTargetBasedOnDataCapacity | 2 |
0 | TotalOriginalDataSize | 5167628070 |
0 | TotalExtentSize | 1779165230 |
0 | IngestionsLoadFactor | 0 |
0 | IngestionsInProgress | 0 |
0 | IngestionsSuccessRate | 100 |
0 | MergesInProgress | 0 |
0 | BuildVersion | 1.0.6281.19882 |
0 | BuildTime | 2017-03-13 11:02:44.0000000 |
0 | ClusterDataCapacityFactor | 0.130552847673333 |
0 | IsDataWarmingRequired | False |
0 | RebalanceLastRunOn | 2017-03-21 09:14:53.8523455 |
0 | DataWarmingLastRunOn | 2017-03-21 09:19:54.1438800 |
0 | MergesSuccessRate | 100 |
0 | NotHealthyReason | [null] |
0 | IsAttentionRequired | False |
0 | AttentionRequiredReason | [null] |
0 | ProductVersion | KustoRelease_2017.03.13.2 |
0 | FailedIngestOperations | 0 |
0 | FailedMergeOperations | 0 |
0 | MaxExtentsInSingleTable | 64 |
0 | TableWithMaxExtents | KustoMonitoringPersistentDatabase.KustoMonitoringTable |
0 | WarmExtentSize | 1779165230 |
9.1.3 - pivot plugin
Rotates a table by turning the unique values from one column in the input table into multiple columns in the output table and performs aggregations as required on any remaining column values that will appear in the final output.
Syntax
T | evaluate pivot( pivotColumn [, aggregationFunction ] [, column1 [, column2 … ]] ) [: OutputSchema ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
pivotColumn | string | ✔️ | The column to rotate. Each unique value from this column will be a column in the output table. |
aggregationFunction | string | | An aggregation function used to aggregate multiple rows in the input table to a single row in the output table. Currently supported functions: min(), max(), take_any(), sum(), dcount(), avg(), stdev(), variance(), make_list(), make_bag(), make_set(), count(). The default is count(). |
column1, column2, … | string | | A column name or comma-separated list of column names. The output table will contain an additional column for each specified column. The default is all columns other than the pivoted column and the aggregation column. |
OutputSchema | | | The names and types for the expected columns of the pivot plugin output. Syntax: ( ColumnName : ColumnType [, …] ). Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. An error is raised if the run-time schema doesn’t match the OutputSchema schema. |
Returns
Pivot returns the rotated table with specified columns (column1, column2, …) plus all unique values of the pivot columns. Each cell for the pivoted columns will contain the aggregate function computation.
Examples
Pivot by a column
For each EventType and State starting with ‘AL’, count the number of events of this type in this state.
StormEvents
| project State, EventType
| where State startswith "AL"
| where EventType has "Wind"
| evaluate pivot(State)
Output
EventType | ALABAMA | ALASKA |
---|---|---|
Thunderstorm Wind | 352 | 1 |
High Wind | 0 | 95 |
Extreme Cold/Wind Chill | 0 | 10 |
Strong Wind | 22 | 0 |
Pivot by a column with aggregation function
For each EventType and State starting with ‘AR’, display the total number of direct deaths.
StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect))
Output
EventType | ARKANSAS | ARIZONA |
---|---|---|
Heavy Rain | 1 | 0 |
Thunderstorm Wind | 1 | 0 |
Lightning | 0 | 1 |
Flash Flood | 0 | 6 |
Strong Wind | 1 | 0 |
Heat | 3 | 0 |
Pivot by a column with aggregation function and a single additional column
The result is identical to the previous example.
StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType)
Output
EventType | ARKANSAS | ARIZONA |
---|---|---|
Heavy Rain | 1 | 0 |
Thunderstorm Wind | 1 | 0 |
Lightning | 0 | 1 |
Flash Flood | 0 | 6 |
Strong Wind | 1 | 0 |
Heat | 3 | 0 |
Specify the pivoted column, aggregation function, and multiple additional columns
For each event type, source, and state, sum the number of direct deaths.
StormEvents
| where State startswith "AR"
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType, Source)
Output
EventType | Source | ARKANSAS | ARIZONA |
---|---|---|---|
Heavy Rain | Emergency Manager | 1 | 0 |
Thunderstorm Wind | Emergency Manager | 1 | 0 |
Lightning | Newspaper | 0 | 1 |
Flash Flood | Trained Spotter | 0 | 2 |
Flash Flood | Broadcast Media | 0 | 3 |
Flash Flood | Newspaper | 0 | 1 |
Strong Wind | Law Enforcement | 1 | 0 |
Heat | Newspaper | 3 | 0 |
Pivot with a query-defined output schema
The following example selects specific columns in the StormEvents table. It uses an explicit schema definition that allows various optimizations to be evaluated before running the actual query.
StormEvents
| project State, EventType
| where EventType has "Wind"
| evaluate pivot(State): (EventType:string, ALABAMA:long, ALASKA:long)
Output
EventType | ALABAMA | ALASKA |
---|---|---|
Thunderstorm Wind | 352 | 1 |
High Wind | 0 | 95 |
Marine Thunderstorm Wind | 0 | 0 |
Strong Wind | 22 | 0 |
Extreme Cold/Wind Chill | 0 | 10 |
Cold/Wind Chill | 0 | 0 |
Marine Strong Wind | 0 | 0 |
Marine High Wind | 0 | 0 |
9.2 - General plugins
9.2.1 - dcount_intersect plugin
Calculates intersection between N sets based on hll
values (N in range of [2..16]), and returns N dcount
values. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate dcount_intersect( hll_1, hll_2 [, hll_3, …] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
hll_i | The values of set Si calculated with the hll() function. |
Returns
Returns a table with N dcount
values (per column, representing set intersections).
Column names are s0, s1, … (until n-1).
Given sets S1, S2, .. Sn, the returned values represent the distinct counts of:
S1,
S1 ∩ S2,
S1 ∩ S2 ∩ S3,
… ,
S1 ∩ S2 ∩ … ∩ Sn
Examples
// Generate numbers from 1 to 100
range x from 1 to 100 step 1
| extend isEven = (x % 2 == 0), isMod3 = (x % 3 == 0), isMod5 = (x % 5 == 0)
// Calculate conditional HLL values (note that '0' is included in each of them as additional value, so we will subtract it later)
| summarize hll_even = hll(iif(isEven, x, 0), 2),
hll_mod3 = hll(iif(isMod3, x, 0), 2),
hll_mod5 = hll(iif(isMod5, x, 0), 2)
// Invoke the plugin that calculates dcount intersections
| evaluate dcount_intersect(hll_even, hll_mod3, hll_mod5)
| project evenNumbers = s0 - 1, // 100 / 2 = 50
even_and_mod3 = s1 - 1, // lcm(2,3) = 6, therefore: 100 / 6 = 16
even_and_mod3_and_mod5 = s2 - 1 // lcm(2,3,5) = 30, therefore: 100 / 30 = 3
Output
evenNumbers | even_and_mod3 | even_and_mod3_and_mod5 |
---|---|---|
50 | 16 | 3 |
Related content
9.2.2 - infer_storage_schema plugin
This plugin infers the schema of external data and returns it as a CSL schema string. The string can be used when creating external tables. The plugin is invoked with the evaluate operator.
Authentication and authorization
In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.
The following table lists the supported authentication methods and any required permissions by storage type.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Reader | Reader |
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
Microsoft Entra access token | ||
Storage account access key | This authentication method isn’t supported in Gen1. |
Syntax
evaluate infer_storage_schema( Options )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Options | dynamic | ✔️ | A property bag specifying the properties of the request. |
Supported properties of the request
Name | Type | Required | Description |
---|---|---|---|
StorageContainers | dynamic | ✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
DataFormat | string | ✔️ | One of the supported data formats. |
FileExtension | string | | If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. |
FileNamePrefix | string | | If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. |
Mode | string | | The schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last. |
InferenceOptions | dynamic | | More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true}. |
Returns
The infer_storage_schema plugin returns a single result table containing a single row and column that holds the CSL schema string.
Example
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
],
'FileExtension': '.parquet',
'FileNamePrefix': 'part-',
'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)
Output
CslSchema |
---|
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real |
Use the returned schema in external table definition:
.create external table MobileEvents(
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real
)
kind=blob
partition by (dt:datetime = bin(event_time, 1d), app:string = app_id)
pathformat = ('app=' app '/dt=' datetime_pattern('yyyyMMdd', dt))
dataformat = parquet
(
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
)
Related content
9.2.3 - infer_storage_schema_with_suggestions plugin
The infer_storage_schema_with_suggestions plugin infers the schema of external data and returns a JSON object. For each column, the object provides the inferred type, a recommended type, and the recommended mapping transformation. The recommended type and mapping are provided by the suggestion logic that determines the optimal type using the following logic:
- Identity columns: If the inferred type for a column is long and the column name ends with id, the suggested type is string, since it provides optimized indexing for identity columns where equality filters are common.
- Unix datetime columns: If the inferred type for a column is long and one of the unix-time to datetime mapping transformations produces a valid datetime value, the suggested type is datetime and the suggested ApplicableTransformationMapping mapping is the one that produced a valid datetime value.
The plugin is invoked with the evaluate
operator. To obtain the table schema that uses the inferred schema for Create and alter Azure Storage external tables without suggestions, use the infer_storage_schema plugin.
Authentication and authorization
In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.
The following table lists the supported authentication methods and any required permissions by storage type.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Reader | Reader |
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
Microsoft Entra access token | ||
Storage account access key | This authentication method isn’t supported in Gen1. |
Syntax
evaluate infer_storage_schema_with_suggestions( Options )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Options | dynamic | ✔️ | A property bag specifying the properties of the request. |
Supported properties of the request
Name | Type | Required | Description |
---|---|---|---|
StorageContainers | dynamic | ✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
DataFormat | string | ✔️ | One of the supported Data formats supported for ingestion |
FileExtension | string | | If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. |
FileNamePrefix | string | | If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. |
Mode | string | | The schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last. |
InferenceOptions | dynamic | | More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true}. |
Returns
The infer_storage_schema_with_suggestions plugin returns a single result table containing a single row and column that holds a JSON string.
Example
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
],
'FileExtension': '.json',
'FileNamePrefix': 'js-',
'DataFormat': 'json'
});
evaluate infer_storage_schema_with_suggestions(options)
Example input data
{
"source": "DataExplorer",
"created_at": "2022-04-10 15:47:57",
"author_id": 739144091473215488,
"time_millisec":1547083647000
}
Output
{
"Columns": [
{
"OriginalColumn": {
"Name": "source",
"CslType": {
"type": "string",
"IsNumeric": false,
"IsSummable": false
}
},
"RecommendedColumn": {
"Name": "source",
"CslType": {
"type": "string",
"IsNumeric": false,
"IsSummable": false
}
},
"ApplicableTransformationMapping": "None"
},
{
"OriginalColumn": {
"Name": "created_at",
"CslType": {
"type": "datetime",
"IsNumeric": false,
"IsSummable": true
}
},
"RecommendedColumn": {
"Name": "created_at",
"CslType": {
"type": "datetime",
"IsNumeric": false,
"IsSummable": true
}
},
"ApplicableTransformationMapping": "None"
},
{
"OriginalColumn": {
"Name": "author_id",
"CslType": {
"type": "long",
"IsNumeric": true,
"IsSummable": true
}
},
"RecommendedColumn": {
"Name": "author_id",
"CslType": {
"type": "string",
"IsNumeric": false,
"IsSummable": false
}
},
"ApplicableTransformationMapping": "None"
},
{
"OriginalColumn": {
"Name": "time_millisec",
"CslType": {
"type": "long",
"IsNumeric": true,
"IsSummable": true
}
},
"RecommendedColumn": {
"Name": "time_millisec",
"CslType": {
"type": "datetime",
"IsNumeric": false,
"IsSummable": true
}
},
"ApplicableTransformationMapping": "DateTimeFromUnixMilliseconds"
}
]
}
Related content
9.2.4 - ipv4_lookup plugin
The ipv4_lookup
plugin looks up an IPv4 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate ipv4_lookup( LookupTable , SourceIPv4Key , IPv4LookupKey [, ExtraKey1 [.. , ExtraKeyN [, return_unmatched ]]] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose column SourceIPv4Key is used for IPv4 matching. |
LookupTable | string | ✔️ | Table or tabular expression with IPv4 lookup data, whose column LookupKey is used for IPv4 matching. IPv4 values can be masked using IP-prefix notation. |
SourceIPv4Key | string | ✔️ | The column of T with IPv4 string to be looked up in LookupTable. IPv4 values can be masked using IP-prefix notation. |
IPv4LookupKey | string | ✔️ | The column of LookupTable with IPv4 string that is matched against each SourceIPv4Key value. |
ExtraKey1 .. ExtraKeyN | string | | Additional column references that are used for lookup matches. Similar to join operation: records with equal values are considered matching. Column name references must exist in both the source table T and LookupTable. |
return_unmatched | bool | | A boolean flag that defines if the result should include all or only matching rows (default: false - only matching rows returned). |
Returns
The ipv4_lookup
plugin returns a result of join (lookup) based on IPv4 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup
operator.
If the return_unmatched argument is set to true
, the resulting table includes both matched and unmatched rows (filled with nulls).
If the return_unmatched argument is set to false
, or omitted (the default value of false
is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true
execution.
Examples
IPv4 lookup - matching rows only
// IP lookup table: IP_Data
// Partial data from: https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
"111.68.128.0/17","AS","Asia","JP","Japan",
"5.8.0.0/19","EU","Europe","RU","Russia",
"223.255.254.0/24","AS","Asia","SG","Singapore",
"46.36.200.51/32","OC","Oceania","CK","Cook Islands",
"2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
'2.20.183.12', // United Kingdom
'5.8.1.2', // Russia
'192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network)
Output
ip | network | continent_code | continent_name | country_iso_code | country_name |
---|---|---|---|---|---|
2.20.183.12 | 2.20.183.0/24 | EU | Europe | GB | United Kingdom |
5.8.1.2 | 5.8.0.0/19 | EU | Europe | RU | Russia |
IPv4 lookup - return both matching and nonmatching rows
// IP lookup table: IP_Data
// Partial data from:
// https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string,continent_code:string ,continent_name:string ,country_iso_code:string ,country_name:string )
[
"111.68.128.0/17","AS","Asia","JP","Japan",
"5.8.0.0/19","EU","Europe","RU","Russia",
"223.255.254.0/24","AS","Asia","SG","Singapore",
"46.36.200.51/32","OC","Oceania","CK","Cook Islands",
"2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
'2.20.183.12', // United Kingdom
'5.8.1.2', // Russia
'192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)
Output
ip | network | continent_code | continent_name | country_iso_code | country_name |
---|---|---|---|---|---|
2.20.183.12 | 2.20.183.0/24 | EU | Europe | GB | United Kingdom |
5.8.1.2 | 5.8.0.0/19 | EU | Europe | RU | Russia |
192.165.12.17 |
IPv4 lookup - using source in external_data()
let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string)
[
'2.20.183.12', // United Kingdom
'5.8.1.2', // Russia
'192.165.12.17', // Sweden
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)
Output
ip | network | geoname_id | continent_code | continent_name | country_iso_code | country_name | is_anonymous_proxy | is_satellite_provider |
---|---|---|---|---|---|---|---|---|
2.20.183.12 | 2.20.183.0/24 | 2635167 | EU | Europe | GB | United Kingdom | 0 | 0 |
5.8.1.2 | 5.8.0.0/19 | 2017370 | EU | Europe | RU | Russia | 0 | 0 |
192.165.12.17 | 192.165.8.0/21 | 2661886 | EU | Europe | SE | Sweden | 0 | 0 |
IPv4 lookup - using extra columns for matching
let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string, continent_name:string, country_iso_code:string)
[
'2.20.183.12', 'Europe', 'GB', // United Kingdom
'5.8.1.2', 'Europe', 'RU', // Russia
'192.165.12.17', 'Europe', '', // Sweden is 'SE' - so it won't be matched
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, continent_name, country_iso_code)
Output
ip | continent_name | country_iso_code | network | geoname_id | continent_code | country_name | is_anonymous_proxy | is_satellite_provider |
---|---|---|---|---|---|---|---|---|
2.20.183.12 | Europe | GB | 2.20.183.0/24 | 2635167 | EU | United Kingdom | 0 | 0 |
5.8.1.2 | Europe | RU | 5.8.0.0/19 | 2017370 | EU | Russia | 0 | 0 |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
9.2.5 - ipv6_lookup plugin
The ipv6_lookup plugin looks up an IPv6 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate ipv6_lookup(LookupTable, SourceIPv6Key, IPv6LookupKey [, return_unmatched])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose column SourceIPv6Key is used for IPv6 matching. |
LookupTable | string | ✔️ | Table or tabular expression with IPv6 lookup data, whose column LookupKey is used for IPv6 matching. IPv6 values can be masked using IP-prefix notation. |
SourceIPv6Key | string | ✔️ | The column of T with IPv6 string to be looked up in LookupTable. IPv6 values can be masked using IP-prefix notation. |
IPv6LookupKey | string | ✔️ | The column of LookupTable with IPv6 string that is matched against each SourceIPv6Key value. |
return_unmatched | bool | A boolean flag that defines if the result should include all or only matching rows (default: false - only matching rows returned). |
Returns
The ipv6_lookup plugin returns the result of a join (lookup) based on the IPv6 key. The schema of the result is the union of the source table and the lookup table, similar to the result of the lookup operator.
If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).
If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as there are matching results. This variant of the lookup has better performance than return_unmatched=true execution.
Examples
IPv6 lookup - matching rows only
// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
"2001:0db8:85a3::/48","NA","North America","US","United States",
"2404:6800:4001::/48","AS","Asia","JP","Japan",
"2a00:1450:4001::/48","EU","Europe","DE","Germany",
"2800:3f0:4001::/48","SA","South America","BR","Brazil",
"2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
"2607:f8b0:4001::/48","NA","North America","CA","Canada",
"2a02:26f0:4001::/48","EU","Europe","FR","France",
"2400:cb00:4001::/48","AS","Asia","IN","India",
"2801:0db8:85a3::/48","SA","South America","AR","Argentina",
"2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
"2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
"2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
"2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
"a5e:f127:8a9d:146d:e102:b5d3:c755:abcd", // N/A
"a5e:f127:8a9d:146d:e102:b5d3:c755:abce" // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network)
Output
network | continent_code | continent_name | country_iso_code | country_name | ip |
---|---|---|---|---|---|
2001:0db8:85a3::/48 | NA | North America | US | United States | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
2404:6800:4001::/48 | AS | Asia | JP | Japan | 2404:6800:4001:0001:0000:8a2e:0370:7334 |
2a02:26f0:4001::/48 | EU | Europe | FR | France | 2a02:26f0:4001:0006:0000:8a2e:0370:7334 |
IPv6 lookup - return both matching and nonmatching rows
// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
"2001:0db8:85a3::/48","NA","North America","US","United States",
"2404:6800:4001::/48","AS","Asia","JP","Japan",
"2a00:1450:4001::/48","EU","Europe","DE","Germany",
"2800:3f0:4001::/48","SA","South America","BR","Brazil",
"2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
"2607:f8b0:4001::/48","NA","North America","CA","Canada",
"2a02:26f0:4001::/48","EU","Europe","FR","France",
"2400:cb00:4001::/48","AS","Asia","IN","India",
"2801:0db8:85a3::/48","SA","South America","AR","Argentina",
"2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
"2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
"2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
"2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
"a5e:f127:8a9d:146d:e102:b5d3:c755:abcd", // N/A
"a5e:f127:8a9d:146d:e102:b5d3:c755:abce" // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network, true)
Output
network | continent_code | continent_name | country_iso_code | country_name | ip |
---|---|---|---|---|---|
2001:0db8:85a3::/48 | NA | North America | US | United States | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
2404:6800:4001::/48 | AS | Asia | JP | Japan | 2404:6800:4001:0001:0000:8a2e:0370:7334 |
2a02:26f0:4001::/48 | EU | Europe | FR | France | 2a02:26f0:4001:0006:0000:8a2e:0370:7334 |
 | | | | | a5e:f127:8a9d:146d:e102:b5d3:c755:abcd |
 | | | | | a5e:f127:8a9d:146d:e102:b5d3:c755:abce |
Related content
- Overview of IPv4/IPv6 functions
9.2.6 - preview plugin
Returns a table with up to the specified number of rows from the input record set, and the total number of records in the input record set.
Syntax
T | evaluate preview(NumberOfRows)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The table to preview. |
NumberOfRows | int | ✔️ | The number of rows to preview from the table. |
Returns
The preview plugin returns two result tables:
- A table with up to the specified number of rows. For example, with NumberOfRows set to 50, this table is equivalent to the result of running T | take 50.
- A table with a single row/column, holding the number of records in the input record set. This table is equivalent to the result of running T | count.
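For illustration, using the StormEvents table from the example below, preview(5) returns the same information as the following two standalone queries (run separately):
// Equivalent of the first result table: the preview rows
StormEvents | take 5
// Equivalent of the second result table: the total record count
StormEvents | count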
Example
StormEvents | evaluate preview(5)
Table1
The following output table only includes the first 6 columns. To see the full result, run the query.
StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
---|---|---|---|---|---|---|
2007-12-30T16:00:00Z | 2007-12-30T16:05:00Z | 11749 | 64588 | GEORGIA | Thunderstorm Wind | … |
2007-12-20T07:50:00Z | 2007-12-20T07:53:00Z | 12554 | 68796 | MISSISSIPPI | Thunderstorm Wind | … |
2007-09-29T08:11:00Z | 2007-09-29T08:11:00Z | 11091 | 61032 | ATLANTIC SOUTH | Waterspout | … |
2007-09-20T21:57:00Z | 2007-09-20T22:05:00Z | 11078 | 60913 | FLORIDA | Tornado | … |
2007-09-18T20:00:00Z | 2007-09-19T18:00:00Z | 11074 | 60904 | FLORIDA | Heavy Rain | … |
Table2
Count |
---|
59066 |
9.2.7 - schema_merge plugin
Merges tabular schema definitions into a unified schema.
Schema definitions are expected to be in the format produced by the getschema operator.
The schema merge operation joins columns in input schemas and tries to reduce data types to common ones. If data types can’t be reduced, an error is displayed on the problematic column.
The plugin is invoked with the evaluate operator.
Syntax
T | evaluate schema_merge(PreserveOrder)
Parameters
Name | Type | Required | Description |
---|---|---|---|
PreserveOrder | bool | When set to true , directs the plugin to validate the column order as defined by the first tabular schema that is kept. If the same column is in several schemas, the column ordinal must be like the column ordinal of the first schema that it appeared in. Default value is true . |
Returns
The schema_merge plugin returns output similar to what the getschema operator returns.
Examples
Merge with a schema that has a new column appended.
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, HttpStatus:int, Referrer:string)[] | getschema;
union schema1, schema2 | evaluate schema_merge()
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
Uri | 0 | System.String | string |
HttpStatus | 1 | System.Int32 | int |
Referrer | 2 | System.String | string |
Merge with a schema that has different column ordering (the HttpStatus ordinal changes from 1 to 2 in the new variant).
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge()
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
Uri | 0 | System.String | string |
Referrer | 1 | System.String | string |
HttpStatus | -1 | ERROR(unknown CSL type:ERROR(columns are out of order)) | ERROR(columns are out of order) |
Merge with a schema that has different column ordering, but with PreserveOrder set to false.
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge(PreserveOrder = false)
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
Uri | 0 | System.String | string |
Referrer | 1 | System.String | string |
HttpStatus | 2 | System.Int32 | int |
9.3 - Language plugins
9.3.1 - Python plugin
9.3.2 - Python plugin packages
This article lists the available Python packages in the Python plugin. For more information, see Python plugin.
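For context, the packages listed below are used from inside a query through the python plugin; see the Python plugin article for the full syntax and how to enable it. As a minimal, hypothetical sketch of exercising one of the listed packages, the following query reports the installed pandas version by extending the input schema with an illustrative ver column:
print x = 1
| evaluate python(typeof(*, ver:string),
    'import pandas as pd\n'            // pandas is one of the packages listed below
    'result = df\n'                     // df is the reserved input DataFrame
    'result["ver"] = pd.__version__'    // result becomes the tabular output
)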
3.11.7 (Preview)
Python engine 3.11.7 + common data science and ML packages
Package | Version |
---|---|
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arrow | 1.3.0 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gensim | 4.3.2 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib_metadata | 7.1.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
locket | 1.0.0 |
lxml | 5.2.1 |
marisa-trie | 1.1.0 |
MarkupSafe | 2.1.5 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
preshed | 3.0.9 |
protobuf | 5.26.1 |
psutil | 5.9.8 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
PyYAML | 6.0.1 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rpds-py | 0.18.0 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.11.7 DL (Preview)
Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)
Package | Version |
---|---|
absl-py | 2.1.0 |
alembic | 1.13.1 |
aniso8601 | 9.0.1 |
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arch | 7.0.0 |
arrow | 1.3.0 |
astunparse | 1.6.3 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
cachetools | 5.3.3 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
Deprecated | 1.2.14 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
docker | 7.1.0 |
entrypoints | 0.4 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gast | 0.5.4 |
gensim | 4.3.2 |
gitdb | 4.0.11 |
GitPython | 3.1.43 |
google-pasta | 0.2.0 |
graphene | 3.3 |
graphql-core | 3.2.3 |
graphql-relay | 3.2.0 |
greenlet | 3.0.3 |
grpcio | 1.64.0 |
h5py | 3.11.0 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib-metadata | 7.0.0 |
iniconfig | 2.0.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
keras | 3.3.3 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
libclang | 18.1.1 |
locket | 1.0.0 |
lxml | 5.2.1 |
Mako | 1.3.5 |
marisa-trie | 1.1.0 |
Markdown | 3.6 |
markdown-it-py | 3.0.0 |
MarkupSafe | 2.1.5 |
mdurl | 0.1.2 |
ml-dtypes | 0.3.2 |
mlflow | 2.13.0 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
namex | 0.0.8 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
opentelemetry-api | 1.24.0 |
opentelemetry-sdk | 1.24.0 |
opentelemetry-semantic-conventions | 0.45b0 |
opt-einsum | 3.3.0 |
optree | 0.11.0 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
pluggy | 1.5.0 |
preshed | 3.0.9 |
protobuf | 4.25.3 |
psutil | 5.9.8 |
pyarrow | 15.0.2 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
Pygments | 2.18.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
pytest | 8.2.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
pywin32 | 306 |
PyYAML | 6.0.1 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rich | 13.7.1 |
rpds-py | 0.18.0 |
rstl | 0.1.3 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
seasonal | 0.3.1 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
smmap | 5.0.1 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
SQLAlchemy | 2.0.30 |
sqlparse | 0.5.0 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
tensorboard | 2.16.2 |
tensorboard-data-server | 0.7.2 |
tensorflow | 2.16.1 |
tensorflow-intel | 2.16.1 |
tensorflow-io-gcs-filesystem | 0.31.0 |
termcolor | 2.4.0 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
time-series-anomaly-detector | 0.2.7 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
torch | 2.2.2 |
torchaudio | 2.2.2 |
torchvision | 0.17.2 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
waitress | 3.0.0 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
wrapt | 1.16.0 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.10.8
Python engine 3.10.8 + common data science and ML packages
Package | Version |
---|---|
alembic | 1.11.1 |
anytree | 2.8.0 |
arrow | 1.2.3 |
attrs | 22.2.0 |
blis | 0.7.9 |
Bottleneck | 1.3.5 |
Brotli | 1.0.9 |
brotlipy | 0.7.0 |
catalogue | 2.0.8 |
certifi | 2022.12.7 |
cffi | 1.15.1 |
chardet | 5.0.0 |
charset-normalizer | 2.1.1 |
click | 8.1.3 |
cloudpickle | 2.2.1 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.0.4 |
contourpy | 1.0.7 |
cycler | 0.11.0 |
cymem | 2.0.7 |
Cython | 0.29.28 |
daal | 2021.6.0 |
daal4py | 2021.6.3 |
dask | 2022.10.2 |
databricks-cli | 0.17.7 |
diff-match-patch | 20200713 |
dill | 0.3.6 |
distributed | 2022.10.2 |
docker | 6.1.3 |
entrypoints | 0.4 |
filelock | 3.9.1 |
flashtext | 2.7 |
Flask | 2.2.3 |
Flask-Compress | 1.13 |
flatbuffers | 23.3.3 |
fonttools | 4.39.0 |
fsspec | 2023.3.0 |
gensim | 4.2.0 |
gitdb | 4.0.10 |
GitPython | 3.1.31 |
greenlet | 2.0.2 |
HeapDict | 1.0.1 |
humanfriendly | 10.0 |
idna | 3.4 |
importlib-metadata | 6.7.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.1.2 |
jellyfish | 0.9.0 |
Jinja2 | 3.1.2 |
jmespath | 1.0.1 |
joblib | 1.2.0 |
json5 | 0.9.10 |
jsonschema | 4.16.0 |
kiwisolver | 1.4.4 |
langcodes | 3.3.0 |
locket | 1.0.0 |
lxml | 4.9.1 |
Mako | 1.2.4 |
Markdown | 3.4.3 |
MarkupSafe | 2.1.2 |
mlflow | 2.4.1 |
mlxtend | 0.21.0 |
mpmath | 1.3.0 |
msgpack | 1.0.5 |
murmurhash | 1.0.9 |
networkx | 2.8.7 |
nltk | 3.7 |
numpy | 1.23.4 |
oauthlib | 3.2.2 |
onnxruntime | 1.13.1 |
packaging | 23.0 |
pandas | 1.5.1 |
partd | 1.3.0 |
pathy | 0.10.1 |
patsy | 0.5.3 |
Pillow | 9.4.0 |
pip | 23.0.1 |
platformdirs | 2.5.2 |
plotly | 5.11.0 |
ply | 3.11 |
preshed | 3.0.8 |
protobuf | 4.22.1 |
psutil | 5.9.3 |
pyarrow | 12.0.1 |
pycparser | 2.21 |
pydantic | 1.10.6 |
pyfpgrowth | 1.0 |
PyJWT | 2.7.0 |
pyparsing | 3.0.9 |
pyreadline3 | 3.4.1 |
pyrsistent | 0.19.3 |
python-dateutil | 2.8.2 |
pytz | 2022.7.1 |
PyWavelets | 1.4.1 |
pywin32 | 306 |
PyYAML | 6.0 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
regex | 2022.10.31 |
requests | 2.28.2 |
requests-file | 1.5.1 |
scikit-learn | 1.1.3 |
scipy | 1.9.3 |
setuptools | 67.6.0 |
sip | 6.7.3 |
six | 1.16.0 |
smart-open | 6.3.0 |
smmap | 5.0.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.4.2 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.4 |
SQLAlchemy | 2.0.18 |
sqlparse | 0.4.4 |
srsly | 2.4.5 |
statsmodels | 0.13.2 |
sympy | 1.11.1 |
tabulate | 0.9.0 |
tbb | 2021.7.1 |
tblib | 1.7.0 |
tenacity | 8.2.2 |
textdistance | 4.5.0 |
thinc | 8.1.9 |
threadpoolctl | 3.1.0 |
three-merge | 0.1.1 |
tldextract | 3.4.0 |
toml | 0.10.2 |
toolz | 0.12.0 |
tornado | 6.1 |
tqdm | 4.65.0 |
typer | 0.4.2 |
typing_extensions | 4.5.0 |
ujson | 5.5.0 |
Unidecode | 1.3.6 |
urllib3 | 1.26.15 |
waitress | 2.1.2 |
wasabi | 0.10.1 |
websocket-client | 1.6.1 |
Werkzeug | 2.2.3 |
wheel | 0.40.0 |
xarray | 2022.10.0 |
zict | 2.2.0 |
zipp | 3.15.0 |
3.10.8 DL
Not supported
3.6.5 (Legacy)
Not supported
This article lists the available managed Python packages in the Python plugin. For more information, see Python plugin.
To create a custom image, see Create a custom image.
3.11.7 (Preview)
Python engine 3.11.7 + common data science and ML packages
Package | Version |
---|---|
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arrow | 1.3.0 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gensim | 4.3.2 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib_metadata | 7.1.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
locket | 1.0.0 |
lxml | 5.2.1 |
marisa-trie | 1.1.0 |
MarkupSafe | 2.1.5 |
matplotlib | 3.8.4 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
preshed | 3.0.9 |
protobuf | 5.26.1 |
psutil | 5.9.8 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
PyYAML | 6.0.1 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rpds-py | 0.18.0 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.11.7 DL (Preview)
Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)
Package | Version |
---|---|
absl-py | 2.1.0 |
alembic | 1.13.1 |
aniso8601 | 9.0.1 |
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arch | 7.0.0 |
arrow | 1.3.0 |
astunparse | 1.6.3 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
cachetools | 5.3.3 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
Deprecated | 1.2.14 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
docker | 7.1.0 |
entrypoints | 0.4 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gast | 0.5.4 |
gensim | 4.3.2 |
gitdb | 4.0.11 |
GitPython | 3.1.43 |
google-pasta | 0.2.0 |
graphene | 3.3 |
graphql-core | 3.2.3 |
graphql-relay | 3.2.0 |
greenlet | 3.0.3 |
grpcio | 1.64.0 |
h5py | 3.11.0 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib-metadata | 7.0.0 |
iniconfig | 2.0.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
keras | 3.3.3 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
libclang | 18.1.1 |
locket | 1.0.0 |
lxml | 5.2.1 |
Mako | 1.3.5 |
marisa-trie | 1.1.0 |
Markdown | 3.6 |
markdown-it-py | 3.0.0 |
MarkupSafe | 2.1.5 |
matplotlib | 3.8.4 |
mdurl | 0.1.2 |
ml-dtypes | 0.3.2 |
mlflow | 2.13.0 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
namex | 0.0.8 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
opentelemetry-api | 1.24.0 |
opentelemetry-sdk | 1.24.0 |
opentelemetry-semantic-conventions | 0.45b0 |
opt-einsum | 3.3.0 |
optree | 0.11.0 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
pluggy | 1.5.0 |
preshed | 3.0.9 |
protobuf | 4.25.3 |
psutil | 5.9.8 |
pyarrow | 15.0.2 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
Pygments | 2.18.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
pytest | 8.2.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
pywin32 | 306 |
PyYAML | 6.0.1 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rich | 13.7.1 |
rpds-py | 0.18.0 |
rstl | 0.1.3 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
seasonal | 0.3.1 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
smmap | 5.0.1 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
SQLAlchemy | 2.0.30 |
sqlparse | 0.5.0 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
tensorboard | 2.16.2 |
tensorboard-data-server | 0.7.2 |
tensorflow | 2.16.1 |
tensorflow-intel | 2.16.1 |
tensorflow-io-gcs-filesystem | 0.31.0 |
termcolor | 2.4.0 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
time-series-anomaly-detector | 0.2.7 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
torch | 2.2.2 |
torchaudio | 2.2.2 |
torchvision | 0.17.2 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
waitress | 3.0.0 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
wrapt | 1.16.0 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.10.8
Python engine 3.10.8 + common data science and ML packages
Package | Version |
---|---|
alembic | 1.11.1 |
anytree | 2.8.0 |
arrow | 1.2.3 |
attrs | 22.2.0 |
blis | 0.7.9 |
Bottleneck | 1.3.5 |
Brotli | 1.0.9 |
brotlipy | 0.7.0 |
catalogue | 2.0.8 |
certifi | 2022.12.7 |
cffi | 1.15.1 |
chardet | 5.0.0 |
charset-normalizer | 2.1.1 |
click | 8.1.3 |
cloudpickle | 2.2.1 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.0.4 |
contourpy | 1.0.7 |
cycler | 0.11.0 |
cymem | 2.0.7 |
Cython | 0.29.28 |
daal | 2021.6.0 |
daal4py | 2021.6.3 |
dask | 2022.10.2 |
databricks-cli | 0.17.7 |
diff-match-patch | 20200713 |
dill | 0.3.6 |
distributed | 2022.10.2 |
docker | 6.1.3 |
entrypoints | 0.4 |
filelock | 3.9.1 |
flashtext | 2.7 |
Flask | 2.2.3 |
Flask-Compress | 1.13 |
flatbuffers | 23.3.3 |
fonttools | 4.39.0 |
fsspec | 2023.3.0 |
gensim | 4.2.0 |
gitdb | 4.0.10 |
GitPython | 3.1.31 |
greenlet | 2.0.2 |
HeapDict | 1.0.1 |
humanfriendly | 10.0 |
idna | 3.4 |
importlib-metadata | 6.7.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.1.2 |
jellyfish | 0.9.0 |
Jinja2 | 3.1.2 |
jmespath | 1.0.1 |
joblib | 1.2.0 |
json5 | 0.9.10 |
jsonschema | 4.16.0 |
kiwisolver | 1.4.4 |
langcodes | 3.3.0 |
locket | 1.0.0 |
lxml | 4.9.1 |
Mako | 1.2.4 |
Markdown | 3.4.3 |
MarkupSafe | 2.1.2 |
mlflow | 2.4.1 |
mlxtend | 0.21.0 |
mpmath | 1.3.0 |
msgpack | 1.0.5 |
murmurhash | 1.0.9 |
networkx | 2.8.7 |
nltk | 3.7 |
numpy | 1.23.4 |
oauthlib | 3.2.2 |
onnxruntime | 1.13.1 |
packaging | 23.0 |
pandas | 1.5.1 |
partd | 1.3.0 |
pathy | 0.10.1 |
patsy | 0.5.3 |
Pillow | 9.4.0 |
pip | 23.0.1 |
platformdirs | 2.5.2 |
plotly | 5.11.0 |
ply | 3.11 |
preshed | 3.0.8 |
protobuf | 4.22.1 |
psutil | 5.9.3 |
pyarrow | 12.0.1 |
pycparser | 2.21 |
pydantic | 1.10.6 |
pyfpgrowth | 1.0 |
PyJWT | 2.7.0 |
pyparsing | 3.0.9 |
pyreadline3 | 3.4.1 |
pyrsistent | 0.19.3 |
python-dateutil | 2.8.2 |
pytz | 2022.7.1 |
PyWavelets | 1.4.1 |
pywin32 | 306 |
PyYAML | 6.0 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
regex | 2022.10.31 |
requests | 2.28.2 |
requests-file | 1.5.1 |
scikit-learn | 1.1.3 |
scipy | 1.9.3 |
setuptools | 67.6.0 |
sip | 6.7.3 |
six | 1.16.0 |
smart-open | 6.3.0 |
smmap | 5.0.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.4.2 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.4 |
SQLAlchemy | 2.0.18 |
sqlparse | 0.4.4 |
srsly | 2.4.5 |
statsmodels | 0.13.2 |
sympy | 1.11.1 |
tabulate | 0.9.0 |
tbb | 2021.7.1 |
tblib | 1.7.0 |
tenacity | 8.2.2 |
textdistance | 4.5.0 |
thinc | 8.1.9 |
threadpoolctl | 3.1.0 |
three-merge | 0.1.1 |
tldextract | 3.4.0 |
toml | 0.10.2 |
toolz | 0.12.0 |
tornado | 6.1 |
tqdm | 4.65.0 |
typer | 0.4.2 |
typing_extensions | 4.5.0 |
ujson | 5.5.0 |
Unidecode | 1.3.6 |
urllib3 | 1.26.15 |
waitress | 2.1.2 |
wasabi | 0.10.1 |
websocket-client | 1.6.1 |
Werkzeug | 2.2.3 |
wheel | 0.40.0 |
xarray | 2022.10.0 |
zict | 2.2.0 |
zipp | 3.15.0 |
3.10.8 DL
Python engine 3.10.8 + common data science and ML packages + deep learning packages (tensorflow & torch)
Package | Version |
---|---|
absl-py | 1.4.0 |
alembic | 1.11.1 |
anytree | 2.8.0 |
arrow | 1.2.3 |
astunparse | 1.6.3 |
attrs | 22.1.0 |
blis | 0.7.9 |
Bottleneck | 1.3.5 |
Brotli | 1.0.9 |
brotlipy | 0.7.0 |
cachetools | 5.3.0 |
catalogue | 2.0.8 |
certifi | 2022.9.24 |
cffi | 1.15.1 |
chardet | 5.0.0 |
charset-normalizer | 2.1.1 |
click | 8.1.3 |
cloudpickle | 2.2.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.0.3 |
contourpy | 1.0.6 |
cycler | 0.11.0 |
cymem | 2.0.7 |
Cython | 0.29.28 |
daal | 2021.6.0 |
daal4py | 2021.6.3 |
dask | 2022.10.2 |
databricks-cli | 0.17.7 |
diff-match-patch | 20200713 |
dill | 0.3.6 |
distributed | 2022.10.2 |
docker | 6.1.3 |
entrypoints | 0.4 |
filelock | 3.8.0 |
flashtext | 2.7 |
Flask | 2.2.2 |
Flask-Compress | 1.13 |
flatbuffers | 22.10.26 |
fonttools | 4.38.0 |
fsspec | 2022.10.0 |
gast | 0.4.0 |
gensim | 4.2.0 |
gitdb | 4.0.10 |
GitPython | 3.1.31 |
google-auth | 2.16.2 |
google-auth-oauthlib | 0.4.6 |
google-pasta | 0.2.0 |
greenlet | 2.0.2 |
grpcio | 1.51.3 |
h5py | 3.8.0 |
HeapDict | 1.0.1 |
humanfriendly | 10.0 |
idna | 3.4 |
importlib-metadata | 6.7.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.1.2 |
jax | 0.4.6 |
jellyfish | 0.9.0 |
Jinja2 | 3.1.2 |
jmespath | 1.0.1 |
joblib | 1.2.0 |
json5 | 0.9.10 |
jsonschema | 4.16.0 |
keras | 2.12.0 |
kiwisolver | 1.4.4 |
langcodes | 3.3.0 |
libclang | 16.0.0 |
locket | 1.0.0 |
lxml | 4.9.1 |
Mako | 1.2.4 |
Markdown | 3.4.2 |
MarkupSafe | 2.1.1 |
mlflow | 2.4.1 |
mlxtend | 0.21.0 |
mpmath | 1.2.1 |
msgpack | 1.0.4 |
murmurhash | 1.0.9 |
networkx | 2.8.7 |
nltk | 3.7 |
numpy | 1.23.4 |
oauthlib | 3.2.2 |
onnxruntime | 1.13.1 |
opt-einsum | 3.3.0 |
packaging | 21.3 |
pandas | 1.5.1 |
partd | 1.3.0 |
pathy | 0.6.2 |
patsy | 0.5.3 |
Pillow | 9.3.0 |
pip | 23.0.1 |
platformdirs | 2.5.2 |
plotly | 5.11.0 |
ply | 3.11 |
preshed | 3.0.8 |
protobuf | 4.21.9 |
psutil | 5.9.3 |
pyarrow | 12.0.1 |
pyasn1 | 0.4.8 |
pyasn1-modules | 0.2.8 |
pycparser | 2.21 |
pydantic | 1.10.2 |
pyfpgrowth | 1.0 |
PyJWT | 2.7.0 |
pyparsing | 3.0.9 |
pyreadline3 | 3.4.1 |
pyrsistent | 0.19.1 |
python-dateutil | 2.8.2 |
pytz | 2022.5 |
PyWavelets | 1.4.1 |
pywin32 | 306 |
PyYAML | 6.0 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
regex | 2022.10.31 |
requests | 2.28.1 |
requests-file | 1.5.1 |
requests-oauthlib | 1.3.1 |
rsa | 4.9 |
scikit-learn | 1.1.3 |
scipy | 1.9.3 |
setuptools | 67.6.0 |
sip | 6.7.3 |
six | 1.16.0 |
smart-open | 5.2.1 |
smmap | 5.0.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.4.2 |
spacy-legacy | 3.0.10 |
spacy-loggers | 1.0.3 |
SQLAlchemy | 2.0.18 |
sqlparse | 0.4.4 |
srsly | 2.4.5 |
statsmodels | 0.13.2 |
sympy | 1.11.1 |
tabulate | 0.9.0 |
tbb | 2021.7.0 |
tblib | 1.7.0 |
tenacity | 8.1.0 |
tensorboard | 2.12.0 |
tensorboard-data-server | 0.7.0 |
tensorboard-plugin-wit | 1.8.1 |
tensorflow | 2.12.0 |
tensorflow-estimator | 2.12.0 |
tensorflow-intel | 2.12.0 |
tensorflow-io-gcs-filesystem | 0.31.0 |
termcolor | 2.2.0 |
textdistance | 4.5.0 |
thinc | 8.1.5 |
threadpoolctl | 3.1.0 |
three-merge | 0.1.1 |
tldextract | 3.4.0 |
toml | 0.10.2 |
toolz | 0.12.0 |
torch | 2.0.0 |
torchaudio | 2.0.1 |
torchvision | 0.15.1 |
tornado | 6.1 |
tqdm | 4.64.1 |
typer | 0.4.2 |
typing_extensions | 4.4.0 |
ujson | 5.5.0 |
Unidecode | 1.3.6 |
urllib3 | 1.26.12 |
waitress | 2.1.2 |
wasabi | 0.10.1 |
websocket-client | 1.6.1 |
Werkzeug | 2.2.2 |
wheel | 0.40.0 |
wrapt | 1.14.1 |
xarray | 2022.10.0 |
zict | 2.2.0 |
zipp | 3.15.0 |
3.6.5 (Legacy)
Package | Version |
---|---|
adal | 1.2.0 |
anaconda_navigator | 1.8.7 |
anytree | 2.8.0 |
argparse | 1.1 |
asn1crypto | 0.24.0 |
astor | 0.7.1 |
astroid | 1.6.3 |
astropy | 3.0.2 |
attr | 18.1.0 |
babel | 2.5.3 |
backcall | 0.1.0 |
bitarray | 0.8.1 |
bleach | 2.1.3 |
bokeh | 0.12.16 |
boto | 2.48.0 |
boto3 | 1.9.109 |
botocore | 1.12.109 |
bottleneck | 1.2.1 |
bs4 | 4.6.0 |
certifi | 2018.04.16 |
cffi | 1.11.5 |
cgi | 2.6 |
chardet | 3.0.4 |
click | 6.7 |
cloudpickle | 0.5.3 |
clyent | 1.2.2 |
colorama | 0.3.9 |
conda | 4.5.4 |
conda_build | 3.10.5 |
conda_env | 4.5.4 |
conda_verify | 2.0.0 |
Crypto | 2.6.1 |
cryptography | 2.2.2 |
csv | 1 |
ctypes | 1.1.0 |
cycler | 0.10.0 |
cython | 0.28.2 |
Cython | 0.28.2 |
cytoolz | 0.9.0.1 |
dask | 0.17.5 |
datashape | 0.5.4 |
dateutil | 2.7.3 |
decimal | 1.7 |
decorator | 4.3.0 |
dill | 0.2.8.2 |
distributed | 1.21.8 |
distutils | 3.6.5 |
docutils | 0.14 |
entrypoints | 0.2.3 |
et_xmlfile | 1.0.1 |
fastcache | 1.0.2 |
filelock | 3.0.4 |
flask | 1.0.2 |
flask_cors | 3.0.4 |
future | 0.17.1 |
gensim | 3.7.1 |
geohash | 0.8.5 |
gevent | 1.3.0 |
glob2 | “(0, 6)” |
greenlet | 0.4.13 |
h5py | 2.7.1 |
html5lib | 1.0.1 |
idna | 2.6 |
imageio | 2.3.0 |
imaplib | 2.58 |
ipaddress | 1 |
IPython | 6.4.0 |
ipython_genutils | 0.2.0 |
isort | 4.3.4 |
jdcal | 1.4 |
jedi | 0.12.0 |
jinja2 | 2.1 |
jmespath | 0.9.4 |
joblib | 0.13.0 |
json | 2.0.9 |
jsonschema | 2.6.0 |
jupyter_core | 4.4.0 |
jupyterlab | 0.32.1 |
jwt | 1.7.1 |
keras | 2.2.4 |
keras_applications | 1.0.6 |
keras_preprocessing | 1.0.5 |
kiwisolver | 1.0.1 |
lazy_object_proxy | 1.3.1 |
llvmlite | 0.23.1 |
logging | 0.5.1.2 |
markdown | 3.0.1 |
markupsafe | 1 |
matplotlib | 2.2.2 |
mccabe | 0.6.1 |
menuinst | 1.4.14 |
mistune | 0.8.3 |
mkl | 1.1.2 |
mlxtend | 0.15.0.0 |
mpmath | 1.0.0 |
msrest | 0.6.2 |
msrestazure | 0.6.0 |
multipledispatch | 0.5.0 |
navigator_updater | 0.2.1 |
nbconvert | 5.3.1 |
nbformat | 4.4.0 |
networkx | 2.1 |
nltk | 3.3 |
nose | 1.3.7 |
notebook | 5.5.0 |
numba | 0.38.0 |
numexpr | 2.6.5 |
numpy | 1.19.1 |
numpydoc | 0.8.0 |
oauthlib | 2.1.0 |
olefile | 0.45.1 |
onnxruntime | 1.4.0 |
openpyxl | 2.5.3 |
OpenSSL | 18.0.0 |
optparse | 1.5.3 |
packaging | 17.1 |
pandas | 0.24.1 |
parso | 0.2.0 |
past | 0.17.1 |
path | 11.0.1 |
patsy | 0.5.0 |
pep8 | 1.7.1 |
phonenumbers | 8.10.6 |
pickleshare | 0.7.4 |
PIL | 5.1.0 |
pint | 0.8.1 |
pip | 21.3.1 |
plac | 0.9.6 |
platform | 1.0.8 |
plotly | 4.8.2 |
pluggy | 0.6.0 |
ply | 3.11 |
prompt_toolkit | 1.0.15 |
psutil | 5.4.5 |
py | 1.5.3 |
pycodestyle | 2.4.0 |
pycosat | 0.6.3 |
pycparser | 2.18 |
pyflakes | 1.6.0 |
pyfpgrowth | 1 |
pygments | 2.2.0 |
pylint | 1.8.4 |
pyparsing | 2.2.0 |
pytest | 3.5.1 |
pytest_arraydiff | 0.2 |
pytz | 2018.4 |
pywt | 0.5.2 |
qtconsole | 4.3.1 |
re | 2.2.1 |
regex | 2.4.136 |
requests | 2.18.4 |
requests_oauthlib | 1.0.0 |
ruamel_yaml | 0.15.35 |
s3transfer | 0.2.0 |
sandbox_utils | 1.2 |
scipy | 1.1.0 |
scrubadub | 1.2.0 |
setuptools | 39.1.0 |
six | 1.11.0 |
sklearn | 0.20.3 |
socketserver | 0.4 |
socks | 1.6.7 |
sortedcollections | 0.6.1 |
sortedcontainers | 1.5.10 |
spacy | 2.0.18 |
sphinx | 1.7.4 |
spyder | 3.2.8 |
sqlalchemy | 1.2.7 |
statsmodels | 0.9.0 |
surprise | 1.0.6 |
sympy | 1.1.1 |
tables | 3.4.3 |
tabnanny | 6 |
tblib | 1.3.2 |
tensorflow | 1.12.0 |
terminado | 0.8.1 |
testpath | 0.3.1 |
textblob | 0.10.0 |
tlz | 0.9.0.1 |
toolz | 0.9.0 |
torch | 1.0.0 |
tqdm | 4.31.1 |
traitlets | 4.3.2 |
ujson | 1.35 |
unicodecsv | 0.14.1 |
urllib3 | 1.22 |
werkzeug | 0.14.1 |
wheel | 0.31.1 |
widgetsnbextension | 3.2.1 |
win32rcparser | 0.11 |
winpty | 0.5.1 |
wrapt | 1.10.11 |
xgboost | 0.81 |
xlsxwriter | 1.0.4 |
yaml | 3.12 |
zict | 0.1.3 |
9.3.3 - R plugin (Preview)
The R plugin runs a user-defined function (UDF) using an R script.
The script gets tabular data as its input, and produces tabular output. The plugin’s runtime is hosted in a sandbox on the cluster’s nodes. The sandbox provides an isolated and secure environment.
Syntax
T | evaluate [hint.distribution = (single | per_node)] r(output_schema, script [, script_parameters] [, external_artifacts])
Parameters
Name | Type | Required | Description |
---|---|---|---|
output_schema | string | ✔️ | A type literal that defines the output schema of the tabular data, returned by the R code. The format is: typeof( ColumnName: ColumnType[, …]) . For example: typeof(col1:string, col2:long) . To extend the input schema, use the following syntax: typeof(*, col1:string, col2:long) . |
script | string | ✔️ | The valid R script to be executed. |
script_parameters | dynamic | A property bag of name and value pairs to be passed to the R script as the reserved kargs dictionary. For more information, see Reserved R variables. | |
hint.distribution | string | Hint for the plugin’s execution to be distributed across multiple cluster nodes. The default value is single . single means that a single instance of the script will run over the entire query data. per_node means that if the query before the R block is distributed, an instance of the script will run on each node over the data that it contains. | |
external_artifacts | dynamic | A property bag of name and URL pairs for artifacts that are accessible from cloud storage. They can be made available for the script to use at runtime. URLs referenced in this property bag are required to be included in the cluster’s callout policy and in a publicly available location, or contain the necessary credentials, as explained in storage connection strings. The artifacts are made available for the script to consume from a local temporary directory, .\Temp . The names provided in the property bag are used as the local file names. See Example. For more information, see Install packages for the R plugin. |
Reserved R variables
The following variables are reserved for interaction between Kusto Query Language and the R code:
- df: The input tabular data (the values of T above), as an R DataFrame.
- kargs: The value of the script_parameters argument, as an R dictionary.
- result: An R DataFrame created by the R script. The value becomes the tabular data that gets sent to any Kusto query operator that follows the plugin.
Enable the plugin
- The plugin is disabled by default.
- Enable or disable the plugin in the Azure portal in the Configuration tab of your cluster. For more information, see Manage language extensions in your Azure Data Explorer cluster (Preview)
R sandbox image
- The R sandbox image is based on R 3.4.4 for Windows, and includes packages from Anaconda’s R Essentials bundle.
Examples
range x from 1 to 360 step 1
| evaluate r(
//
typeof(*, fx:double), // Output schema: append a new fx column to original table
//
'result <- df\n' // The R decorated script
'n <- nrow(df)\n'
'g <- kargs$gain\n'
'f <- kargs$cycles\n'
'result$fx <- g * sin(df$x / n * 2 * pi * f)'
//
, bag_pack('gain', 100, 'cycles', 4) // dictionary of parameters
)
| render linechart
Performance tips
Reduce the plugin’s input dataset to the minimum amount required (columns/rows).
Use filters on the source dataset using the Kusto Query Language, when possible.
To make a calculation on a subset of the source columns, project only those columns before invoking the plugin.
Use hint.distribution = per_node whenever the logic in your script is distributable. You can also use the partition operator for partitioning the input dataset.
Whenever possible, use the Kusto Query Language to implement the logic of your R script.
For example:
.show operations
| where StartedOn > ago(1d)                 // Filtering out irrelevant records before invoking the plugin
| project d_seconds = Duration / 1s         // Projecting only a subset of the necessary columns
| evaluate hint.distribution = per_node r(  // Using per_node distribution, as the script's logic allows it
    typeof(*, d2:double),
    'result <- df\n'
    'result$d2 <- df$d_seconds\n'
    // Negative example: this logic should have been written using Kusto's query language
)
| summarize avg = avg(d2)
Usage tips
To avoid conflicts between Kusto string delimiters and R string delimiters (a short sketch follows this list):
- Use single quote characters (') for Kusto string literals in Kusto queries.
- Use double quote characters (") for R string literals in R scripts.
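For instance, a minimal sketch of this convention, following the same r plugin invocation pattern as the examples above (the greeting column is purely illustrative):
print x = 1
| evaluate r(typeof(*, greeting:string),
    'result <- df\n'                  // Kusto string literal in single quotes
    'result$greeting <- "hello"')     // R string literal inside uses double quotes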
Use the externaldata operator to obtain the content of a script that you’ve stored in an external location, such as Azure Blob Storage or a public GitHub repository.
For example:
let script = externaldata(script:string)
    [h'https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r']
    with(format = raw);
range x from 1 to 360 step 1
| evaluate r(
    typeof(*, fx:double),
    toscalar(script),
    bag_pack('gain', 100, 'cycles', 4))
| render linechart
Install packages for the R plugin
Follow these step-by-step instructions to install packages that aren’t included in the plugin’s base image.
Prerequisites
Create a blob container to host the packages, preferably in the same place as your cluster. For example, https://artifactswestus.blob.core.windows.net/r, assuming your cluster is in West US.
Alter the cluster’s callout policy to allow access to that location. This change requires AllDatabasesAdmin permissions. For example, to enable access to a blob located in https://artifactswestus.blob.core.windows.net/r, run the following command:
.alter-merge cluster policy callout
@'[
    {
        "CalloutType": "sandbox_artifacts",
        "CalloutUriRegex": "artifactswestus\\.blob\\.core\\.windows\\.net/r/",
        "CanCall": true
    }
]'
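Optionally, you can verify the updated policy afterward with the corresponding show command (assuming you have the required permissions):
.show cluster policy callout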
Install packages
The example snippets below assume a local R installation on a Windows environment.
Verify that you’re using the appropriate R version (the current R sandbox version is 3.4.4):
> R.Version()["version.string"]
$version.string
[1] "R version 3.4.4 (2018-03-15)"
If needed, you can download it from here.
Launch the x64 RGui.
Create a new empty folder to be populated with all the relevant packages you would like to install. In this example, we install the brglm2 package, so create “C:\brglm2”.
Add the newly created folder path to lib paths:
> .libPaths("C://brglm2")
Verify that the new folder is now the first path in .libPaths():
> .libPaths()
[1] "C:/brglm2"   "C:/Program Files/R/R-3.4.4/library"
Once this setup is done, any package that we install will be added to this new folder. Let’s install the requested package and its dependencies:
> install.packages("brglm2")
If the question “Do you want to install from sources the packages which need compilation?” pops up, answer “Y”.
Verify that new folders were added under “C:\brglm2”.
Select all items in that folder and zip them, for example into libs.zip (don’t zip the parent folder). You should get an archive structure like this:
libs.zip:
- brglm2 (folder)
- enrichwith (folder)
- numDeriv (folder)
Upload libs.zip to the blob container that was set up above.
Call the r plugin.
- Specify the external_artifacts parameter with a property bag of name and reference to the ZIP file (the blob’s URL, including a SAS token).
- In your inline R code, import zipfile from sandboxutils and call its install() method with the name of the ZIP file.
Example
Install the brglm2 package:
print x=1
| evaluate r(typeof(*, ver:string),
'library(sandboxutils)\n'
'zipfile.install("brglm2.zip")\n'
'library("brglm2")\n'
'result <- df\n'
'result$ver <-packageVersion("brglm2")\n'
,external_artifacts=bag_pack('brglm2.zip', 'https://artifactswestus.blob.core.windows.net/r/libs.zip?*** REPLACE WITH YOUR SAS TOKEN ***'))
x | ver |
---|---|
1 | 1.8.2 |
Make sure that the archive’s name (the first value in the pack pair) has the *.zip suffix to prevent collisions when unzipping folders whose name is identical to the archive name.
9.4 - Machine learning plugins
9.4.1 - autocluster plugin
autocluster finds common patterns of discrete attributes (dimensions) in the data. It then reduces the results of the original query, whether it’s 100 or 100,000 rows, to a few patterns. The plugin was developed to help analyze failures (such as exceptions or crashes) but can potentially work on any filtered dataset. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate autocluster([SizeWeight [, WeightColumn [, NumSeeds [, CustomWildcard [, … ]]]]])
Parameters
The parameters must be ordered as specified in the syntax. To indicate that the default value should be used, put the tilde string value ~. For more information, see Examples.
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
SizeWeight | double | A double between 0 and 1 that controls the balance between generic (high coverage) and informative (many shared) values. Increasing this value typically reduces the quantity of patterns while expanding coverage. Conversely, decreasing this value generates more specific patterns characterized by increased shared values and a smaller percentage coverage. The default is 0.5 . The formula is a weighted geometric mean with weights SizeWeight and 1-SizeWeight . | |
WeightColumn | string | Considers each row in the input according to the specified weight. Each row has a default weight of 1 . The argument must be a name of a numeric integer column. A common usage of a weight column is to take into account sampling or bucketing or aggregation of the data that is already embedded into each row. | |
NumSeeds | int | Determines the number of initial local search points. Adjusting the number of seeds impacts result quantity or quality based on data structure. Increasing seeds can enhance results but with a slower query tradeoff. Decreasing below five yields negligible improvements, while increasing above 50 rarely generates more patterns. The default is 25 . | |
CustomWildcard | string | A type literal that sets the wildcard value for a specific type in the results table, indicating no restriction on this column. The default is null , which represents an empty string. If the default is a good value in the data, a different wildcard value should be used, such as * . You can include multiple custom wildcards by adding them consecutively. |
Returns
The autocluster plugin usually returns a small set of patterns. The patterns capture portions of the data with shared common values across multiple discrete attributes. Each pattern in the results is represented by a row.
The first column is the segment ID. The next two columns are the count and percentage of rows out of the original query that are captured by the pattern. The remaining columns are from the original query. Their value is either a specific value from the column, or a wildcard value (null by default), meaning variable values.
The patterns aren’t distinct, may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.
Examples
Using evaluate
T | evaluate autocluster()
Using autocluster
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
Output
SegmentId | Count | Percent | State | EventType | Damage |
---|---|---|---|---|---|
0 | 2278 | 38.7 | | Hail | NO |
1 | 512 | 8.7 | | Thunderstorm Wind | YES |
2 | 898 | 15.3 | TEXAS | | |
Using custom wildcards
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
Output
SegmentId | Count | Percent | State | EventType | Damage |
---|---|---|---|---|---|
0 | 2278 | 38.7 | * | Hail | NO |
1 | 512 | 8.7 | * | Thunderstorm Wind | YES |
2 | 898 | 15.3 | TEXAS | * | * |
Related content
9.4.2 - basket plugin
The basket plugin finds frequent patterns of attributes in the data and returns the patterns that pass a frequency threshold in that data. A pattern represents a subset of the rows that have the same value across one or more columns. The basket plugin is based on the Apriori algorithm, originally developed for basket analysis data mining.
Syntax
T | evaluate basket([ Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, … ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Threshold | long | A double in the range of 0.015 to 1 that sets the minimal ratio of the rows to be considered frequent. Patterns with a smaller ratio won’t be returned. The default value is 0.05. To use the default value, input the tilde: ~ .Example: `T | |
WeightColumn | string | The column name to use to consider each row in the input according to the specified weight. Must be a name of a numeric type column, such as int , long , real . By default, each row has a weight of 1. To use the default value, input the tilde: ~ . A common use of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.Example: `T | |
MaxDimensions | int | Sets the maximal number of uncorrelated dimensions per basket, limited by default, to minimize the query runtime. The default is 5. To use the default value, input the tilde: ~ .Example: `T | |
CustomWildcard | string | Sets the wildcard value for a specific type in the result table that will indicate that the current pattern doesn’t have a restriction on this column. The default is null except for string columns whose default value is an empty string. If the default is a good value in the data, a different wildcard value should be used, such as * . To use the default value, input the tilde: ~ .Example: `T |
Returns
The basket plugin returns frequent patterns that pass a ratio threshold. The default threshold is 0.05.
Each pattern is represented by a row in the results. The first column is the segment ID. The next two columns are the count and percentage of rows from the original query that match the pattern. The remaining columns relate to the original query, with either a specific value from the column or a wildcard value, which is null by default, meaning a variable value.
Example
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
Output
SegmentId | Count | Percent | State | EventType | Damage | DamageCrops |
---|---|---|---|---|---|---|
0 | 4574 | 77.7 | | | NO | 0 |
1 | 2278 | 38.7 | | Hail | NO | 0 |
2 | 5675 | 96.4 | | | | 0 |
3 | 2371 | 40.3 | | Hail | | 0 |
4 | 1279 | 21.7 | | Thunderstorm Wind | | 0 |
5 | 2468 | 41.9 | | Hail | | |
6 | 1310 | 22.3 | | | YES | |
7 | 1291 | 21.9 | | Thunderstorm Wind | | |
Example with custom wildcards
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2, '~', '~', '*', int(-1))
Output
SegmentId | Count | Percent | State | EventType | Damage | DamageCrops |
---|---|---|---|---|---|---|
0 | 4574 | 77.7 | * | * | NO | 0 |
1 | 2278 | 38.7 | * | Hail | NO | 0 |
2 | 5675 | 96.4 | * | * | * | 0 |
3 | 2371 | 40.3 | * | Hail | * | 0 |
4 | 1279 | 21.7 | * | Thunderstorm Wind | * | 0 |
5 | 2468 | 41.9 | * | Hail | * | -1 |
6 | 1310 | 22.3 | * | * | YES | -1 |
7 | 1291 | 21.9 | * | Thunderstorm Wind | * | -1 |
9.4.3 - diffpatterns plugin
Compares two datasets of the same structure and finds patterns of discrete attributes (dimensions) that characterize differences between the two datasets. The plugin is invoked with the evaluate operator.
diffpatterns was developed to help analyze failures (for example, by comparing failures to non-failures in a given time frame), but can potentially find differences between any two datasets of the same structure.
Syntax
T | evaluate diffpatterns(SplitColumn, SplitValueA, SplitValueB [, WeightColumn, Threshold, MaxDimensions, CustomWildcard, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
SplitColumn | string | ✔️ | The column name that tells the algorithm how to split the query into datasets. According to the specified values for the SplitValueA and SplitValueB arguments, the algorithm splits the query into two datasets, “A” and “B”, and analyzes the differences between them. As such, the split column must have at least two distinct values. |
SplitValueA | string | ✔️ | A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered as dataset “A”. |
SplitValueB | string | ✔️ | A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered as dataset “B”. |
WeightColumn | string | The column used to consider each row in the input according to the specified weight. Must be a name of a numeric column, such as int , long , real . By default each row has a weight of ‘1’. To use the default value, input the tilde: ~ . A common usage of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.Example: `T | |
Threshold | real | A real in the range of 0.015 to 1. This value sets the minimal pattern ratio difference between the two sets. The default is 0.05. To use the default value, input the tilde: ~ .Example: `T | |
MaxDimensions | int | Sets the maximum number of uncorrelated dimensions per result pattern. By specifying a limit, you decrease the query runtime. The default is unlimited. To use the default value, input the tilde: ~ .Example: `T | |
CustomWildcard | string | Sets the wildcard value for a specific type in the result table that will indicate that the current pattern doesn’t have a restriction on this column. The default is null, except for string columns for which the default is an empty string. If the default is a viable value in the data, a different wildcard value should be used. For example, * . To use the default value, input the tilde: ~ .Example: `T |
Returns
diffpatterns returns a small set of patterns that capture different portions of the data in the two sets (that is, a pattern capturing a large percentage of the rows in the first dataset and a low percentage of the rows in the second set). Each pattern is represented by a row in the results.
The result of diffpatterns returns the following columns:
- SegmentId: the identity assigned to the pattern in the current query (note: IDs aren’t guaranteed to be the same in repeating queries).
- CountA: the number of rows captured by the pattern in Set A (Set A is the equivalent of where tostring(splitColumn) == SplitValueA).
- CountB: the number of rows captured by the pattern in Set B (Set B is the equivalent of where tostring(splitColumn) == SplitValueB).
- PercentA: the percentage of rows in Set A captured by the pattern (100.0 * CountA / count(SetA)).
- PercentB: the percentage of rows in Set B captured by the pattern (100.0 * CountB / count(SetB)).
- PercentDiffAB: the absolute percentage point difference between A and B (|PercentA - PercentB|), which is the main measure of significance of patterns in describing the difference between the two sets.
- Rest of the columns: the original schema of the input, describing the pattern. Each row (pattern) represents the intersection of the non-wildcard values of the columns (equivalent of where col1==val1 and col2==val2 and ... colN=valN for each non-wildcard value in the row).
For each pattern, columns that aren’t set in the pattern (that is, without restriction on a specific value) contain a wildcard value, which is null by default. See the Parameters section above for how wildcards can be manually changed.
- Note: the patterns are often not distinct. They may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.
Example
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , 1 , 0)
| project State , EventType , Source , Damage, DamageCrops
| evaluate diffpatterns(Damage, "0", "1" )
Output
SegmentId | CountA | CountB | PercentA | PercentB | PercentDiffAB | State | EventType | Source | DamageCrops |
---|---|---|---|---|---|---|---|---|---|
0 | 2278 | 93 | 49.8 | 7.1 | 42.7 | | Hail | | 0 |
1 | 779 | 512 | 17.03 | 39.08 | 22.05 | | Thunderstorm Wind | | |
2 | 1098 | 118 | 24.01 | 9.01 | 15 | | | Trained Spotter | 0 |
3 | 136 | 158 | 2.97 | 12.06 | 9.09 | | | Newspaper | |
4 | 359 | 214 | 7.85 | 16.34 | 8.49 | | Flash Flood | | |
5 | 50 | 122 | 1.09 | 9.31 | 8.22 | IOWA | | | |
6 | 655 | 279 | 14.32 | 21.3 | 6.98 | | | Law Enforcement | |
7 | 150 | 117 | 3.28 | 8.93 | 5.65 | | Flood | | |
8 | 362 | 176 | 7.91 | 13.44 | 5.52 | | | Emergency Manager | |
9.4.4 - diffpatterns_text plugin
Compares two datasets of string values and finds text patterns that characterize differences between the two datasets. The plugin is invoked with the evaluate
operator.
The diffpatterns_text plugin returns a set of text patterns that capture different portions of the data in the two sets. For example, a pattern capturing a large percentage of the rows when the condition is true and a low percentage of the rows when the condition is false. The patterns are built from consecutive tokens separated by white space, with a token from the text column or a * representing a wildcard. Each pattern is represented by a row in the results.
Syntax
T | evaluate diffpatterns_text(
TextColumn, BooleanCondition [, MinTokens, Threshold , MaxTokens])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TextColumn | string | ✔️ | The text column to analyze. |
BooleanCondition | string | ✔️ | An expression that evaluates to a boolean value. The algorithm splits the query into the two datasets to compare based on this expression. |
MinTokens | int | An integer value between 0 and 200 that represents the minimal number of non-wildcard tokens per result pattern. The default is 1. | |
Threshold | decimal | A decimal value between 0.015 and 1 that sets the minimal pattern ratio difference between the two sets. Default is 0.05. See diffpatterns. | |
MaxTokens | int | An integer value between 0 and 20 that sets the maximal number of tokens per result pattern. Specifying a lower limit decreases the query runtime. | |
Returns
The result of diffpatterns_text contains the following columns:
- Count_of_True: The number of rows matching the pattern when the condition is true.
- Count_of_False: The number of rows matching the pattern when the condition is false.
- Percent_of_True: The percentage of rows matching the pattern from the rows when the condition is true.
- Percent_of_False: The percentage of rows matching the pattern from the rows when the condition is false.
- Pattern: The text pattern containing tokens from the text string and ‘*’ for wildcards.
Example
The following example uses data from the StormEvents table in the help cluster. To access this data, sign in to https://dataexplorer.azure.com/clusters/help/databases/Samples. In the left menu, browse to help > Samples > Tables > Storm_Events.
The examples in this tutorial use the StormEvents
table, which is publicly available in the Weather analytics sample data.
StormEvents
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2)
Output
Count_of_True | Count_of_False | Percent_of_True | Percent_of_False | Pattern |
---|---|---|---|---|
11 | 0 | 6.29 | 0 | Winds shifting northwest in * wake * a surface trough brought heavy lake effect snowfall downwind * Lake Superior from |
9 | 0 | 5.14 | 0 | Canadian high pressure settled * * region * produced the coldest temperatures since February * 2006. Durations * freezing temperatures |
0 | 34 | 0 | 6.24 | * * * * * * * * * * * * * * * * * * West Tennessee, |
0 | 42 | 0 | 7.71 | * * * * * * caused * * * * * * * * across western Colorado. * |
0 | 45 | 0 | 8.26 | * * below normal * |
0 | 110 | 0 | 20.18 | Below normal * |
9.5 - Query connectivity plugins
9.5.1 - ai_embed_text plugin (Preview)
The ai_embed_text
plugin allows embedding of text using language models, enabling various AI-related scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin supports Azure OpenAI Service embedding models accessed using managed identity.
Prerequisites
- An Azure OpenAI Service configured with managed identity
- Managed identity and callout policies configured to allow communication with Azure OpenAI services
Syntax
evaluate
ai_embed_text
(
text, connectionString [,
options [,
IncludeErrorMessages]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
text | string | ✔️ | The text to embed. The value can be a column reference or a constant scalar. |
connectionString | string | ✔️ | The connection string for the language model in the format <ModelDeploymentUri>;<AuthenticationMethod> ; replace <ModelDeploymentUri> and <AuthenticationMethod> with the AI model deployment URI and the authentication method respectively. |
options | dynamic | The options that control calls to the embedding model endpoint. See Options. | |
IncludeErrorMessages | bool | Indicates whether to output errors in a new column in the output table. Default value: false . |
Options
The following table describes the options that control the way the requests are made to the embedding model endpoint.
Name | Type | Description |
---|---|---|
RecordsPerRequest | int | Specifies the number of records to process per request. Default value: 1 . |
CharsPerRequest | int | Specifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI counts tokens, with each token approximately translating to four characters. |
RetriesOnThrottling | int | Specifies the number of retry attempts when throttling occurs. Default value: 0 . |
GlobalTimeout | timespan | Specifies the maximum time to wait for a response from the embedding model. Default value: null |
ModelParameters | dynamic | Parameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null . |
Configure managed identity and callout policies
To use the ai_embed_text
plugin, you must configure the following policies:
- managed identity: Allow the system-assigned managed identity to authenticate to Azure OpenAI services.
- callout: Authorize the AI model endpoint domain.
To configure these policies, use the commands in the following steps:
Configure the managed identity:
.alter-merge cluster policy managed_identity ```[
  {
    "ObjectId": "system",
    "AllowedUsages": "AzureAI"
  }
]```
Configure the callout policy:
.alter-merge cluster policy callout ```[
  {
    "CalloutType": "azure_openai",
    "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.azure\\.com/.*",
    "CanCall": true
  }
]```
Returns
Returns the following new embedding columns:
- A column with the _embedding suffix that contains the embedding values
- If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.
Depending on the input type, the plugin returns different results:
- Column reference: Returns one or more records with additional columns that are prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
- Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.
Examples
The following example embeds the text Embed this text using AI
using the Azure OpenAI Embedding model.
let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embed_text(expression, connectionString)
The following example embeds multiple texts using the Azure OpenAI Embedding model.
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
"RecordsPerRequest": 10,
"CharsPerRequest": 10000,
"RetriesOnThrottling": 1,
"GlobalTimeout": 2m
});
datatable(TextData: string)
[
"First text to embed",
"Second text to embed",
"Third text to embed"
]
| evaluate ai_embed_text(TextData, connectionString, options , true)
Best practices
Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.
To efficiently use the ai_embed_text
plugin while minimizing throttling and costs, follow these best practices:
- Control request size: Adjust the number of records (
RecordsPerRequest
) and characters per request (CharsPerRequest
). - Control query timeout: Set
GlobalTimeout
to a value lower than the query timeout to ensure progress isn’t lost on successful calls up to that point. - Handle rate limits more gracefully: Set retries on throttling (
RetriesOnThrottling
).
9.5.2 - azure_digital_twins_query_request plugin
The azure_digital_twins_query_request
plugin runs an Azure Digital Twins query as part of a Kusto Query Language (KQL) query. The plugin is invoked with the evaluate
operator.
Using the plugin, you can query across data in both Azure Digital Twins and any data source accessible through KQL. For example, you can perform time series analytics.
For more information about the plugin, see Azure Digital Twins query plugin.
Syntax
evaluate
azure_digital_twins_query_request
(
AdtInstanceEndpoint ,
AdtQuery )
Parameters
Name | Type | Required | Description |
---|---|---|---|
AdtInstanceEndpoint | string | ✔️ | The Azure Digital Twins instance endpoint to be queried. |
AdtQuery | string | ✔️ | The query to run against the Azure Digital Twins endpoint. This query is written in a custom SQL-like query language for Azure Digital Twins, called the Azure Digital Twins query language. For more information, see Query language for Azure Digital Twins. |
Authentication and authorization
The azure_digital_twins_query_request
plugin uses the Microsoft Entra account of the user running the query to authenticate. To run a query, a user must at least be granted the Azure Digital Twins Data Reader role. Information on how to assign this role can be found in Security for Azure Digital Twins solutions.
Examples
The following examples show how you can run various Azure Digital Twins queries, including queries that use additional Kusto expressions.
Retrieval of all twins within an Azure Digital Twins instance
The following example returns all digital twins within an Azure Digital Twins instance.
evaluate azure_digital_twins_query_request(
'https://contoso.api.wcus.digitaltwins.azure.net',
'SELECT T AS Twins FROM DIGITALTWINS T')
Projection of twin properties as columns along with additional Kusto expressions
The following example returns the result from the plugin as separate columns, and then performs additional operations using Kusto expressions.
evaluate azure_digital_twins_query_request(
'https://contoso.api.wcus.digitaltwins.azure.net',
'SELECT T.Temperature, T.Humidity FROM DIGITALTWINS T WHERE IS_PRIMITIVE(T.Temperature) AND IS_PRIMITIVE(T.Humidity)')
| where Temperature > 20
| project TemperatureInC = Temperature, Humidity
Output
TemperatureInC | Humidity |
---|---|
21 | 48 |
49 | 34 |
80 | 32 |
Perform time series analytics
You can use the data history integration feature of Azure Digital Twins to historize digital twin property updates. To learn how to view the historized twin updates, see View the historized twin updates.
9.5.3 - cosmosdb_sql_request plugin
The cosmosdb_sql_request
plugin sends a SQL query to an Azure Cosmos DB SQL network endpoint and returns the results of the query. This plugin is primarily designed for querying small datasets, for example, enriching data with reference data stored in Azure Cosmos DB. The plugin is invoked with the evaluate
operator.
Syntax
evaluate
cosmosdb_sql_request
(
ConnectionString ,
SqlQuery [,
SqlParameters [,
Options]] )
[:
OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points to the Azure Cosmos DB collection to query. It must include AccountEndpoint, Database, and Collection. It might include AccountKey if a master key is used for authentication. For more information, see Authentication and authorization.Example: 'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>' |
SqlQuery | string | ✔️ | The query to execute. |
SqlParameters | dynamic | The property bag object to pass as parameters along with the query. Parameter names must begin with @ . | |
OutputSchema | The names and types of the expected columns of the cosmosdb_sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ) . Specifying this parameter enables multiple query optimizations. | ||
Options | dynamic | A property bag object of advanced settings. If an AccountKey isn’t provided in the ConnectionString, then the armResourceId field of this parameter is required. For more information, see Supported options. |
Supported options
The following table describes the supported fields of the Options parameter.
Name | Type | Description |
---|---|---|
armResourceId | string | The Azure Resource Manager resource ID of the Cosmos DB database. If an account key isn’t provided in the connection string argument, this field is required. In such a case, the armResourceId is used to authenticate to Cosmos DB.Example: armResourceId='/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>' |
token | string | A Microsoft Entra access token of a principal with access to the Cosmos DB database. This token is used along with the armResourceId to authenticate with the Azure Resource Manager. If unspecified, the token of the principal that made the query is used.If armResourceId isn’t specified, the token is used directly to access the Cosmos DB database. For more information about the token authentication method, see Authentication and authorization. |
preferredLocations | string | The region from which to query the data.Example: ['East US'] |
Authentication and authorization
To authorize to an Azure Cosmos DB SQL network endpoint, you need to specify the authorization information. The following table provides the supported authentication methods and the description for how to use that method.
Authentication method | Description |
---|---|
Managed identity (Recommended) | Append Authentication="Active Directory Managed Identity";User Id={object_id}; to the connection string. The request is made on behalf of a managed identity which must have the appropriate permissions to the database.To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy. |
Azure Resource Manager resource ID | This authentication method requires specifying the armResourceId and optionally the token in the options. The armResourceId identifies the Cosmos DB database account, and the token must be a valid Microsoft Entra bearer token for a principal with access permissions to the Cosmos DB database. If no token is provided, the Microsoft Entra token of the requesting principal will be used for authentication. |
Account key | You can add the account key directly to the ConnectionString argument. However, this approach is less secure as it involves including the secret in the query text, and is less resilient to future changes in the account key. To enhance security, hide the secret as an obfuscated string literal. |
Token | You can add a token value in the plugin options. The token must belong to a principal with relevant permissions. To enhance security, hide the token as an obfuscated string literal. |
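For example, a connection string that uses managed identity authentication could look like the following sketch; the endpoint, database, collection, and object ID are placeholders:
evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;'
    'Authentication="Active Directory Managed Identity";User Id=<ObjectId>;',
    'SELECT c.Id, c.Name FROM c') : (Id:long, Name:string)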
Set callout policy
The plugin makes callouts to the Azure Cosmos DB instance. Make sure that the cluster’s callout policy enables calls of type cosmosdb
to the target CosmosDbUri.
The following example shows how to define the callout policy for Azure Cosmos DB. It’s recommended to restrict it to specific endpoints (my_endpoint1
, my_endpoint2
).
[
{
"CalloutType": "CosmosDB",
"CalloutUriRegex": "my_endpoint1\\.documents\\.azure\\.com",
"CanCall": true
},
{
"CalloutType": "CosmosDB",
"CalloutUriRegex": "my_endpoint2\\.documents\\.azure\\.com",
"CanCall": true
}
]
The following example shows an alter callout policy command for cosmosdb
CalloutType
.alter cluster policy callout @'[{"CalloutType": "cosmosdb", "CalloutUriRegex": "\\.documents\\.azure\\.com", "CanCall": true}]'
Examples
The following examples use placeholder text, in brackets.
Query Azure Cosmos DB with a query-defined output schema
The following example uses the cosmosdb_sql_request plugin to send a SQL query while selecting only specific columns. This query uses explicit schema definitions that allow various optimizations before the actual query is run against Cosmos DB.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
'SELECT c.Id, c.Name from c') : (Id:long, Name:string)
Query Azure Cosmos DB
The following example uses the cosmosdb_sql_request plugin to send a SQL query to fetch data from Azure Cosmos DB using its Azure Cosmos DB for NoSQL.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
'SELECT * from c') // OutputSchema is unknown, so it is not specified. This may harm the performance of the query.
Query Azure Cosmos DB with parameters
The following example uses SQL query parameters and queries the data from an alternate region. For more information, see preferredLocations
.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
"SELECT c.id, c.lastName, @param0 as Column0 FROM c WHERE c.dob >= '1970-01-01T00:00:00Z'",
dynamic({'@param0': datetime(2019-04-16 16:47:26.7423305)}),
dynamic({'preferredLocations': ['East US']})) : (Id:long, Name:string, Column0: datetime)
| where lastName == 'Smith'
Query Azure Cosmos DB and join data with a database table
The following example joins partner data from an Azure Cosmos DB with partner data in a database using the Partner
field. It results in a list of partners with their phone numbers, website, and contact email address sorted by partner name.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
"SELECT c.id, c.Partner, c. phoneNumber FROM c') : (Id:long, Partner:string, phoneNumber:string)
| join kind=innerunique Partner on Partner
| project id, Partner, phoneNumber, website, Contact
| sort by Partner
Query Azure Cosmos DB using token authentication
The following example sends a SQL query to an Azure Cosmos DB database and authenticates with a Microsoft Entra access token provided in the token option.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
"SELECT c.Id, c.Name, c.City FROM c",
dynamic(null),
dynamic({'token': h'abc123...'})
) : (Id:long, Name:string, City:string)
Query Azure Cosmos DB using Azure Resource Manager resource ID for authentication
The following example uses the Azure Resource Manager resource ID for authentication and the Microsoft Entra token of the requesting principal, since a token isn’t specified. It sends a SQL query while selecting only specific columns and specifies explicit schema definitions.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
"SELECT c.Id, c.Name, c.City FROM c",
dynamic({'armResourceId': '/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>'})
) : (Id:long, Name:string, City:string)
9.5.4 - http_request plugin
The http_request
plugin sends an HTTP GET request and converts the response into a table.
Prerequisites
- Run
.enable plugin http_request
to enable the plugin - Set the URI to access as an allowed destination for
webapi
in the Callout policy
Syntax
evaluate
http_request
(
Uri [,
RequestHeaders [,
Options]] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Uri | string | ✔️ | The destination URI for the HTTP or HTTPS request. |
RequestHeaders | dynamic | A property bag containing HTTP headers to send with the request. | |
Options | dynamic | A property bag containing additional properties of the request. |
Authentication and authorization
To authenticate, use the HTTP standard Authorization
header or any custom header supported by the web service.
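For example, a bearer token can be passed through the RequestHeaders argument; in the following sketch, the URI and token are placeholders:
let Uri = "https://example.com/api/data";
let Headers = dynamic({'Authorization': 'bearer ...token-for-the-target-endpoint...'});
evaluate http_request(Uri, Headers)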
Returns
The plugin returns a table that has a single record with the following dynamic columns:
- ResponseHeaders: A property bag with the response header.
- ResponseBody: The response body parsed as a value of type
dynamic
.
If the HTTP response indicates (via the Content-Type
response header) that the media type is application/json
,
the response body is automatically parsed as-if it’s a JSON object. Otherwise, it’s returned as-is.
Headers
The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:
Name | Description |
---|---|
x-ms-client-request-id | A correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID. |
x-ms-readonly | A flag indicating that the processor of this request shouldn’t make any persistent changes. |
Example
The following example retrieves Azure retail prices for Azure Purview in West Europe:
let Uri = "https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Azure Purview' and location eq 'EU West'";
evaluate http_request(Uri)
| project ResponseBody.Items
| mv-expand ResponseBody_Items
| evaluate bag_unpack(ResponseBody_Items)
Output
armRegionName | armSkuName | currencyCode | effectiveStartDate | isPrimaryMeterRegion | location | meterId | meterName | productId | productName | retailPrice | serviceFamily | serviceId | serviceName | skuId | skuName | tierMinimumUnits | type | unitOfMeasure | unitPrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
westeurope | Data Insights | USD | 2022-06-01T00:00:00Z | false | EU West | 8ce915f7-20db-564d-8cc3-5702a7c952ab | Data Insights Report Consumption | DZH318Z08M22 | Azure Purview Data Map | 0.21 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/006C | Catalog Insights | 0 | Consumption | 1 API Calls | 0.21 |
westeurope | Data Map Enrichment - Data Insights Generation | USD | 2022-06-01T00:00:00Z | false | EU West | 7ce2db1d-59a0-5193-8a57-0431a10622b6 | Data Map Enrichment - Data Insights Generation vCore | DZH318Z08M22 | Azure Purview Data Map | 0.82 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/005C | Data Map Enrichment - Insight Generation | 0 | Consumption | 1 Hour | 0.82 |
westeurope | USD | 2021-09-28T00:00:00Z | false | EU West | 053e2dcb-82c0-5e50-86cd-1f1c8d803705 | Power BI vCore | DZH318Z08M23 | Azure Purview Scanning Ingestion and Classification | 0 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M23/0005 | Power BI | 0 | Consumption | 1 Hour | 0 | |
westeurope | USD | 2021-09-28T00:00:00Z | false | EU West | a7f57f26-5f31-51e5-a5ed-ffc2b0da37b9 | Resource Set vCore | DZH318Z08M22 | Azure Purview Data Map | 0.21 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/000X | Resource Set | 0 | Consumption | 1 Hour | 0.21 | |
westeurope | USD | 2021-09-28T00:00:00Z | false | EU West | 5d157295-441c-5ea7-ba7c-5083026dc456 | SQL Server vCore | DZH318Z08M23 | Azure Purview Scanning Ingestion and Classification | 0 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M23/000F | SQL Server | 0 | Consumption | 1 Hour | 0 | |
westeurope | USD | 2021-09-28T00:00:00Z | false | EU West | 0745df0d-ce4f-52db-ac31-ac574d4dcfe5 | Standard Capacity Unit | DZH318Z08M22 | Azure Purview Data Map | 0.411 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/0002 | Standard | 0 | Consumption | 1 Hour | 0.411 | |
westeurope | USD | 2021-09-28T00:00:00Z | false | EU West | 811e3118-5380-5ee8-a5d9-01d48d0a0627 | Standard vCore | DZH318Z08M23 | Azure Purview Scanning Ingestion and Classification | 0.63 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M23/0009 | Standard | 0 | Consumption | 1 Hour | 0.63 |
9.5.5 - http_request_post plugin
The http_request_post
plugin sends an HTTP POST request and converts the response into a table.
Prerequisites
- Run
.enable plugin http_request_post
to enable the plugin - Set the URI to access as an allowed destination for
webapi
in the Callout policy
Syntax
evaluate
http_request_post
(
Uri [,
RequestHeaders [,
Options [,
Content]]] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Uri | string | ✔️ | The destination URI for the HTTP or HTTPS request. |
RequestHeaders | dynamic | A property bag containing HTTP headers to send with the request. | |
Options | dynamic | A property bag containing additional properties of the request. | |
Content | string | The body content to send with the request. The content is encoded in UTF-8 and the media type for the Content-Type attribute is application/json . |
Authentication and authorization
To authenticate, use the HTTP standard Authorization
header or any custom header supported by the web service.
Returns
The plugin returns a table that has a single record with the following dynamic columns:
- ResponseHeaders: A property bag with the response header.
- ResponseBody: The response body parsed as a value of type
dynamic
.
If the HTTP response indicates (via the Content-Type
response header) that the media type is application/json
,
the response body is automatically parsed as-if it’s a JSON object. Otherwise, it’s returned as-is.
Headers
The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:
Name | Description |
---|---|
x-ms-client-request-id | A correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID. |
x-ms-readonly | A flag indicating that the processor of this request shouldn’t make any persistent changes. |
Example
The following example is for a hypothetical HTTPS web service that accepts additional request headers and requires authentication using Microsoft Entra ID:
let uri='https://example.com/node/js/on/eniac';
let headers=dynamic({'x-ms-correlation-vector':'abc.0.1.0', 'authorization':'bearer ...Azure-AD-bearer-token-for-target-endpoint...'});
evaluate http_request_post(uri, headers)
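The Content parameter isn’t used in the example above. As a sketch, a JSON body could be posted as follows; the URI and payload are placeholders, and the Options argument is passed as dynamic(null):
let uri = 'https://example.com/api/items';
let headers = dynamic({'x-ms-correlation-vector': 'abc.0.1.0', 'authorization': 'bearer ...token-for-the-target-endpoint...'});
let options = dynamic(null);
let content = '{"name": "widget", "count": 42}';
evaluate http_request_post(uri, headers, options, content)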
9.5.6 - mysql_request plugin
The mysql_request
plugin sends a SQL query to an Azure MySQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.
The plugin is invoked with the evaluate
operator.
Syntax
evaluate
mysql_request
(
ConnectionString ,
SqlQuery [,
SqlParameters] )
[:
OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points at the MySQL Server network endpoint. See authentication and how to specify the network endpoint. |
SqlQuery | string | ✔️ | The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query. |
SqlParameters | dynamic | A property bag object that holds key-value pairs to pass as parameters along with the query. | |
OutputSchema | The names and types for the expected columns of the mysql_request plugin output.Syntax: ( ColumnName : ColumnType [, …] ) |
Authentication and authorization
To authorize to a MySQL Server network endpoint, you need to specify the authorization information in the connection string. The supported authorization method is via username and password.
Set callout policy
The plugin makes callouts to the MySql database. Make sure that the cluster’s callout policy enables calls of type mysql
to the target MySqlDbUri.
The following example shows how to define the callout policy for MySQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1
, my_endpoint2
).
[
{
"CalloutType": "mysql",
"CalloutUriRegex": "my_endpoint1\\.mysql\\.database\\.azure\\.com",
"CanCall": true
},
{
"CalloutType": "mysql",
"CalloutUriRegex": "my_endpoint2\\.mysql\\.database\\.azure\\.com",
"CanCall": true
}
]
The following example shows an .alter callout policy
command for mysql
CalloutType:
.alter cluster policy callout @'[{"CalloutType": "mysql", "CalloutUriRegex": "\\.mysql\\.database\\.azure\\.com", "CanCall": true}]'
Username and password authentication
The mysql_request
plugin only supports username and password authentication to the MySQL server endpoint and doesn’t integrate with Microsoft Entra authentication.
The username and password are provided as part of the connections string using the following parameters:
User ID=...; Password=...;
Encryption and server validation
For security, SslMode
is unconditionally set to Required
when connecting to a MySQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.
Specify the network endpoint
Specify the MySQL network endpoint as part of the connection string.
Syntax:
Server
=
FQDN [Port
=
Port]
Where:
- FQDN is the fully qualified domain name of the endpoint.
- Port is the TCP port of the endpoint. By default,
3306
is assumed.
Examples
SQL query to Azure MySQL DB
The following example sends a SQL query to an Azure MySQL database. It retrieves all records from [dbo].[Table]
, and then processes the results.
evaluate mysql_request(
'Server=contoso.mysql.database.azure.com; Port = 3306;'
'Database=Fabrikam;'
h'UID=USERNAME;'
h'Pwd=PASSWORD;',
'select * from `dbo`.`Table`') : (Id: int, Name: string)
| where Id > 0
| project Name
SQL query to an Azure MySQL database with modifications
The following example sends a SQL query to an Azure MySQL database
retrieving all records from [dbo].[Table]
, while appending another datetime
column,
and then processes the results on the Kusto side.
It specifies a SQL parameter (@param0
) to be used in the SQL query.
evaluate mysql_request(
'Server=contoso.mysql.database.azure.com; Port = 3306;'
'Database=Fabrikam;'
h'UID=USERNAME;'
h'Pwd=PASSWORD;',
'select *, @param0 as dt from `dbo`.`Table`',
dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name
SQL query to an Azure MySQL database without a query-defined output schema
The following example sends a SQL query to an Azure MySQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.
evaluate mysql_request(
'Server=contoso.mysql.database.azure.com; Port = 3306;'
'Database=Fabrikam;'
h'UID=USERNAME;'
h'Pwd=PASSWORD;',
'select * from `dbo`.`Table`')
| where Id > 0
| project Name
9.5.7 - postgresql_request plugin
The postgresql_request
plugin sends a SQL query to an Azure PostgreSQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.
The plugin is invoked with the evaluate
operator.
Syntax
evaluate
postgresql_request
(
ConnectionString ,
SqlQuery [,
SqlParameters] )
[:
OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points at the PostgreSQL Server network endpoint. See authentication and how to specify the network endpoint. |
SqlQuery | string | ✔️ | The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query. |
SqlParameters | dynamic | A property bag object that holds key-value pairs to pass as parameters along with the query. | |
OutputSchema | The names and types for the expected columns of the postgresql_request plugin output.Syntax: ( ColumnName : ColumnType [, …] ) |
Authentication and authorization
To authorize a PostgreSQL Server network endpoint, you must specify the authorization information in the connection string. The supported authorization method is via username and password.
Set callout policy
The plugin makes callouts to the PostgreSQL database. Make sure that the cluster’s callout policy enables calls of type postgresql
to the target PostgreSqlDbUri.
The following example shows how to define the callout policy for PostgreSQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1
, my_endpoint2
).
[
{
"CalloutType": "postgresql",
"CalloutUriRegex": "my_endpoint1\\.postgres\\.database\\.azure\\.com",
"CanCall": true
},
{
"CalloutType": "postgresql",
"CalloutUriRegex": "my_endpoint2\\.postgres\\.database\\.azure\\.com",
"CanCall": true
}
]
The following example shows a .alter callout policy
command for postgresql
CalloutType:
.alter cluster policy callout @'[{"CalloutType": "postgresql", "CalloutUriRegex": "\\.postgres\\.database\\.azure\\.com", "CanCall": true}]'
Username and password authentication
The postgresql_request
plugin only supports username and password authentication to the PostgreSQL server endpoint and doesn’t integrate with Microsoft Entra authentication.
The username and password are provided as part of the connections string using the following parameters:
User ID=...; Password=...;
Encryption and server validation
For security, SslMode
is unconditionally set to Required
when connecting to a PostgreSQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.
Specify the network endpoint
Specify the PostgreSQL network endpoint as part of the connection string.
Syntax:
Host
=
FQDN [Port
=
Port]
Where:
- FQDN is the fully qualified domain name of the endpoint.
- Port is the TCP port of the endpoint.
Examples
SQL query to Azure PostgreSQL DB
The following example sends a SQL query to an Azure PostgreSQL database. It retrieves all records from public."Table"
, and then processes the results.
evaluate postgresql_request(
'Host=contoso.postgres.database.azure.com; Port = 5432;'
'Database=Fabrikam;'
h'User Id=USERNAME;'
h'Password=PASSWORD;',
'select * from public."Table"') : (Id: int, Name: string)
| where Id > 0
| project Name
SQL query to an Azure PostgreSQL database with modifications
The following example sends a SQL query to an Azure PostgreSQL database
retrieving all records from public."Table"
, while appending another datetime
column,
and then processes the results.
It specifies a SQL parameter (@param0
) to be used in the SQL query.
evaluate postgresql_request(
'Host=contoso.postgres.database.azure.com; Port = 5432;'
'Database=Fabrikam;'
h'User Id=USERNAME;'
h'Password=PASSWORD;',
'select *, @param0 as dt from public."Table"',
dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id: int, Name: string, dt: datetime)
| where Id > 0
| project Name
SQL query to an Azure PostgreSQL database without a query-defined output schema
The following example sends a SQL query to an Azure PostgreSQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.
evaluate postgresql_request(
'Host=contoso.postgres.database.azure.com; Port = 5432;'
'Database=Fabrikam;'
h'User Id=USERNAME;'
h'Password=PASSWORD;',
'select * from public."Table"')
| where Id > 0
| project Name
9.5.8 - sql_request plugin
The sql_request
plugin sends a SQL query to an Azure SQL Server network endpoint and returns the results.
If more than one rowset is returned by SQL, only the first one is used.
The plugin is invoked with the evaluate
operator.
Syntax
evaluate
sql_request
(
ConnectionString ,
SqlQuery [,
SqlParameters [,
Options]] )
[:
OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points at the SQL Server network endpoint. See valid methods of authentication and how to specify the network endpoint. |
SqlQuery | string | ✔️ | The query that is to be executed against the SQL endpoint. The query must return one or more row sets, but only the first one is made available for the rest of the Kusto query. |
SqlParameters | dynamic | A property bag of key-value pairs to pass as parameters along with the query. | |
Options | dynamic | A property bag of key-value pairs to pass more advanced settings along with the query. Currently, only token can be set, to pass a caller-provided Microsoft Entra access token that is forwarded to the SQL endpoint for authentication. | |
OutputSchema | string | The names and types for the expected columns of the sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ) . |
Authentication and authorization
The sql_request plugin supports the following methods of authentication to the SQL Server endpoint.
Authentication method | Syntax | How | Description |
---|---|---|---|
Microsoft Entra integrated | Authentication="Active Directory Integrated" | Add to the ConnectionString parameter. | The user or application authenticates via Microsoft Entra ID to your cluster, and the same token is used to access the SQL Server network endpoint. The principal must have the appropriate permissions on the SQL resource to perform the requested action. For example, to read from the database the principal needs table SELECT permissions, and to write to an existing table the principal needs UPDATE and INSERT permissions. To write to a new table, CREATE permissions are also required. |
Managed identity | Authentication="Active Directory Managed Identity";User Id={object_id} | Add to the ConnectionString parameter. | The request is executed on behalf of a managed identity. The managed identity must have the appropriate permissions on the SQL resource to perform the requested action. To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy. |
Username and password | User ID=...; Password=...; | Add to the ConnectionString parameter. | When possible, avoid this method as it may be less secure. |
Microsoft Entra access token | dynamic({'token': h"eyJ0..."}) | Add in the Options parameter. | The access token is passed as the token property in the Options argument of the plugin. |
Examples
Send a SQL query using Microsoft Entra integrated authentication
The following example sends a SQL query to an Azure SQL DB database. It
retrieves all records from [dbo].[Table]
, and then processes the results on the
Kusto side. Authentication reuses the calling user’s Microsoft Entra token.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Authentication="Active Directory Integrated";'
'Initial Catalog=Fabrikam;',
'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name
Send a SQL query using Username/Password authentication
The following example is identical to the previous one, except that SQL authentication is done by username/password. For confidentiality, we use obfuscated strings here.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Initial Catalog=Fabrikam;'
h'User ID=USERNAME;'
h'Password=PASSWORD;',
'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name
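Send a SQL query using managed identity authentication
The following sketch is based on the managed identity syntax from the authentication table above; the object ID is a placeholder, and the managed identity must be added to the cluster as described in the Managed Identity policy.
evaluate sql_request(
    'Server=tcp:contoso.database.windows.net,1433;'
    'Authentication="Active Directory Managed Identity";User Id=<ObjectId>;'
    'Initial Catalog=Fabrikam;',
    'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name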
Send a SQL query using a Microsoft Entra access token
The following example sends a SQL query to an Azure SQL database
retrieving all records from [dbo].[Table]
, while appending another datetime
column,
and then processes the results on the Kusto side.
It specifies a SQL parameter (@param0
) to be used in the SQL query.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Authentication="Active Directory Integrated";'
'Initial Catalog=Fabrikam;',
'select *, @param0 as dt from [dbo].[Table]',
dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name
Send a SQL query without a query-defined output schema
The following example sends a SQL query to an Azure SQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Initial Catalog=Fabrikam;'
h'User ID=USERNAME;'
h'Password=PASSWORD;',
'select * from [dbo].[Table]')
| where Id > 0
| project Name
Encryption and server validation
The following connection properties are forced when connecting to a SQL Server network endpoint, for security reasons.
- Encrypt is set to true unconditionally.
- TrustServerCertificate is set to false unconditionally.
As a result, the SQL Server must be configured with a valid SSL/TLS server certificate.
Specify the network endpoint
Specifying the SQL network endpoint as part of the connection string is mandatory. The appropriate syntax is:
Server
=
tcp:
FQDN [,
Port]
Where:
- FQDN is the fully qualified domain name of the endpoint.
- Port is the TCP port of the endpoint. By default,
1433
is assumed.
9.6 - User and sequence analytics plugins
9.6.1 - active_users_count plugin
Calculates distinct count of values, where each value has appeared in at least a minimum number of periods in a lookback period.
Useful for calculating distinct counts of “fans” only, while not including appearances of “non-fans”. A user is counted as a “fan” only if it was active during the lookback period. The lookback period is only used to determine whether a user is considered active
(“fan”) or not. The aggregation itself doesn’t include users from the lookback window. In comparison, the sliding_window_counts aggregation is performed over a sliding window of the lookback period.
Syntax
T | evaluate
active_users_count(
IdColumn,
TimelineColumn,
Start,
End,
LookbackWindow,
Period,
ActivePeriodsCount,
Bin ,
[dim1,
dim2,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input used to count active users. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents timeline. |
Start | datetime | ✔️ | The analysis start period. |
End | datetime | ✔️ | The analysis end period. |
LookbackWindow | timespan | ✔️ | The time window defining a period where user appearance is checked. The lookback period starts at ([current appearance] - [lookback window]) and ends on ([current appearance]). |
Period | timespan | ✔️ | The length of time that counts as a single appearance. A user is counted as active if it appears in at least ActivePeriodsCount distinct periods of this length. |
ActivePeriodsCount | decimal | ✔️ | The minimal number of distinct active periods needed to decide that a user is active. Active users are those who appeared in at least (equal to or greater than) this number of active periods. |
Bin | decimal, datetime, or timespan | ✔️ | A constant value of the analysis step period. May also be a string of week, month, or year, in which case all periods are binned by the corresponding startofweek, startofmonth, or startofyear function. |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table that has the distinct count of ID values that appeared in at least ActivePeriodsCount periods within the lookback window, for each timeline period and each existing dimensions combination.
Output table schema is:
TimelineColumn | dim1 | .. | dim_n | dcount_values |
---|---|---|---|---|
type: as of TimelineColumn | .. | .. | .. | long |
Examples
Calculate weekly number of distinct users that appeared in at least three different days over a period of prior eight days. Period of analysis: July 2018.
let Start = datetime(2018-07-01);
let End = datetime(2018-07-31);
let LookbackWindow = 8d;
let Period = 1d;
let ActivePeriods = 3;
let Bin = 7d;
let T = datatable(User:string, Timestamp:datetime)
[
"B", datetime(2018-06-29),
"B", datetime(2018-06-30),
"A", datetime(2018-07-02),
"B", datetime(2018-07-04),
"B", datetime(2018-07-08),
"A", datetime(2018-07-10),
"A", datetime(2018-07-14),
"A", datetime(2018-07-17),
"A", datetime(2018-07-20),
"B", datetime(2018-07-24)
];
T | evaluate active_users_count(User, Timestamp, Start, End, LookbackWindow, Period, ActivePeriods, Bin)
Output
Timestamp | dcount |
---|---|
2018-07-01 00:00:00.0000000 | 1 |
2018-07-15 00:00:00.0000000 | 1 |
A user is considered active if it fulfills both of the following criteria:
- The user was seen in at least three distinct days (Period = 1d, ActivePeriods=3).
- The user was seen in a lookback window of 8d before and including their current appearance.
The only appearances that are active by these criteria are User A on 7/20 and User B on 7/4 (see the plugin results above). The appearances of User B on 6/29 and 6/30 count toward the lookback window for 7/4, but not toward the Start-End time range of the analysis.
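For example, for User B's appearance on 7/04, the 8-day lookback window contains appearances on three distinct days (6/29, 6/30, and 7/04, with Period = 1d and ActivePeriodsCount = 3), so that appearance is active and User B is counted in the 7/01 bin, even though the 6/29 and 6/30 appearances themselves fall outside the Start-End range.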
9.6.2 - activity_counts_metrics plugin
Calculates useful activity metrics for each time window compared/aggregated to all previous time windows. Metrics include: total count values, distinct count values, distinct count of new values, and aggregated distinct count. Compare this plugin to activity_metrics plugin, in which every time window is compared to its previous time window only.
Syntax
T | evaluate
activity_counts_metrics(
IdColumn,
TimelineColumn,
Start,
End,
Step [,
Dimensions])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input used to count activities. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents the timeline. |
Start | datetime | ✔️ | The analysis start period. |
End | datetime | ✔️ | The analysis end period. |
Step | decimal, datetime, or timespan | ✔️ | The analysis window period. The value may also be a string of week , month , or year , in which case all periods would be startofweek, startofmonth, or startofyear. |
Dimensions | string | Zero or more comma-separated dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table that has the total count values, distinct count values, distinct count of new values, and aggregated distinct count for each time window. If Dimensions are provided, then there’s another column for each dimension in the output table.
The following table describes the output table schema.
Column name | Type | Description |
---|---|---|
Timestamp | Same as the provided TimelineColumn argument | The time window start time. |
count | long | The total records count in the time window and dim(s) |
dcount | long | The distinct ID values count in the time window and dim(s) |
new_dcount | long | The distinct ID values in the time window and dim(s) compared to all previous time windows. |
aggregated_dcount | long | The total aggregated distinct ID values of dim(s) from first-time window to current (inclusive). |
Examples
Daily activity counts
The next query calculates daily activity counts for the provided input table.
let start=datetime(2017-08-01);
let end=datetime(2017-08-04);
let window=1d;
let T = datatable(UserId:string, Timestamp:datetime)
[
'A', datetime(2017-08-01),
'D', datetime(2017-08-01),
'J', datetime(2017-08-01),
'B', datetime(2017-08-01),
'C', datetime(2017-08-02),
'T', datetime(2017-08-02),
'J', datetime(2017-08-02),
'H', datetime(2017-08-03),
'T', datetime(2017-08-03),
'T', datetime(2017-08-03),
'J', datetime(2017-08-03),
'B', datetime(2017-08-03),
'S', datetime(2017-08-03),
'S', datetime(2017-08-04),
];
T
| evaluate activity_counts_metrics(UserId, Timestamp, start, end, window)
Output
Timestamp | count | dcount | new_dcount | aggregated_dcount |
---|---|---|---|---|
2017-08-01 00:00:00.0000000 | 4 | 4 | 4 | 4 |
2017-08-02 00:00:00.0000000 | 3 | 3 | 2 | 6 |
2017-08-03 00:00:00.0000000 | 6 | 5 | 2 | 8 |
2017-08-04 00:00:00.0000000 | 1 | 1 | 0 | 8 |
9.6.3 - activity_engagement plugin
Calculates activity engagement ratio based on ID column over a sliding timeline window.
The activity_engagement plugin can be used for calculating DAU/WAU/MAU (daily/weekly/monthly activities).
Syntax
T | evaluate
activity_engagement(
IdColumn,
TimelineColumn,
[Start,
End,
] InnerActivityWindow,
OuterActivityWindow [,
dim1,
dim2,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input used to calculate engagement. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents timeline. |
Start | datetime | The analysis start period. | |
End | datetime | The analysis end period. | |
InnerActivityWindow | timespan | ✔️ | The inner-scope analysis window period. |
OuterActivityWindow | timespan | ✔️ | The outer-scope analysis window period. |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table that has a distinct count of ID values inside an inner-scope window, inside an outer-scope window, and the activity ratio for each inner-scope window period for each existing dimensions combination.
Output table schema is:
TimelineColumn | dcount_activities_inner | dcount_activities_outer | activity_ratio | dim1 | .. | dim_n |
---|---|---|---|---|---|---|
type: as of TimelineColumn | long | long | double | .. | .. | .. |
Examples
DAU/WAU calculation
The following example calculates DAU/WAU (Daily Active Users / Weekly Active Users ratio) over randomly generated data.
// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-01-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/WAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 7d)
| project _day, Dau_Wau=activity_ratio*100
| render timechart
(Image: Graph displaying the ratio of daily active users to weekly active users as specified in the query.)
DAU/MAU calculation
The following example calculates DAU/MAU (Daily Active Users / Monthly Active Users ratio) over randomly generated data.
// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d)
| project _day, Dau_Mau=activity_ratio*100
| render timechart
(Image: Graph displaying the ratio of daily active users to monthly active users as specified in the query.)
DAU/MAU calculation with additional dimensions
The following example calculates DAU/MAU (Daily Active Users / Monthly Active Users ratio) over randomly generated data with an additional dimension (mod3).
// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| extend mod3 = strcat("mod3=", id % 3)
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d, mod3)
| project _day, Dau_Mau=activity_ratio*100, mod3
| render timechart
(Image: Graph displaying the ratio of daily active users to monthly active users with modulo 3 as specified in the query.)
9.6.4 - activity_metrics plugin
Calculates useful metrics that include distinct count values, distinct count of new values, retention rate, and churn rate. This plugin differs from the activity_counts_metrics plugin, in which every time window is compared to all previous time windows.
Syntax
T | evaluate
activity_metrics(
IdColumn,
TimelineColumn,
[Start,
End,
] Window [,
dim1,
dim2,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input used to calculate activity metrics. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents timeline. |
Start | datetime | ✔️ | The analysis start period. |
End | datetime | ✔️ | The analysis end period. |
Window | decimal, datetime, or timespan | ✔️ | The analysis window period. This value may also be a string of week , month , or year , in which case all periods will be startofweek, startofmonth, or startofyear respectively. |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. |
Returns
The plugin returns a table with the distinct count values, distinct count of new values, retention rate, and churn rate for each timeline period for each existing dimensions combination.
Output table schema is:
TimelineColumn | dcount_values | dcount_newvalues | retention_rate | churn_rate | dim1 | .. | dim_n |
---|---|---|---|---|---|---|---|
type: as of TimelineColumn | long | long | double | double | .. | .. | .. |
Notes
Retention Rate Definition
Retention Rate over a period is calculated as:
Retention Rate = (# of customers returned during the period) / (# of customers at the beginning of the period)
where the # of customers returned during the period is defined as:
# of customers returned during the period = (# of customers at the end of the period) - (# of new customers acquired during the period)
Retention Rate can vary from 0.0 to 1.0. A higher score means a larger number of returning users.
Churn Rate Definition
Churn Rate over a period is calculated as:
Churn Rate = (# of customers lost in the period) / (# of customers at the beginning of the period)
where the # of customers lost in the period is defined as:
# of customers lost in the period = (# of customers at the beginning of the period) - (# of customers returned during the period)
Churn Rate can vary from 0.0 to 1.0. A higher score means a larger number of users NOT returning to the service.
Churn vs. Retention Rate
The relationship between churn and retention rate follows from these definitions, because both use the number of customers at the beginning of the period as the denominator. The following calculation is always true:
Churn Rate = 1.0 - Retention Rate
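For example, using these definitions: if 1,000 distinct users were active during one week and 300 of them are also active during the following week, then the following week's retention_rate is 300 / 1,000 = 0.3 and its churn_rate is 1.0 - 0.3 = 0.7.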
Examples
Weekly retention rate and churn rate
The next query calculates retention and churn rate for week-over-week window.
// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, retention_rate, churn_rate
| render timechart
Output
_day | retention_rate | churn_rate |
---|---|---|
2017-01-02 00:00:00.0000000 | NaN | NaN |
2017-01-09 00:00:00.0000000 | 0.179910044977511 | 0.820089955022489 |
2017-01-16 00:00:00.0000000 | 0.744374437443744 | 0.255625562556256 |
2017-01-23 00:00:00.0000000 | 0.612096774193548 | 0.387903225806452 |
2017-01-30 00:00:00.0000000 | 0.681141439205955 | 0.318858560794045 |
2017-02-06 00:00:00.0000000 | 0.278145695364238 | 0.721854304635762 |
2017-02-13 00:00:00.0000000 | 0.223172628304821 | 0.776827371695179 |
2017-02-20 00:00:00.0000000 | 0.38 | 0.62 |
2017-02-27 00:00:00.0000000 | 0.295519001701645 | 0.704480998298355 |
2017-03-06 00:00:00.0000000 | 0.280387770320656 | 0.719612229679344 |
2017-03-13 00:00:00.0000000 | 0.360628154795289 | 0.639371845204711 |
2017-03-20 00:00:00.0000000 | 0.288008028098344 | 0.711991971901656 |
2017-03-27 00:00:00.0000000 | 0.306134969325153 | 0.693865030674847 |
2017-04-03 00:00:00.0000000 | 0.356866537717602 | 0.643133462282398 |
2017-04-10 00:00:00.0000000 | 0.495098039215686 | 0.504901960784314 |
2017-04-17 00:00:00.0000000 | 0.198296836982968 | 0.801703163017032 |
2017-04-24 00:00:00.0000000 | 0.0618811881188119 | 0.938118811881188 |
2017-05-01 00:00:00.0000000 | 0.204657727593507 | 0.795342272406493 |
2017-05-08 00:00:00.0000000 | 0.517391304347826 | 0.482608695652174 |
2017-05-15 00:00:00.0000000 | 0.143667296786389 | 0.856332703213611 |
2017-05-22 00:00:00.0000000 | 0.199122325836533 | 0.800877674163467 |
2017-05-29 00:00:00.0000000 | 0.063468992248062 | 0.936531007751938 |
:::image type="content" source="media/activity-metrics-plugin/activity-metrics-churn-and-retention.png" border="false" alt-text="Table showing the calculated retention and churn rates per seven days as specified in the query.":::
Distinct values and distinct ’new’ values
The next query calculates distinct values and 'new' values (IDs that didn't appear in the previous time window) for a week-over-week window.
// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, dcount_values, dcount_newvalues
| render timechart
Output
_day | dcount_values | dcount_newvalues |
---|---|---|
2017-01-02 00:00:00.0000000 | 630 | 630 |
2017-01-09 00:00:00.0000000 | 738 | 575 |
2017-01-16 00:00:00.0000000 | 1187 | 841 |
2017-01-23 00:00:00.0000000 | 1092 | 465 |
2017-01-30 00:00:00.0000000 | 1261 | 647 |
2017-02-06 00:00:00.0000000 | 1744 | 1043 |
2017-02-13 00:00:00.0000000 | 1563 | 432 |
2017-02-20 00:00:00.0000000 | 1406 | 818 |
2017-02-27 00:00:00.0000000 | 1956 | 1429 |
2017-03-06 00:00:00.0000000 | 1593 | 848 |
2017-03-13 00:00:00.0000000 | 1801 | 1423 |
2017-03-20 00:00:00.0000000 | 1710 | 1017 |
2017-03-27 00:00:00.0000000 | 1796 | 1516 |
2017-04-03 00:00:00.0000000 | 1381 | 1008 |
2017-04-10 00:00:00.0000000 | 1756 | 1162 |
2017-04-17 00:00:00.0000000 | 1831 | 1409 |
2017-04-24 00:00:00.0000000 | 1823 | 1164 |
2017-05-01 00:00:00.0000000 | 1811 | 1353 |
2017-05-08 00:00:00.0000000 | 1691 | 1246 |
2017-05-15 00:00:00.0000000 | 1812 | 1608 |
2017-05-22 00:00:00.0000000 | 1740 | 1017 |
2017-05-29 00:00:00.0000000 | 960 | 756 |
:::image type="content" source="media/activity-metrics-plugin/activity-metrics-dcount-and-dcount-newvalues.png" border="false" alt-text="Table showing the count of distinct values (dcount_values) and of new distinct values (dcount_newvalues) that didn't appear in the previous time window as specified in the query.":::
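The dimension parameters slice the same metrics per dimension value. The following is a minimal sketch that reuses the randomly generated data from the examples above and adds a hypothetical mod3 dimension (not part of the original examples) to compute the weekly retention rate per slice:
// Generate random data of user activities, sliced by a hypothetical mod3 dimension
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| extend mod3 = strcat("mod3=", id % 3)
// Calculate weekly retention rate per mod3 slice
| evaluate activity_metrics(['id'], _day, _start, _end, 7d, mod3)
| project _day, retention_rate, mod3
| render timechart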
9.6.5 - funnel_sequence plugin
Calculates distinct count of users who have taken a sequence of states, and the distribution of previous/next states that have led to/were followed by the sequence. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate funnel_sequence(IdColumn, TimelineColumn, Start, End, MaxSequenceStepWindow, Step, StateColumn, Sequence)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
IdColumn | string | ✔️ | The column reference representing the ID. This column must be present in T. |
TimelineColumn | string | ✔️ | The column reference representing the timeline. This column must be present in T. |
Start | datetime, timespan, or long | ✔️ | The analysis start period. |
End | datetime, timespan, or long | ✔️ | The analysis end period. |
MaxSequenceStepWindow | datetime, timespan, or long | ✔️ | The value of the max allowed timespan between two sequential steps in the sequence. |
Step | datetime, timespan, or long | ✔️ | The analysis step period, or bin. |
StateColumn | string | ✔️ | The column reference representing the state. This column must be present in T. |
Sequence | dynamic | ✔️ | An array with the sequence values that are looked up in StateColumn . |
Returns
Returns three output tables, which are useful for constructing a sankey diagram for the analyzed sequence:
Table #1 - prev-sequence-next dcount
- TimelineColumn: the analyzed time window.
- prev: the previous state (may be empty if there were users that only had events for the searched sequence, but no events prior to it).
- next: the next state (may be empty if there were users that only had events for the searched sequence, but no events that followed it).
- dcount: distinct count of IdColumn in the time window that transitioned prev -> Sequence -> next.
- samples: an array of IDs (from IdColumn) corresponding to the row's sequence (a maximum of 128 IDs are returned).
Table #2 - prev-sequence dcount
- TimelineColumn: the analyzed time window.
- prev: the previous state (may be empty if there were users that only had events for the searched sequence, but no events prior to it).
- dcount: distinct count of IdColumn in the time window that transitioned prev -> Sequence.
- samples: an array of IDs (from IdColumn) corresponding to the row's sequence (a maximum of 128 IDs are returned).
Table #3 - sequence-next dcount
- TimelineColumn: the analyzed time window.
- next: the next state (may be empty if there were users that only had events for the searched sequence, but no events that followed it).
- dcount: distinct count of IdColumn in the time window that transitioned Sequence -> next.
- samples: an array of IDs (from IdColumn) corresponding to the row's sequence (a maximum of 128 IDs are returned).
Examples
Exploring storm events
The following query looks at the table StormEvents (weather statistics for 2007) and shows which events happened before/after all Tornado events occurred in 2007.
// Looking on StormEvents statistics:
// Q1: What happens before Tornado event?
// Q2: What happens after Tornado event?
StormEvents
| evaluate funnel_sequence(EpisodeId, StartTime, datetime(2007-01-01), datetime(2008-01-01), 1d,365d, EventType, dynamic(['Tornado']))
Result includes three tables:
- Table #1: All possible variants of what happened before and after the sequence. For example, the second line means that there were 87 different events that had the following sequence: Hail -> Tornado -> Hail.
StartTime | prev | next | dcount |
---|---|---|---|
2007-01-01 00:00:00.0000000 | 293 | ||
2007-01-01 00:00:00.0000000 | Hail | Hail | 87 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Thunderstorm Wind | 77 |
2007-01-01 00:00:00.0000000 | Hail | Thunderstorm Wind | 28 |
2007-01-01 00:00:00.0000000 | Hail | 28 | |
2007-01-01 00:00:00.0000000 | Hail | 27 | |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 25 | |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Hail | 24 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 24 | |
2007-01-01 00:00:00.0000000 | Flash Flood | Flash Flood | 12 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Flash Flood | 8 |
2007-01-01 00:00:00.0000000 | Flash Flood | 8 | |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Thunderstorm Wind | 6 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 6 | |
2007-01-01 00:00:00.0000000 | Flash Flood | 6 | |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Funnel Cloud | 6 |
2007-01-01 00:00:00.0000000 | Hail | Flash Flood | 4 |
2007-01-01 00:00:00.0000000 | Flash Flood | Thunderstorm Wind | 4 |
2007-01-01 00:00:00.0000000 | Hail | Funnel Cloud | 4 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Hail | 4 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 4 | |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Funnel Cloud | 3 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Thunderstorm Wind | 2 |
2007-01-01 00:00:00.0000000 | Flash Flood | Funnel Cloud | 2 |
2007-01-01 00:00:00.0000000 | Flash Flood | Hail | 2 |
2007-01-01 00:00:00.0000000 | Strong Wind | Thunderstorm Wind | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Flash Flood | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Hail | 1 |
2007-01-01 00:00:00.0000000 | Hail | Flood | 1 |
2007-01-01 00:00:00.0000000 | Lightning | Hail | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Lightning | 1 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Heavy Rain | 1 |
2007-01-01 00:00:00.0000000 | Flash Flood | Flood | 1 |
2007-01-01 00:00:00.0000000 | Flood | Flash Flood | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 1 | |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Lightning | 1 |
2007-01-01 00:00:00.0000000 | Lightning | Thunderstorm Wind | 1 |
2007-01-01 00:00:00.0000000 | Flood | Thunderstorm Wind | 1 |
2007-01-01 00:00:00.0000000 | Hail | Lightning | 1 |
2007-01-01 00:00:00.0000000 | Lightning | 1 | |
2007-01-01 00:00:00.0000000 | Tropical Storm | Hurricane (Typhoon) | 1 |
2007-01-01 00:00:00.0000000 | Coastal Flood | 1 | |
2007-01-01 00:00:00.0000000 | Rip Current | 1 | |
2007-01-01 00:00:00.0000000 | Heavy Snow | 1 | |
2007-01-01 00:00:00.0000000 | Strong Wind | 1 |
- Table #2: shows all distinct events grouped by the previous event. For example, the second line shows that there were a total of 150 events of Hail that happened just before Tornado.
StartTime | prev | dcount |
---|---|---|
2007-01-01 00:00:00.0000000 | 331 | |
2007-01-01 00:00:00.0000000 | Hail | 150 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 135 |
2007-01-01 00:00:00.0000000 | Flash Flood | 28 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 22 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 5 |
2007-01-01 00:00:00.0000000 | Flood | 2 |
2007-01-01 00:00:00.0000000 | Lightning | 2 |
2007-01-01 00:00:00.0000000 | Strong Wind | 2 |
2007-01-01 00:00:00.0000000 | Heavy Snow | 1 |
2007-01-01 00:00:00.0000000 | Rip Current | 1 |
2007-01-01 00:00:00.0000000 | Coastal Flood | 1 |
2007-01-01 00:00:00.0000000 | Tropical Storm | 1 |
- Table #3: shows all distinct events grouped by the next event. For example, the second line shows that there were a total of 145 events of Hail that happened after Tornado.
StartTime | next | dcount |
---|---|---|
2007-01-01 00:00:00.0000000 | 332 | |
2007-01-01 00:00:00.0000000 | Hail | 145 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 143 |
2007-01-01 00:00:00.0000000 | Flash Flood | 32 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 21 |
2007-01-01 00:00:00.0000000 | Lightning | 4 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 2 |
2007-01-01 00:00:00.0000000 | Flood | 2 |
2007-01-01 00:00:00.0000000 | Hurricane (Typhoon) | 1 |
Now, let's try to find out how the following sequence continues: Hail -> Tornado -> Thunderstorm Wind.
StormEvents
| evaluate funnel_sequence(
EpisodeId,
StartTime,
datetime(2007-01-01),
datetime(2008-01-01),
1d,
365d,
EventType,
dynamic(['Hail', 'Tornado', 'Thunderstorm Wind'])
)
Skipping Table #1 and Table #2, and looking at Table #3, we can conclude that the sequence Hail -> Tornado -> Thunderstorm Wind ended with this sequence in 92 events, continued as Hail in 41 events, and turned back to Tornado in 14 events.
StartTime | next | dcount |
---|---|---|
2007-01-01 00:00:00.0000000 | 92 | |
2007-01-01 00:00:00.0000000 | Hail | 41 |
2007-01-01 00:00:00.0000000 | Tornado | 14 |
2007-01-01 00:00:00.0000000 | Flash Flood | 11 |
2007-01-01 00:00:00.0000000 | Lightning | 2 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 1 |
2007-01-01 00:00:00.0000000 | Flood | 1 |
9.6.6 - funnel_sequence_completion plugin
Calculates a funnel of completed sequence steps while comparing different time periods. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate funnel_sequence_completion(IdColumn, TimelineColumn, Start, End, BinSize, StateColumn, Sequence, MaxSequenceStepPeriods)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
IdColumn | string | ✔️ | The column reference representing the ID. The column must be present in T. |
TimelineColumn | string | ✔️ | The column reference representing the timeline. The column must be present in T. |
Start | datetime, timespan, or long | ✔️ | The analysis start period. |
End | datetime, timespan, or long | ✔️ | The analysis end period. |
BinSize | datetime, timespan, or long | ✔️ | The analysis window size. Each window is analyzed separately. |
StateColumn | string | ✔️ | The column reference representing the state. The column must be present in T. |
Sequence | dynamic | ✔️ | An array with the sequence values that are looked up in StateColumn . |
MaxSequenceStepPeriods | dynamic | ✔️ | An array with the values of the max allowed timespan between the first and last sequential steps in the sequence. Each period in the array generates a funnel analysis result. |
Returns
Returns a single table useful for constructing a funnel diagram for the analyzed sequence:
- TimelineColumn: the analyzed time window (bin). Each bin in the analysis timeframe (Start to End) generates a funnel analysis separately.
- StateColumn: the state of the sequence.
- Period: the maximal period allowed for completing steps in the funnel sequence, measured from the first step in the sequence. Each value in MaxSequenceStepPeriods generates a funnel analysis with a separate period.
- dcount: distinct count of IdColumn in the time window that transitioned from the first sequence state to the value of StateColumn.
Examples
Exploring Storm Events
The following query checks the completion funnel of the sequence Hail -> Tornado -> Thunderstorm Wind in an overall time of 1 hour, 4 hours, and 1 day.
let _start = datetime(2007-01-01);
let _end = datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm Wind']);
let _periods = dynamic([1h, 4h, 1d]);
StormEvents
| evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods)
Output
StartTime | EventType | Period | dcount |
---|---|---|---|
2007-01-01 00:00:00.0000000 | Hail | 01:00:00 | 2877 |
2007-01-01 00:00:00.0000000 | Tornado | 01:00:00 | 208 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 01:00:00 | 87 |
2007-01-01 00:00:00.0000000 | Hail | 04:00:00 | 2877 |
2007-01-01 00:00:00.0000000 | Tornado | 04:00:00 | 231 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 04:00:00 | 141 |
2007-01-01 00:00:00.0000000 | Hail | 1.00:00:00 | 2877 |
2007-01-01 00:00:00.0000000 | Tornado | 1.00:00:00 | 244 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 1.00:00:00 | 155 |
Understanding the results:
The outcome is three funnels (for periods of one hour, four hours, and one day). For each funnel step, the distinct count of IDs (EpisodeId) is shown. You can see that the more time that is given to complete the whole sequence of Hail -> Tornado -> Thunderstorm Wind, the higher the dcount value obtained. In other words, there were more occurrences of the sequence reaching the funnel step.
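To visualize one of the funnels, you can filter the output to a single period and render it. The following is a minimal sketch that reuses the let bindings from the query above; the choice of column chart is illustrative only:
let _start = datetime(2007-01-01);
let _end = datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm Wind']);
let _periods = dynamic([1h, 4h, 1d]);
StormEvents
| evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods)
// Keep only the 1-day funnel and chart the step counts
| where Period == 1d
| project EventType, dcount
| render columnchart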
Related content
9.6.7 - new_activity_metrics plugin
Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) for the cohort of New Users. Each cohort of New Users (all users that were first seen in the time window) is compared to all prior cohorts.
The comparison takes into account all previous time windows. For example, for records from T2 to T3, the distinct count of users is all users in T3 who weren't seen in either T1 or T2.
The plugin is invoked with the evaluate
operator.
Syntax
TabularExpression | evaluate new_activity_metrics(IdColumn, TimelineColumn, Start, End, Window [, Cohort] [, dim1, dim2, ...] [, Lookback])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TabularExpression | string | ✔️ | The tabular expression for which to calculate activity metrics. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents the timeline. |
Start | scalar | ✔️ | The value of the analysis start period. |
End | scalar | ✔️ | The value of the analysis end period. |
Window | scalar | ✔️ | The value of the analysis window period. Can be a numeric, datetime, or timespan value, or a string that is one of week, month, or year, in which case all periods will be startofweek/startofmonth/startofyear accordingly. When using startofweek, make sure the start time is a Sunday, otherwise the first cohort will be empty (since startofweek is considered to be a Sunday). |
Cohort | scalar | Indicates a specific cohort. If not provided, all cohorts corresponding to the analysis time window are calculated and returned. | |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. | |
Lookback | string | A tabular expression with a set of IDs that belong to the ’look back’ period. |
Returns
Returns a table that contains the following for each combination of ‘from’ and ’to’ timeline periods and for each existing column (dimensions) combination:
- distinct count values
- distinct count of new values
- retention rate
- churn rate
Output table schema is:
from_TimelineColumn | to_TimelineColumn | dcount_new_values | dcount_retained_values | dcount_churn_values | retention_rate | churn_rate | dim1 | .. | dim_n |
---|---|---|---|---|---|---|---|---|---|
type: as of TimelineColumn | same | long | long | long | double | double | .. | .. | .. |
- from_TimelineColumn: the cohort of new users. Metrics in this record refer to all users who were first seen in this period. The decision on first seen takes into account all previous periods in the analysis period.
- to_TimelineColumn: the period being compared to.
- dcount_new_values: the number of distinct users in to_TimelineColumn that weren't seen in all periods prior to and including from_TimelineColumn.
- dcount_retained_values: out of all new users first seen in from_TimelineColumn, the number of distinct users that were seen in to_TimelineColumn.
- dcount_churn_values: out of all new users first seen in from_TimelineColumn, the number of distinct users that weren't seen in to_TimelineColumn.
- retention_rate: the percentage of dcount_retained_values out of the cohort (users first seen in from_TimelineColumn).
- churn_rate: the percentage of dcount_churn_values out of the cohort (users first seen in from_TimelineColumn).
Examples
The following sample dataset shows which users were seen on which days. The table was generated based on a source Users table, as follows:
Users | summarize tostring(make_set(user)) by bin(Timestamp, 1d) | order by Timestamp asc;
Output
Timestamp | set_user |
---|---|
2019-11-01 00:00:00.0000000 | [0,2,3,4] |
2019-11-02 00:00:00.0000000 | [0,1,3,4,5] |
2019-11-03 00:00:00.0000000 | [0,2,4,5] |
2019-11-04 00:00:00.0000000 | [0,1,2,3] |
2019-11-05 00:00:00.0000000 | [0,1,2,3,4] |
The output of the plugin for the original table is the following:
let StartDate = datetime(2019-11-01 00:00:00);
let EndDate = datetime(2019-11-07 00:00:00);
Users
| evaluate new_activity_metrics(user, Timestamp, StartDate, EndDate-1tick, 1d)
| where from_Timestamp < datetime(2019-11-03 00:00:00.0000000)
Output
R | from_Timestamp | to_Timestamp | dcount_new_values | dcount_retained_values | dcount_churn_values | retention_rate | churn_rate |
---|---|---|---|---|---|---|---|
1 | 2019-11-01 00:00:00.0000000 | 2019-11-01 00:00:00.0000000 | 4 | 4 | 0 | 1 | 0 |
2 | 2019-11-01 00:00:00.0000000 | 2019-11-02 00:00:00.0000000 | 2 | 3 | 1 | 0.75 | 0.25 |
3 | 2019-11-01 00:00:00.0000000 | 2019-11-03 00:00:00.0000000 | 1 | 3 | 1 | 0.75 | 0.25 |
4 | 2019-11-01 00:00:00.0000000 | 2019-11-04 00:00:00.0000000 | 1 | 3 | 1 | 0.75 | 0.25 |
5 | 2019-11-01 00:00:00.0000000 | 2019-11-05 00:00:00.0000000 | 1 | 4 | 0 | 1 | 0 |
6 | 2019-11-01 00:00:00.0000000 | 2019-11-06 00:00:00.0000000 | 0 | 0 | 4 | 0 | 1 |
7 | 2019-11-02 00:00:00.0000000 | 2019-11-02 00:00:00.0000000 | 2 | 2 | 0 | 1 | 0 |
8 | 2019-11-02 00:00:00.0000000 | 2019-11-03 00:00:00.0000000 | 0 | 1 | 1 | 0.5 | 0.5 |
9 | 2019-11-02 00:00:00.0000000 | 2019-11-04 00:00:00.0000000 | 0 | 1 | 1 | 0.5 | 0.5 |
10 | 2019-11-02 00:00:00.0000000 | 2019-11-05 00:00:00.0000000 | 0 | 1 | 1 | 0.5 | 0.5 |
11 | 2019-11-02 00:00:00.0000000 | 2019-11-06 00:00:00.0000000 | 0 | 0 | 2 | 0 | 1 |
Following is an analysis of a few records from the output:
Record R=3, from_TimelineColumn = 2019-11-01, to_TimelineColumn = 2019-11-03:
- The users considered for this record are all new users seen on 11/1. Since this is the first period, these are all users in that bin – [0,2,3,4].
- dcount_new_values – the number of users on 11/3 who weren't seen on 11/1. This includes a single user – 5.
- dcount_retained_values – out of all new users on 11/1, how many were retained until 11/3? There are three such values ([0,2,4]), while dcount_churn_values is one (user=3).
- retention_rate = 0.75 – the three retained users out of the four new users who were first seen on 11/1.
Record R=9, from_TimelineColumn = 2019-11-02, to_TimelineColumn = 2019-11-04:
- This record focuses on the new users who were first seen on 11/2 – users 1 and 5.
- dcount_new_values – the number of users on 11/4 who weren't seen through all periods T0 .. from_Timestamp. Meaning, users who are seen on 11/4 but who weren't seen on either 11/1 or 11/2 – there are no such users.
- dcount_retained_values – out of all new users on 11/2 ([1,5]), how many were retained until 11/4? There's one such user ([1]), while dcount_churn_values is one (user 5).
- retention_rate is 0.5 – the single user that was retained on 11/4 out of the two new users on 11/2.
Weekly retention rate and churn rate (single week)
The next query calculates the retention and churn rate for a week-over-week window for the New Users cohort (users that arrived during the first week).
// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Take only the first week cohort (last parameter)
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d, _start)
| project from_Day, to_Day, retention_rate, churn_rate
Output
from_Day | to_Day | retention_rate | churn_rate |
---|---|---|---|
2017-05-01 00:00:00.0000000 | 2017-05-01 00:00:00.0000000 | 1 | 0 |
2017-05-01 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 0.544632768361582 | 0.455367231638418 |
2017-05-01 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0.031638418079096 | 0.968361581920904 |
2017-05-01 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0 | 1 |
Weekly retention rate and churn rate (complete matrix)
The next query calculates the retention and churn rate for a week-over-week window for the New Users cohort. Whereas the previous example calculated the statistics for a single week, the following query produces an NxN table for each from/to combination.
// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Last parameter is omitted -
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d)
| project from_Day, to_Day, retention_rate, churn_rate
Output
from_Day | to_Day | retention_rate | churn_rate |
---|---|---|---|
2017-05-01 00:00:00.0000000 | 2017-05-01 00:00:00.0000000 | 1 | 0 |
2017-05-01 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 0.190397350993377 | 0.809602649006622 |
2017-05-01 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0 | 1 |
2017-05-08 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 1 | 0 |
2017-05-08 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0.405263157894737 | 0.594736842105263 |
2017-05-08 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0.227631578947368 | 0.772368421052632 |
2017-05-08 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0 | 1 |
2017-05-15 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 1 | 0 |
2017-05-15 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0.785488958990536 | 0.214511041009464 |
2017-05-15 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0.237644584647739 | 0.762355415352261 |
2017-05-22 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 1 | 0 |
2017-05-22 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0.621835443037975 | 0.378164556962025 |
2017-05-29 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 1 | 0 |
Weekly retention rate with lookback period
The following query calculates the retention rate of the New Users cohort when taking into consideration a lookback period: a tabular query with a set of IDs that are used to define the New Users cohort (all IDs that don't appear in this set are New Users). The query examines the retention behavior of the New Users during the analysis period.
// Generate random data of user activities
let _lookback = datetime(2017-02-01);
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
let _data = range Day from _lookback to _end step 1d
| extend d = tolong((Day - _lookback) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1)
| mv-expand id=_users to typeof(long) limit 1000000;
//
let lookback_data = _data | where Day < _start | project Day, id;
_data
| evaluate new_activity_metrics(id, Day, _start, _end, 7d, _start, lookback_data)
| project from_Day, to_Day, retention_rate
Output
from_Day | to_Day | retention_rate |
---|---|---|
2017-05-01 00:00:00.0000000 | 2017-05-01 00:00:00.0000000 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 0.404081632653061 |
2017-05-01 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0.257142857142857 |
2017-05-01 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0.296326530612245 |
2017-05-01 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0.0587755102040816 |
9.6.8 - rolling_percentile plugin
Returns an estimate of the specified percentile of the ValueColumn population, in a rolling (sliding) window of BinsPerWindow bins per BinSize.
The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate rolling_percentile(ValueColumn, Percentile, IndexColumn, BinSize, BinsPerWindow [, dim1, dim2, ...])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
ValueColumn | string | ✔️ | The name of the column used to calculate the percentiles. |
Percentile | int, long, or real | ✔️ | Scalar with the percentile to calculate. |
IndexColumn | string | ✔️ | The name of the column over which to run the rolling window. |
BinSize | int, long, real, datetime, or timespan | ✔️ | Scalar with size of the bins to apply over the IndexColumn. |
BinsPerWindow | int | ✔️ | The number of bins included in each window. |
dim1, dim2, … | string | A list of the dimensions columns to slice by. |
Returns
Returns a table with a row per each bin (and combination of dimensions if specified) that has the rolling percentile of values in the window ending at the bin (inclusive). Output table schema is:
IndexColumn | dim1 | … | dim_n | rolling_BinsPerWindow_percentile_ValueColumn_Pct |
---|---|---|---|---|
Examples
Rolling 3-day median value per day
The next query calculates the 3-day median value at daily granularity. Each row in the output represents the median value for the last three bins (days), including the bin itself.
let T =
range idx from 0 to 24 * 10 - 1 step 1
| project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
| extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3)
Output
Timestamp | rolling_3_percentile_val_50 |
---|---|
2018-01-01 00:00:00.0000000 | 12 |
2018-01-02 00:00:00.0000000 | 24 |
2018-01-03 00:00:00.0000000 | 36 |
2018-01-04 00:00:00.0000000 | 60 |
2018-01-05 00:00:00.0000000 | 84 |
2018-01-06 00:00:00.0000000 | 108 |
2018-01-07 00:00:00.0000000 | 132 |
2018-01-08 00:00:00.0000000 | 156 |
2018-01-09 00:00:00.0000000 | 180 |
2018-01-10 00:00:00.0000000 | 204 |
Rolling 3-day median value per day by dimension
This is the same example as above, but now it also calculates the rolling window partitioned by each value of the dimension.
let T =
range idx from 0 to 24 * 10 - 1 step 1
| project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
| extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3, EvenOrOdd)
Output
Timestamp | EvenOrOdd | rolling_3_percentile_val_50 |
---|---|---|
2018-01-01 00:00:00.0000000 | Even | 12 |
2018-01-02 00:00:00.0000000 | Even | 24 |
2018-01-03 00:00:00.0000000 | Even | 36 |
2018-01-04 00:00:00.0000000 | Even | 60 |
2018-01-05 00:00:00.0000000 | Even | 84 |
2018-01-06 00:00:00.0000000 | Even | 108 |
2018-01-07 00:00:00.0000000 | Even | 132 |
2018-01-08 00:00:00.0000000 | Even | 156 |
2018-01-09 00:00:00.0000000 | Even | 180 |
2018-01-10 00:00:00.0000000 | Even | 204 |
2018-01-01 00:00:00.0000000 | Odd | 11 |
2018-01-02 00:00:00.0000000 | Odd | 23 |
2018-01-03 00:00:00.0000000 | Odd | 35 |
2018-01-04 00:00:00.0000000 | Odd | 59 |
2018-01-05 00:00:00.0000000 | Odd | 83 |
2018-01-06 00:00:00.0000000 | Odd | 107 |
2018-01-07 00:00:00.0000000 | Odd | 131 |
2018-01-08 00:00:00.0000000 | Odd | 155 |
2018-01-09 00:00:00.0000000 | Odd | 179 |
2018-01-10 00:00:00.0000000 | Odd | 203 |
9.6.9 - rows_near plugin
Finds rows near a specified condition.
The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate rows_near(Condition, NumRows [, RowsAfter])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
Condition | bool | ✔️ | Represents the condition to find rows around. |
NumRows | int | ✔️ | The number of rows to find before and after the condition. |
RowsAfter | int | When specified, overrides the number of rows to find after the condition. |
Returns
Every row from the input that is within NumRows of a true Condition. When RowsAfter is specified, returns every row from the input that is NumRows before or RowsAfter after a true Condition.
Example
Find rows with an "Error" State, and return two rows before and after the "Error" record.
datatable (Timestamp:datetime, Value:long, State:string )
[
datetime(2021-06-01), 1, "Success",
datetime(2021-06-02), 4, "Success",
datetime(2021-06-03), 3, "Success",
datetime(2021-06-04), 11, "Success",
datetime(2021-06-05), 15, "Success",
datetime(2021-06-06), 2, "Success",
datetime(2021-06-07), 19, "Error",
datetime(2021-06-08), 12, "Success",
datetime(2021-06-09), 7, "Success",
datetime(2021-06-10), 9, "Success",
datetime(2021-06-11), 4, "Success",
datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc
| evaluate rows_near(State == "Error", 2)
Output
Timestamp | Value | State |
---|---|---|
2021-06-05 00:00:00.0000000 | 15 | Success |
2021-06-06 00:00:00.0000000 | 2 | Success |
2021-06-07 00:00:00.0000000 | 19 | Error |
2021-06-08 00:00:00.0000000 | 12 | Success |
2021-06-09 00:00:00.0000000 | 7 | Success |
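When an asymmetric window is needed, the optional RowsAfter argument overrides the number of rows returned after the condition. The following is a minimal sketch (using an abbreviated copy of the datatable above) that returns two rows before and one row after each "Error" record:
datatable (Timestamp:datetime, Value:long, State:string )
[
    datetime(2021-06-05), 15, "Success",
    datetime(2021-06-06), 2, "Success",
    datetime(2021-06-07), 19, "Error",
    datetime(2021-06-08), 12, "Success",
    datetime(2021-06-09), 7, "Success",
]
| sort by Timestamp asc
// 2 rows before each Error, 1 row after
| evaluate rows_near(State == "Error", 2, 1)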
9.6.10 - sequence_detect plugin
Detects sequence occurrences based on provided predicates. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate sequence_detect(TimelineColumn, MaxSequenceStepWindow, MaxSequenceSpan, Expr1, Expr2, ..., Dim1, Dim2, ...)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
TimelineColumn | string | ✔️ | The column reference representing timeline, must be present in the source expression. |
MaxSequenceStepWindow | timespan | ✔️ | The value of the max allowed timespan between 2 sequential steps in the sequence. |
MaxSequenceSpan | timespan | ✔️ | The max timespan for the sequence to complete all steps. |
Expr1, Expr2, … | string | ✔️ | The boolean predicate expressions defining sequence steps. |
Dim1, Dim2, … | string | ✔️ | The dimension expressions that are used to correlate sequences. |
Returns
Returns a single table where each row in the table represents a single sequence occurrence:
- Dim1, Dim2, …: dimension columns that were used to correlate sequences.
- Expr1TimelineColumn, Expr2TimelineColumn, …: Columns with time values, representing the timeline of each sequence step.
- Duration: the overall sequence time window
Examples
The following query looks at the table T to search for relevant data from a specified time period.
T | evaluate sequence_detect(datetime_column, 10m, 1h, e1 = (Col1 == 'Val'), e2 = (Col2 == 'Val2'), Dim1, Dim2)
Exploring Storm Events
The following query looks at the table StormEvents (weather statistics for 2007) and shows cases where a sequence of 'Excessive Heat' was followed by 'Wildfire' within five days.
StormEvents
| evaluate sequence_detect(
StartTime,
5d, // step max-time
5d, // sequence max-time
heat=(EventType == "Excessive Heat"),
wildfire=(EventType == 'Wildfire'),
State
)
Output
State | heat_StartTime | wildfire_StartTime | Duration |
---|---|---|---|
CALIFORNIA | 2007-05-08 00:00:00.0000000 | 2007-05-08 16:02:00.0000000 | 16:02:00 |
CALIFORNIA | 2007-05-08 00:00:00.0000000 | 2007-05-10 11:30:00.0000000 | 2.11:30:00 |
CALIFORNIA | 2007-07-04 09:00:00.0000000 | 2007-07-05 23:01:00.0000000 | 1.14:01:00 |
SOUTH DAKOTA | 2007-07-23 12:00:00.0000000 | 2007-07-27 09:00:00.0000000 | 3.21:00:00 |
TEXAS | 2007-08-10 08:00:00.0000000 | 2007-08-11 13:56:00.0000000 | 1.05:56:00 |
CALIFORNIA | 2007-08-31 08:00:00.0000000 | 2007-09-01 11:28:00.0000000 | 1.03:28:00 |
CALIFORNIA | 2007-08-31 08:00:00.0000000 | 2007-09-02 13:30:00.0000000 | 2.05:30:00 |
CALIFORNIA | 2007-09-02 12:00:00.0000000 | 2007-09-02 13:30:00.0000000 | 01:30:00 |
9.6.11 - session_count plugin
Calculates the session count based on the ID column over a timeline. The plugin is invoked with the evaluate
operator.
Syntax
TabularExpression | evaluate session_count(IdColumn, TimelineColumn, Start, End, Bin, LookBackWindow [, dim1, dim2, ...])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TabularExpression | string | ✔️ | The tabular expression that serves as input. |
IdColumn | string | ✔️ | The name of the column with ID values that represents user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents the timeline. |
Start | scalar | ✔️ | The start of the analysis period. |
End | scalar | ✔️ | The end of the analysis period. |
Bin | scalar | ✔️ | The session’s analysis step period. |
LookBackWindow | scalar | ✔️ | The session lookback period. If the ID from IdColumn appears in a time window within LookBackWindow , the session is considered to be an existing one. If the ID doesn’t appear, then the session is considered to be new. |
dim1, dim2, … | string | A list of the dimensions columns that slice the session count calculation. |
Returns
Returns a table that has the session count values for each timeline period and for each existing dimensions combination.
Output table schema is:
TimelineColumn | dim1 | .. | dim_n | count_sessions | ||||||
---|---|---|---|---|---|---|---|---|---|---|
type: as of TimelineColumn | .. | .. | .. | long |
Examples
For this example, the data is deterministic: a table with two columns:
- Timeline: a running number from 1 to 10,000
- Id: the ID of the user, from 1 to 50
An Id appears at a specific Timeline slot if it divides Timeline evenly (Timeline % Id == 0). An event with Id==1 appears at every Timeline slot, an event with Id==2 at every second Timeline slot, and so on.
Here are 20 lines of the data:
let _data = range Timeline from 1 to 10000 step 1
| extend __key = 1
| join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
| where Timeline % Id == 0
| project Timeline, Id;
// Look on few lines of the data
_data
| order by Timeline asc, Id asc
| take 20
Output
Timeline | Id |
---|---|
1 | 1 |
2 | 1 |
2 | 2 |
3 | 1 |
3 | 3 |
4 | 1 |
4 | 2 |
4 | 4 |
5 | 1 |
5 | 5 |
6 | 1 |
6 | 2 |
6 | 3 |
6 | 6 |
7 | 1 |
7 | 7 |
8 | 1 |
8 | 2 |
8 | 4 |
8 | 8 |
Let's define a session as follows: a session is considered active as long as the user (Id) appears at least once within a timeframe of 100 time slots, and the session lookback window is 41 time slots.
The next query shows the count of active sessions according to the above definition.
let _data = range Timeline from 1 to 9999 step 1
| extend __key = 1
| join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
| where Timeline % Id == 0
| project Timeline, Id;
// End of data definition
_data
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41)
| render linechart
9.6.12 - sliding_window_counts plugin
Calculates counts and distinct count of values in a sliding window over a lookback period, using the technique described in the Perform aggregations over a sliding window example. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate sliding_window_counts(IdColumn, TimelineColumn, Start, End, LookbackWindow, Bin [, dim1, dim2, ...])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column representing the timeline. |
Start | int, long, real, datetime, or timespan | ✔️ | The analysis start period. |
End | int, long, real, datetime, or timespan | ✔️ | The analysis end period. |
LookbackWindow | int, long, real, datetime, or timespan | ✔️ | The lookback period. This value should be a multiple of the Bin value, otherwise the LookbackWindow will be rounded down to a multiple of the Bin value. For example, for dcount users in past 7d : LookbackWindow = 7d . |
Bin | int, long, real, datetime, timespan, or string | ✔️ | The analysis step period. The possible string values are week , month , and year for which all periods will be startofweek, startofmonth, startofyear respectively. |
dim1, dim2, … | string | A list of the dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table that has the count and distinct count values of Ids in the lookback period, for each timeline period (by bin) and for each existing dimensions combination.
Output table schema is:
TimelineColumn | dim1 | .. | dim_n | count | dcount |
---|---|---|---|---|---|
type: as of TimelineColumn | .. | .. | .. | long | long |
Example
Calculate the count and distinct count of users over the lookback window (three days in this example), for each day in the analysis period.
let start = datetime(2017 - 08 - 01);
let end = datetime(2017 - 08 - 07);
let lookbackWindow = 3d;
let bin = 1d;
let T = datatable(UserId: string, Timestamp: datetime)
[
'Bob', datetime(2017 - 08 - 01),
'David', datetime(2017 - 08 - 01),
'David', datetime(2017 - 08 - 01),
'John', datetime(2017 - 08 - 01),
'Bob', datetime(2017 - 08 - 01),
'Ananda', datetime(2017 - 08 - 02),
'Atul', datetime(2017 - 08 - 02),
'John', datetime(2017 - 08 - 02),
'Ananda', datetime(2017 - 08 - 03),
'Atul', datetime(2017 - 08 - 03),
'Atul', datetime(2017 - 08 - 03),
'John', datetime(2017 - 08 - 03),
'Bob', datetime(2017 - 08 - 03),
'Betsy', datetime(2017 - 08 - 04),
'Bob', datetime(2017 - 08 - 05),
];
T
| evaluate sliding_window_counts(UserId, Timestamp, start, end, lookbackWindow, bin)
Output
Timestamp | Count | dcount |
---|---|---|
2017-08-01 00:00:00.0000000 | 5 | 3 |
2017-08-02 00:00:00.0000000 | 8 | 5 |
2017-08-03 00:00:00.0000000 | 13 | 5 |
2017-08-04 00:00:00.0000000 | 9 | 5 |
2017-08-05 00:00:00.0000000 | 7 | 5 |
2017-08-06 00:00:00.0000000 | 2 | 2 |
2017-08-07 00:00:00.0000000 | 1 | 1 |
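The dimension parameters slice the same counts per dimension value. The following is a minimal sketch that adds a hypothetical Dept column (not part of the example above) and computes the sliding-window counts per department:
let start = datetime(2017-08-01);
let end = datetime(2017-08-04);
// Small illustrative dataset with a hypothetical Dept dimension
let T = datatable(UserId: string, Timestamp: datetime, Dept: string)
[
    'Bob',   datetime(2017-08-01), 'Sales',
    'David', datetime(2017-08-01), 'Sales',
    'John',  datetime(2017-08-02), 'Support',
    'Bob',   datetime(2017-08-03), 'Sales',
];
T
| evaluate sliding_window_counts(UserId, Timestamp, start, end, 2d, 1d, Dept)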
9.6.13 - User Analytics
This section describes Kusto extensions (plugins) for user analytics scenarios.
Scenario | Plugin | Details | User Experience |
---|---|---|---|
Counting new users over time | activity_counts_metrics | Returns counts/dcounts/new counts for each time window. Each time window is compared to all previous time windows | Kusto.Explorer: Report Gallery |
Period-over-period: retention/churn rate and new users | activity_metrics | Returns dcount , retention/churn rate for each time window. Each time window is compared to previous time window | Kusto.Explorer: Report Gallery |
Users count and dcount over sliding window | sliding_window_counts | For each time window, returns count and dcount over a lookback period, in a sliding window manner | |
New-users cohort: retention/churn rate and new users | new_activity_metrics | Compares between cohorts of new users (all users that were first seen in time window). Each cohort is compared to all prior cohorts. Comparison takes into account all previous time windows | Kusto.Explorer: Report Gallery |
Active Users: distinct counts | active_users_count | Returns distinct users for each time window. A user is only considered if it appears in at least X distinct periods in a specified lookback period. | |
User Engagement: DAU/WAU/MAU | activity_engagement | Compares between an inner time window (for example, daily) and an outer (for example, weekly) for computing engagement (for example, DAU/WAU) | Kusto.Explorer: Report Gallery |
Sessions: count active sessions | session_count | Counts sessions, where a session is defined by a time period - a user record is considered a new session, if it hasn’t been seen in the lookback period from current record | |
Funnels: previous and next state sequence analysis | funnel_sequence | Counts distinct users who have taken a sequence of events, and the previous or next events that led or were followed by the sequence. Useful for constructing sankey diagrams | |
Funnels: sequence completion analysis | funnel_sequence_completion | Computes the distinct count of users that have completed a specified sequence in each time window | |
10 - Query statements
10.1 - Alias statement
Alias statements allow you to define an alias for a database, which can be used in the same query.
The alias
statement is useful as a shorthand name for a database so it can be referenced using that alias in the same query.
Syntax
alias database DatabaseAliasName = cluster("QueryURI").database("DatabaseName")
Parameters
Name | Type | Required | Description |
---|---|---|---|
DatabaseAliasName | string | ✔️ | An existing name or new database alias name. You can escape the name with brackets. For example, ["Name with spaces"]. |
QueryURI | string | ✔️ | The URI that can be used to run queries or management commands. |
DatabaseName | string | ✔️ | The name of the database to give an alias. |
Examples
First, count the number of records in the StormEvents table of the Samples database.
StormEvents
| count
Output
Count |
---|
59066 |
Then, give an alias to the Samples
database and use that name to check the record count of the StormEvents
table.
alias database samplesAlias = cluster("https://help.kusto.windows.net").database("Samples");
database("samplesAlias").StormEvents | count
Output
Count |
---|
59066 |
Create an alias name that contains spaces using the bracket syntax.
alias database ["Samples Database Alias"] = cluster("https://help.kusto.windows.net").database("Samples");
database("Samples Database Alias").StormEvents | count
Output
Count |
---|
59066 |
10.2 - Batches
A query can include multiple tabular expression statements, as long as they’re delimited by a semicolon (;
) character. The query then returns multiple tabular results. Results are produced by the tabular expression statements and ordered according to the order of the statements in the query text.
Examples
The following examples show how to create multiple tables simultaneously.
Name tabular results
The following query produces two tabular results. User agent tools can then display those results with the appropriate name associated with each (Count of events in Florida
and Count of events in Guam
, respectively).
StormEvents | where State == "FLORIDA" | count | as ['Count of events in Florida'];
StormEvents | where State == "GUAM" | count | as ['Count of events in Guam']
Output
Count of events in Florida
Count |
---|
1042 |
Count of events in Guam
Count |
---|
4 |
Share a calculation
Batching is useful for scenarios where a common calculation is shared by multiple subqueries, such as for dashboards. If the common calculation is complex, use the materialize() function and construct the query so that it will be executed only once.
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000;
m | where n < 10
Output
Table 1
State | n |
---|---|
ILLINOIS | 2022 |
IOWA | 2337 |
KANSAS | 3166 |
MISSOURI | 2016 |
TEXAS | 4701 |
Table 2
State | n |
---|---|
GUAM | 2022 |
GULF OF ALASKA | 2337 |
HAWAII WATERS | 3166 |
LAKE ONTARIO | 2016 |
10.3 - Let statement
A let
statement is used to set a variable name equal to an expression or a function, or to create views.
let
statements are useful for:
- Breaking up a complex expression into multiple parts, each represented by a variable.
- Defining constants outside of the query body for readability.
- Defining a variable once and using it multiple times within a query.
If the variable previously represented another value, for example in nested statements, the innermost let
statement applies.
To optimize multiple uses of the let
statement within a single query, see Optimize queries that use named expressions.
Syntax: Scalar or tabular expressions
let Name = Expression
Parameters
Name | Type | Required | Description |
---|---|---|---|
Name | string | ✔️ | The variable name. You can escape the name with brackets. For example, ["Name with spaces"] . |
Expression | string | ✔️ | An expression with a scalar or tabular result. For example, an expression with a scalar result would be let one=1;, and an expression with a tabular result would be let RecentLog = Logs | where Timestamp > ago(1h). |
Syntax: View or function
let Name = [view] ([Parameters]) { FunctionBody }
Parameters
Name | Type | Required | Description |
---|---|---|---|
FunctionBody | string | ✔️ | An expression that yields a user defined function. |
view | string | Only relevant for a parameter-less let statement. When used, the let statement is included in queries with a union operator with wildcard selection of the tables/views. For an example, see Create a view or virtual table. | |
Parameters | string | Zero or more comma-separated tabular or scalar function parameters. For each parameter of tabular type, the parameter should be in the format TableName : TableSchema, in which TableSchema is either a comma-separated list of columns in the format ColumnName: ColumnType or a wildcard (* ). If columns are specified, then the input tabular argument must contain these columns. If a wildcard is specified, then the input tabular argument can have any schema. To reference columns in the function body, they must be specified. For examples, see Tabular argument with schema and Tabular argument with wildcard.For each parameter of scalar type, provide the parameter name and parameter type in the format Name : Type. The name can appear in the FunctionBody and is bound to a particular value when the user defined function is invoked. The only supported types are bool , string , long , datetime , timespan , real , dynamic , and the aliases to these types. |
Examples
The examples in this section show how to use the syntax to help you get started.
The query examples show the syntax and example usage of the operator, statement, or function.
Define scalar values
The following example uses a scalar expression statement.
let n = 10; // number
let place = "Dallas"; // string
let cutoff = ago(62d); // datetime
Events
| where timestamp > cutoff
and city == place
| take n
The following example binds the name some number using the ['name'] notation, and then uses it in a tabular expression statement.
let ['some number'] = 20;
range y from 0 to ['some number'] step 5
Output
y |
---|
0 |
5 |
10 |
15 |
20 |
Create a user defined function with scalar calculation
This example uses the let statement with arguments for scalar calculation. The query defines function MultiplyByN
for multiplying two numbers.
let MultiplyByN = (val:long, n:long) { val * n };
range x from 1 to 5 step 1
| extend result = MultiplyByN(x, 5)
Output
x | result |
---|---|
1 | 5 |
2 | 10 |
3 | 15 |
4 | 20 |
5 | 25 |
Create a user defined function that trims input
The following example removes leading and trailing ones from the input.
let TrimOnes = (s:string) { trim("1", s) };
range x from 10 to 15 step 1
| extend result = TrimOnes(tostring(x))
Output
x | result |
---|---|
10 | 0 |
11 | |
12 | 2 |
13 | 3 |
14 | 4 |
15 | 5 |
Use multiple let statements
This example defines two let statements where one statement (foo2
) uses another (foo1
).
let foo1 = (_start:long, _end:long, _step:long) { range x from _start to _end step _step};
let foo2 = (_step:long) { foo1(1, 100, _step)};
foo2(2) | count
Output
result |
---|
50 |
Create a view or virtual table
This example shows you how to use a let statement to create a view or virtual table.
let Range10 = view () { range MyColumn from 1 to 10 step 1 };
let Range20 = view () { range MyColumn from 1 to 20 step 1 };
search MyColumn == 5
Output
$table | MyColumn |
---|---|
Range10 | 5 |
Range20 | 5 |
Use a materialize function
The materialize() function lets you cache subquery results during the time of query execution. When you use the materialize() function, the data is cached, and any subsequent invocation of the result uses the cached data.
let totalPagesPerDay = PageViews
| summarize by Page, Day = startofday(Timestamp)
| summarize count() by Day;
let materializedScope = PageViews
| summarize by Page, Day = startofday(Timestamp);
let cachedResult = materialize(materializedScope);
cachedResult
| project Page, Day1 = Day
| join kind = inner
(
cachedResult
| project Page, Day2 = Day
)
on Page
| where Day2 > Day1
| summarize count() by Day1, Day2
| join kind = inner
totalPagesPerDay
on $left.Day1 == $right.Day
| project Day1, Day2, Percentage = count_*100.0/count_1
Output
Day1 | Day2 | Percentage |
---|---|---|
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 34.0645725975255 |
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.618368960101 |
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.6291376489636 |
Using nested let statements
Nested let statements are permitted, including within a user defined function expression. Let statements and arguments apply in both the current and inner scope of the function body.
let start_time = ago(5h);
let end_time = start_time + 2h;
T | where Time > start_time and Time < end_time | ...
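As a minimal sketch of a let statement nested inside a user defined function body (the function name and values are illustrative only):
let MultiplyByFactor = (val:long) {
    let factor = 10;            // inner let statement, visible only inside the function body
    val * factor
};
print result = MultiplyByFactor(4)  // result: 40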
Tabular argument with schema
The following example specifies that the table parameter T must have a column State of type string. The table T may include other columns as well, but they can't be referenced in the function StateState because they aren't declared.
let StateState=(T: (State: string)) { T | extend s_s=strcat(State, State) };
StormEvents
| invoke StateState()
| project State, s_s
Output
State | s_s |
---|---|
ATLANTIC SOUTH | ATLANTIC SOUTHATLANTIC SOUTH |
FLORIDA | FLORIDAFLORIDA |
FLORIDA | FLORIDAFLORIDA |
GEORGIA | GEORGIAGEORGIA |
MISSISSIPPI | MISSISSIPPIMISSISSIPPI |
… | … |
Tabular argument with wildcard
The table parameter T can have any schema, and the function CountRecordsInTable will work.
let CountRecordsInTable=(T: (*)) { T | count };
StormEvents | invoke CountRecordsInTable()
Output
Count |
---|
59,066 |
10.4 - Pattern statement
A pattern is a construct that maps string tuples to tabular expressions.
Each pattern must declare a pattern name and optionally define a pattern mapping. Patterns that define a mapping return a tabular expression when invoked. Separate any two statements by a semicolon.
Empty patterns are patterns that are declared but don’t define a mapping. When invoked, they return error SEM0036 along with the details of the missing pattern definitions in the HTTP header.
Middle-tier applications that provide a Kusto Query Language (KQL) experience can use the returned details as part of their process to enrich KQL query results. For more information, see Working with middle-tier applications.
Syntax
Declare an empty pattern:
declare pattern PatternName;
Declare and define a pattern:
declare pattern PatternName = (ArgName: ArgType [, ...]) [[PathName: PathArgType]]
{
    (ArgValue1_1 [, ArgValue2_1, ...]) [.[PathValue_1]] = { expression1 };
    [ (ArgValue1_2 [, ArgValue2_2, ...]) [.[PathValue_2]] = { expression2 }; ... ]
};
Invoke a pattern:
- PatternName (ArgValue1 [, ArgValue2 ...]).PathValue
- PatternName (ArgValue1 [, ArgValue2 ...]).["PathValue"]
- PatternName
Parameters
Name | Type | Required | Description |
---|---|---|---|
PatternName | string | ✔️ | The name of the pattern. |
ArgName | string | ✔️ | The name of the argument. Patterns can have one or more arguments. |
ArgType | string | ✔️ | The scalar data type of the ArgName argument. Possible values: string |
PathName | string | The name of the path argument. Patterns can have no path or one path. | |
PathArgType | string | The type of the PathArgType argument. Possible values: string | |
ArgValue | string | ✔️ | The ArgName and optional PathName tuple values to be mapped to an expression. |
PathValue | string | The value to map for PathName. | |
expression | string | ✔️ | A tabular or lambda expression that references a function returning tabular data. For example: Logs | where Timestamp > ago(1h). |
Examples
The examples in this section show how to use the syntax to help you get started.
Define a simple pattern
This example defines a pattern that maps a country and state to an expression that returns the state's capital/major city.
declare pattern country = (name:string)[state:string]
{
("USA").["New York"] = { print Capital = "Albany" };
("USA").["Washington"] = { print Capital = "Olympia" };
("Canada").["Alberta"] = { print Capital = "Edmonton" };
};
country("Canada").Alberta
Output
Capital |
---|
Edmonton |
Define a scoped pattern
This example defines a pattern to scope data and metrics of application data. The pattern is invoked to return a union of the data.
declare pattern App = (applicationId:string)[scope:string]
{
('a1').['Data'] = { range x from 1 to 5 step 1 | project App = "App #1", Data = x };
('a1').['Metrics'] = { range x from 1 to 5 step 1 | project App = "App #1", Metrics = rand() };
('a2').['Data'] = { range x from 1 to 5 step 1 | project App = "App #2", Data = 10 - x };
('a3').['Metrics'] = { range x from 1 to 5 step 1 | project App = "App #3", Metrics = rand() };
};
union App('a2').Data, App('a1').Metrics
Output
App | Data | Metrics |
---|---|---|
App #2 | 9 | |
App #2 | 8 | |
App #2 | 7 | |
App #2 | 6 | |
App #2 | 5 | |
App #1 | 0.53674122855537532 | |
App #1 | 0.78304713305654439 | |
App #1 | 0.20168860732346555 | |
App #1 | 0.13249123867679469 | |
App #1 | 0.19388305330563443 |
Normalization
There are syntax variations for invoking patterns. For example, the following union returns a single pattern expression since all the invocations are of the same pattern.
declare pattern app = (applicationId:string)[eventType:string]
{
("ApplicationX").["StopEvents"] = { database("AppX").Events | where EventType == "StopEvent" };
("ApplicationX").["StartEvents"] = { database("AppX").Events | where EventType == "StartEvent" };
};
union
app("ApplicationX").StartEvents,
app('ApplicationX').StartEvents,
app("ApplicationX").['StartEvents'],
app("ApplicationX").["StartEvents"]
No wildcards
There’s no special treatment given to wildcards in a pattern. For example, the following query returns a single missing pattern invocation.
declare pattern app = (applicationId:string)[eventType:string]
{
("ApplicationX").["StopEvents"] = { database("AppX").Events | where EventType == "StopEvent" };
("ApplicationX").["StartEvents"] = { database("AppX").Events | where EventType == "StartEvent" };
};
union app("ApplicationX").["*"]
| count
Output
The query returns a semantic error because "*" is treated as a literal path value that has no matching pattern declaration.
Work with middle-tier applications
A middle-tier application provides its users with the ability to use KQL and wants to enhance the experience by enriching the query results with augmented data from its internal service.
To this end, the application provides users with a pattern statement that returns tabular data that their users can use in their queries. The pattern’s arguments are the keys the application will use to retrieve the enrichment data.
When the user runs the query, the application doesn’t parse the query itself but instead uses the error returned by an empty pattern to retrieve the keys it requires. So it prepends the query with the empty pattern declaration, sends it to the cluster for processing, and then parses the returned HTTP header to retrieve the values of missing pattern arguments. The application uses these values to look up the enrichment data and builds a new declaration that defines the appropriate enrichment data mapping.
Finally, the application prepends the new definition to the query, resends it for processing, and returns the result it receives to the user.
Example
In the examples, a pattern is declared, defined, and then invoked.
Declare an empty pattern
In this example, a middle-tier application enriches queries with longitude/latitude locations. The application uses an internal service to map IP addresses to longitude/latitude locations, and provides a pattern called map_ip_to_longlat. When the following query is run, it returns an error because the pattern definition is missing:
map_ip_to_longlat("10.10.10.10")
Declare and define a pattern
The application does not parse this query and hence does not know which IP address (10.10.10.10) was passed to the pattern. So it prepends the user query with an empty map_ip_to_longlat
pattern declaration and sends it for processing:
declare pattern map_ip_to_longlat;
map_ip_to_longlat("10.10.10.10")
The application receives an error in response that identifies the missing pattern reference and the argument value that was passed to it.
Invoke a pattern
The application inspects the error, determines that the error indicates a missing pattern reference, and retrieves the missing IP address (10.10.10.10). It uses the IP address to look up the enrichment data in its internal service and builds a new pattern defining the mapping of the IP address to the corresponding longitude and latitude data. The new pattern is prepended to the user’s query and run again.
This time the query succeeds because the enrichment data is now declared in the query, and the result is sent to the user.
declare pattern map_ip_to_longlat = (address:string)
{
("10.10.10.10") = { print Lat=37.405992, Long=-122.078515 };
};
map_ip_to_longlat("10.10.10.10")
Output
Lat | Long |
---|---|
37.405992 | -122.078515 |
10.5 - Query parameters declaration statement
Queries sent to Kusto may include, in addition to the query text itself, a set of name/value pairs called query parameters. The query text may reference one or more of these values by specifying their names and types in a query parameters declaration statement.
Query parameters have two main uses:
- As a protection mechanism against injection attacks.
- As a way to parameterize queries.
In particular, client applications that combine user-provided input in queries that they then send to Kusto should use the mechanism to protect against the Kusto equivalent of SQL Injection attacks.
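For instance, here is a minimal sketch of the idea; the Events table, UserId column, and userName parameter are hypothetical names used only for illustration. The user-supplied value arrives as a typed parameter instead of being spliced into the query text:
// The client declares the parameter and supplies its value out of band (see the REST API example below).
declare query_parameters(userName:string);
Events
| where UserId == userName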
Declaring query parameters
To reference query parameters, the query text, or functions it uses, must first declare which query parameter it uses. For each parameter, the declaration provides the name and scalar type. Optionally, the parameter can also have a default value. The default is used if the request doesn’t provide a concrete value for the parameter. Kusto then parses the query parameter’s value, according to its normal parsing rules for that type.
Syntax
declare query_parameters ( Name1 : Type1 [= DefaultValue1] [, …] );
Parameters
Name | Type | Required | Description |
---|---|---|---|
Name1 | string | ✔️ | The name of a query parameter used in the query. |
Type1 | string | ✔️ | The corresponding type, such as string or datetime . The values provided by the user are encoded as strings. The appropriate parse method is applied to the query parameter to get a strongly typed value. |
DefaultValue1 | string | | A default value for the parameter. This value must be a literal of the appropriate scalar type. |
Example
The examples in this section show how to use the syntax to help you get started.
Declare query parameters
This query retrieves storm events from the StormEvents table where the total number of direct and indirect injuries exceeds a specified threshold (default is 90). It then projects the EpisodeId, EventType, and the total number of injuries for each of these events.
declare query_parameters(maxInjured:long = 90);
StormEvents
| where InjuriesDirect + InjuriesIndirect > maxInjured
| project EpisodeId, EventType, totalInjuries = InjuriesDirect + InjuriesIndirect
Output
EpisodeId | EventType | totalInjuries |
---|---|---|
12459 | Winter Weather | 137 |
10477 | Excessive Heat | 200 |
10391 | Heat | 187 |
10217 | Excessive Heat | 422 |
10217 | Excessive Heat | 519 |
Specify query parameters in a client application
The names and values of query parameters are provided as string values by the application making the query. No name may repeat.
The interpretation of the values is done according to the query parameters declaration statement. Every value is parsed as if it were a literal in the body of a query. The parsing is done according to the type specified by the query parameters declaration statement.
REST API
Query parameters are provided by client applications through the properties
slot of the request body’s JSON object, in a nested property bag called
Parameters
. For example, here’s the body of a REST API call to Kusto
that calculates the age of some user, presumably by having the application
ask for the user’s birthday.
{
"ns": null,
"db": "myDB",
"csl": "declare query_parameters(birthday:datetime); print strcat(\"Your age is: \", tostring(now() - birthday))",
"properties": "{\"Options\":{},\"Parameters\":{\"birthday\":\"datetime(1970-05-11)\",\"courses\":\"dynamic(['Java', 'C++'])\"}}"
}
Kusto SDKs
To learn how to provide the names and values of query parameters when using Kusto client libraries, see Use query parameters to protect user input.
Kusto.Explorer
To set the query parameters sent when making a request to the service,
use the Query parameters “wrench” icon (ALT + P).
10.6 - Query statements
A query consists of one or more query statements, delimited by a semicolon (;).
At least one of these query statements must be a tabular expression statement.
The tabular expression statement generates one or more tabular results. Any two statements must be separated by a semicolon.
When the query has more than one tabular expression statement, the query has a batch of tabular expression statements, and the tabular results generated by these statements are all returned by the query.
There are two types of query statements:
- Statements that are primarily used by users (user query statements),
- Statements that have been designed to support scenarios in which mid-tier applications take user queries and send a modified version of them to Kusto (application query statements).
Some query statements are useful in both scenarios.
User query statements
Following is a list of user query statements:
A let statement defines a binding between a name and an expression. Let statements can be used to break a long query into small named parts that are easier to understand.
A set statement sets a request property that affects how the query is processed and its results returned.
A tabular expression statement, the most important query statement, returns the “interesting” data back as results.
Application query statements
Following is a list of application query statements:
An alias statement defines an alias to another database (in the same cluster or on a remote cluster).
A pattern statement, which can be used by applications that are built on top of Kusto and expose the query language to their users to inject themselves into the query name resolution process.
A query parameters statement, which is used by applications that are built on top of Kusto to protect themselves against injection attacks (similar to how command parameters protect SQL against SQL injection attacks.)
A restrict statement, which is used by applications that are built on top of Kusto to restrict queries to a specific subset of data in Kusto (including restricting access to specific columns and records.)
10.7 - Restrict statement
The restrict statement limits the set of table/view entities which are visible to query statements that follow it. For example, in a database that includes two tables (A, B), the application can prevent the rest of the query from accessing B and only “see” a limited form of table A by using a view.
The restrict statement’s main scenario is for middle-tier applications that accept queries from users and want to apply a row-level security mechanism over those queries. The middle-tier application can prefix the user’s query with a logical model, a set of let statements that define views restricting the user’s access to data, for example (T | where UserId == "..."). The restrict statement is added as the last statement in the prefix, limiting the user’s access to the logical model only.
Syntax
restrict access to ( EntitySpecifiers )
Parameters
Name | Type | Required | Description |
---|---|---|---|
EntitySpecifiers | string | ✔️ | One or more comma-separated entity specifiers. The possible values are: - An identifier defined by a let statement as a tabular view - A table or function reference, similar to one used by a union statement - A pattern defined by a pattern declaration |
Examples
The examples in this section show how to use the syntax to help you get started.
Let statement
The example uses a let statement that appears before the restrict statement.
// Limit access to 'Test' let statement only
let Test = () { print x=1 };
restrict access to (Test);
Tables or functions
The example uses references to tables or functions that are defined in the database metadata.
// Assuming the database that the query uses has table Table1 and Func1 defined in the metadata,
// and other database 'DB2' has Table2 defined in the metadata
restrict access to (database().Table1, database().Func1, database('DB2').Table2);
Patterns
The example uses wildcard patterns that can match multiple let statements or tables/functions.
let Test1 = () { print x=1 };
let Test2 = () { print y=1 };
restrict access to (*);
// Now access is restricted to Test1, Test2 and no tables/functions are accessible.
// Assuming the database that the query uses has table Table1 and Func1 defined in the metadata.
// Assuming that database 'DB2' has table Table2 and Func2 defined in the metadata
restrict access to (database().*);
// Now access is restricted to all tables/functions of the current database ('DB2' is not accessible).
// Assuming the database that the query uses has table Table1 and Func1 defined in the metadata.
// Assuming that database 'DB2' has table Table2 and Func2 defined in the metadata
restrict access to (database('DB2').*);
// Now access is restricted to all tables/functions of the database 'DB2'
Prevent user from querying other user data
The example shows how a middle-tier application can prepend a user’s query with a logical model that prevents the user from querying any other user’s data.
// Assume the database has a single table, Data,
// with a column called UserID and other columns that hold
// per-user private information.
//
// The middle-tier application generates the following statements.
// Note that "username@domain.com" is something the middle-tier application
// derives per-user as it authenticates the user.
let RestrictedData = view () { Data | where UserID == "username@domain.com" };
restrict access to (RestrictedData);
// The rest of the query is something that the user types.
// This part can only reference RestrictedData; attempting to reference Data
// will fail.
RestrictedData | summarize MonthlySalary=sum(Salary) by Year, Month
// Restricting access to Table1 in the current database (database() called without parameters)
restrict access to (database().Table1);
Table1 | count
// Restricting access to Table1 in the current database and Table2 in database 'DB2'
restrict access to (database().Table1, database('DB2').Table2);
union
(Table1),
(database('DB2').Table2)
| count
// Restricting access to Test statement only
let Test = () { range x from 1 to 10 step 1 };
restrict access to (Test);
Test
// Assume that there are tables called Table1 and Table2 in the database
let View1 = view () { Table1 | project Column1 };
let View2 = view () { Table2 | project Column1, Column2 };
restrict access to (View1, View2);
// When these statements appear before the query, the next statement succeeds
let View1 = view () { Table1 | project Column1 };
let View2 = view () { Table2 | project Column1, Column2 };
restrict access to (View1, View2);
View1 | count
// When these statements appear before the query, the next access is not allowed
let View1 = view () { Table1 | project Column1 };
let View2 = view () { Table2 | project Column1, Column2 };
restrict access to (View1, View2);
Table1 | count
10.8 - Set statement
The set
statement is used to set a request property for the duration of the query.
Request properties control how a query executes and returns results. They can be boolean flags, which are false
by default, or have an integer value. A query may contain zero, one, or more set statements. Set statements affect only the tabular expression statements that trail them in the program order. Any two statements must be separated by a semicolon.
Request properties aren’t formally a part of the Kusto Query Language and may be modified without being considered as a breaking language change.
Syntax
set OptionName [= OptionValue]
Parameters
Name | Type | Required | Description |
---|---|---|---|
OptionName | string | ✔️ | The name of the request property. |
OptionValue | | ✔️ | The value of the request property. |
Example
This query enables query tracing and then fetches the first 100 records from the StormEvents table.
set querytrace;
StormEvents | take 100
Output
The table shows the first few results.
StartTime | EndTime | EpisodeId | EventId | State | EventType |
---|---|---|---|---|---|
2007-01-15T12:30:00Z | 2007-01-15T16:00:00Z | 1636 | 7821 | OHIO | Flood |
2007-08-03T01:50:00Z | 2007-08-03T01:50:00Z | 10085 | 56083 | NEW YORK | Thunderstorm Wind |
2007-08-03T15:33:00Z | 2007-08-03T15:33:00Z | 10086 | 56084 | NEW YORK | Hail |
2007-08-03T15:40:00Z | 2007-08-03T15:40:00Z | 10086 | 56085 | NEW YORK | Hail |
2007-08-03T23:15:00Z | 2007-08-05T04:30:00Z | 6569 | 38232 | NEBRASKA | Flood |
2007-08-06T18:19:00Z | 2007-08-06T18:19:00Z | 6719 | 39781 | IOWA | Thunderstorm Wind |
… | … | … | … | … | … |
10.9 - Tabular expression statements
The tabular expression statement is what people usually have in mind when they talk about queries. This statement usually appears last in the statement list, and both its input and its output consist of tables or tabular datasets. Any two statements must be separated by a semicolon.
A tabular expression statement is generally composed of tabular data sources such as tables, tabular data operators such as filters and projections, and optional rendering operators. The composition is represented by the pipe character (|
), giving the statement a regular form that visually represents the flow of tabular data from left to right.
Each operator accepts a tabular dataset “from the pipe”, and other inputs including more tabular datasets from the body of the operator, then emits a tabular dataset to the next operator that follows.
Syntax
Source | Operator1 | Operator2 | RenderInstruction
Parameters
Name | Type | Required | Description |
---|---|---|---|
Source | string | ✔️ | A tabular data source. See Tabular data sources. |
Operator | string | ✔️ | Tabular data operators, such as filters and projections. |
RenderInstruction | string | | Rendering operators or instructions. |
Tabular data sources
A tabular data source produces sets of records, to be further processed by tabular data operators. The following list shows supported tabular data sources:
- Table references
- The tabular range operator
- The print operator
- An invocation of a function that returns a table
- A table literal (“datatable”)
Examples
The examples in this section show how to use the syntax to help you get started.
Filter rows by condition
This query counts the number of records in the StormEvents
table that have a value of “FLORIDA” in the State
column.
StormEvents
| where State == "FLORIDA"
| count
Output
Count |
---|
1042 |
Combine data from two tables
In this example, the join operator is used to combine records from two tabular data sources: the StormEvents
table and the PopulationData
table.
StormEvents
| where InjuriesDirect + InjuriesIndirect > 50
| join (PopulationData) on State
| project State, Population, TotalInjuries = InjuriesDirect + InjuriesIndirect
Output
State | Population | TotalInjuries |
---|---|---|
ALABAMA | 4918690 | 60 |
CALIFORNIA | 39562900 | 61 |
KANSAS | 2915270 | 63 |
MISSOURI | 6153230 | 422 |
OKLAHOMA | 3973710 | 200 |
TENNESSEE | 6886720 | 187 |
TEXAS | 29363100 | 137 |
11 - Reference
11.1 - JSONPath syntax
JSONPath notation describes the path to one or more elements in a JSON document.
The JSONPath notation is used in the following scenarios:
- To specify data mappings for ingestion
- To specify data mappings for external tables
- In Kusto Query Language (KQL) functions that process dynamic objects, like bag_remove_keys() and extract_json()
The following subset of the JSONPath notation is supported:
Path expression | Description |
---|---|
$ | Root object |
. | Selects the specified property in a parent object. Use this notation if the property doesn’t contain special characters. |
['property'] or ["property"] | Selects the specified property in a parent object. Make sure you put single quotes or double quotes around the property name. Use this notation if the property name contains special characters, such as spaces, or begins with a character other than A..Za..z_ . |
[n] | Selects the n-th element from an array. Indexes are 0-based. |
Example
Given the following JSON document:
{
"Source": "Server-01",
"Timestamp": "2023-07-25T09:15:32.123Z",
"Log Level": "INFO",
"Message": "Application started successfully.",
"Details": {
"Service": "AuthService",
"Endpoint": "/api/login",
"Response Code": 200,
"Response Time": 54.21,
"User": {
"User ID": "user123",
"Username": "kiana_anderson",
"IP Address": "192.168.1.100"
}
}
}
You can represent each of the fields with JSONPath notation as follows:
"$.Source" // Source field
"$.Timestamp" // Timestamp field
"$['Log Level']" // Log Level field
"$.Message" // Message field
"$.Details.Service" // Service field
"$.Details.Endpoint" // Endpoint field
"$.Details['Response Code']" // Response Code field
"$.Details['Response Time']" // Response Time field
"$.Details.User['User ID']" // User ID field
"$.Details.User.Username" // Username field
"$.Details.User['IP Address']" // IP Address field
11.2 - KQL docs navigation guide
The behavior of KQL may vary when using this language in different services. When you view any KQL documentation article by using our Learn website, the currently chosen service name is visible above the table of contents (TOC) under the Version dropdown. Switch between services using the version dropdown to see the KQL behavior for the selected service.
Change service selection
In addition to the Version dropdown, you can change the selected service by editing the view= parameter in the article’s HTTPS address.
Applies to services
Most of the KQL articles have the words Applies to under their title, followed on the same line by a listing of the services to which the article applies. For example, a certain function could be applicable to Fabric and Azure Data Explorer, but not Azure Monitor or others. If you don’t see the service you’re using, the article most likely isn’t relevant to your service.
Versions
The following table describes the different versions of KQL and the services they are associated with.
Version | Description |
---|---|
Microsoft Fabric | Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. Within the suite of experiences offered in Microsoft Fabric, Real-Time Intelligence is a powerful service that empowers everyone in your organization to extract insights and visualize their data in motion. It offers an end-to-end solution for event-driven scenarios, streaming data, and data logs. The main query environment for KQL in Microsoft Fabric is the KQL queryset. KQL in Microsoft Fabric supports query operators, functions, and management commands. |
Azure Data Explorer | Azure Data Explorer is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real time. There are several query environments and integrations that can be used in Azure Data Explorer, including the web UI. KQL in Azure Data Explorer is the full, native version, which supports all query operators, functions, and management commands. |
Azure Monitor | Log Analytics is a tool in the Azure portal that’s used to edit and run log queries against data in the Azure Monitor Logs store. You interact with Log Analytics in a Log Analytics workspace in the Azure portal. KQL in Azure Monitor uses a subset of the overall KQL operators and functions. |
Microsoft Sentinel | Microsoft Sentinel is a scalable, cloud-native security information and event management (SIEM) that delivers an intelligent and comprehensive solution for SIEM and security orchestration, automation, and response (SOAR). Microsoft Sentinel provides cyberthreat detection, investigation, response, and proactive hunting, with a bird’s-eye view across your enterprise. Microsoft Sentinel is built on top of the Azure Monitor service and it uses Azure Monitor’s Log Analytics workspaces to store all of its data. KQL in Microsoft Sentinel uses a subset of the overall KQL operators and functions. |
11.3 - Regex syntax
This article provides an overview of regular expression syntax supported by Kusto Query Language (KQL).
There are a number of KQL operators and functions that perform string matching, selection, and extraction with regular expressions, such as matches regex
, parse
, and replace_regex()
.
In KQL, regular expressions must be encoded as string literals and follow the string quoting rules. For example, the regular expression \A
is represented in KQL as "\\A"
. The extra backslash indicates that the other backslash is part of the regular expression \A
.
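For example, to match one or more digits with the regex \d+, the pattern is written as "\\d+" inside a KQL string literal. A minimal sketch (the column names here are made up):
print line = "Build 2041 finished"
| where line matches regex "\\d+"                  // the literal "\\d+" encodes the regex \d+
| extend buildNumber = extract("\\d+", 0, line)    // capture group 0 is the whole match: "2041"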
Syntax
The following sections document the regular expression syntax supported by Kusto.
Match one character
Pattern | Description |
---|---|
. | Any character except new line (includes new line with s flag). |
[0-9] | Any ASCII digit. |
[^0-9] | Any character that isn’t an ASCII digit. |
\d | Digit (\p{Nd} ). |
\D | Not a digit. |
\pX | Unicode character class identified by a one-letter name. |
\p{Greek} | Unicode character class (general category or script). |
\PX | Negated Unicode character class identified by a one-letter name. |
\P{Greek} | Negated Unicode character class (general category or script). |
Character classes
Pattern | Description |
---|---|
[xyz] | Character class matching either x, y or z (union). |
[^xyz] | Character class matching any character except x, y, and z. |
[a-z] | Character class matching any character in range a-z. |
[[:alpha:]] | ASCII character class ([A-Za-z]). |
[[:^alpha:]] | Negated ASCII character class ([^A-Za-z]). |
[x[^xyz]] | Nested/grouping character class (matching any character except y and z). |
[a-y&&xyz] | Intersection (matching x or y). |
[0-9&&[^4]] | Subtraction using intersection and negation (matching 0-9 except 4). |
[0-9--4] | Direct subtraction (matching 0-9 except 4). |
[a-g~~b-h] | Symmetric difference (matching a and h only). |
[\[\]] | Escape in character classes (matching [ or ]). |
[a&&b] | Empty character class matching nothing. |
Precedence in character classes is from most binding to least binding:
- Ranges:
[a-cd]
==[[a-c]d]
- Union:
[ab&&bc]
==[[ab]&&[bc]]
- Intersection, difference, symmetric difference: All have equivalent precedence, and are evaluated from left-to-right. For example,
[\pL--\p{Greek}&&\p{Uppercase}]
==[[\pL--\p{Greek}]&&\p{Uppercase}]
. - Negation:
[^a-z&&b]
==[^[a-z&&b]]
.
Composites
Pattern | Description |
---|---|
xy | Concatenation (x followed by y ) |
x|y | Alternation (x or y , prefer x ) |
Repetitions
Pattern | Description |
---|---|
x* | Zero or more of x (greedy) |
x+ | One or more of x (greedy) |
x? | Zero or one of x (greedy) |
x*? | Zero or more of x (ungreedy/lazy) |
x+? | One or more of x (ungreedy/lazy) |
x?? | Zero or one of x (ungreedy/lazy) |
x{n,m} | At least n x and at most m x (greedy) |
x{n,} | At least n x (greedy) |
x{n} | Exactly n x |
x{n,m}? | At least n x and at most m x (ungreedy/lazy) |
x{n,}? | At least n x (ungreedy/lazy) |
x{n}? | Exactly n x |
Empty matches
Pattern | Description |
---|---|
^ | Beginning of a haystack or start-of-line with multi-line mode. |
$ | End of a haystack or end-of-line with multi-line mode. |
\A | Only the beginning of a haystack, even with multi-line mode enabled. |
\z | Only the end of a haystack, even with multi-line mode enabled. |
\b | Unicode word boundary with \w on one side and \W , \A , or \z on other. |
\B | Not a Unicode word boundary. |
\b{start} , \< | Unicode start-of-word boundary with \W|\A at the start of the string and \w on the other side. |
\b{end} , \> | Unicode end-of-word boundary with \w on one side and \W|\z at the end. |
\b{start-half} | Half of a Unicode start-of-word boundary with \W|\A at the beginning of the boundary. |
\b{end-half} | Half of a Unicode end-of-word boundary with \W|\z at the end. |
Grouping and flags
Pattern | Description |
---|---|
(exp) | Numbered capture group (indexed by opening parenthesis). |
(?P<name>exp) | Named capture group (names must be alpha-numeric). |
(?<name>exp) | Named capture group (names must be alpha-numeric). |
(?:exp) | Non-capturing group. |
(?flags) | Set flags within current group. |
(?flags:exp) | Set flags for exp (non-capturing). |
Capture group names can contain only alpha-numeric Unicode codepoints, dots .
, underscores _
, and square brackets [
and ]
. Names must start with either an _
or an alphabetic codepoint. Alphabetic codepoints correspond to the Alphabetic
Unicode property, while numeric codepoints correspond to the union of the Decimal_Number
, Letter_Number
and Other_Number
general categories.
Flags are single characters. For example, (?x)
sets the flag x
and (?-x)
clears the flag x
. Multiple flags can be set or cleared at the same time: (?xy)
sets both the x
and y
flags and (?x-y)
sets the x
flag and clears the y
flag. By default all flags are disabled unless stated otherwise. They are:
Flag | Description |
---|---|
i | Case-insensitive: letters match both upper and lower case. |
m | Multi-line mode: ^ and $ match begin/end of line. |
s | Allow . to match \n . |
R | Enables CRLF mode: when multi-line mode is enabled, \r\n is used. |
U | Swap the meaning of x* and x*? . |
u | Unicode support (enabled by default). |
x | Verbose mode, ignores whitespace and allow line comments (starting with # ). |
In verbose mode, whitespace is ignored everywhere, including within character classes. To insert whitespace, use its escaped form or a hex literal. For example, \
or \x20
for an ASCII space.
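For example, the i flag can be set inline to make a match case-insensitive (a minimal sketch):
print message = "ERROR: disk full"
| where message matches regex "(?i)error"   // (?i) enables case-insensitive matching for the rest of the pattern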
Escape sequences
Pattern | Description |
---|---|
\* | Literal * , applies to all ASCII except [0-9A-Za-z<>] |
\a | Bell (\x07 ) |
\f | Form feed (\x0C ) |
\t | Horizontal tab |
\n | New line |
\r | Carriage return |
\v | Vertical tab (\x0B ) |
\A | Matches at the beginning of a haystack |
\z | Matches at the end of a haystack |
\b | Word boundary assertion |
\B | Negated word boundary assertion |
\b{start} , \< | Start-of-word boundary assertion |
\b{end} , \> | End-of-word boundary assertion |
\b{start-half} | Half of a start-of-word boundary assertion |
\b{end-half} | Half of an end-of-word boundary assertion |
\123 | Octal character code, up to three digits |
\x7F | Hex character code (exactly two digits) |
\x{10FFFF} | Hex character code corresponding to a Unicode code point |
\u007F | Hex character code (exactly four digits) |
\u{7F} | Hex character code corresponding to a Unicode code point |
\U0000007F | Hex character code (exactly eight digits) |
\U{7F} | Hex character code corresponding to a Unicode code point |
\p{Letter} | Unicode character class |
\P{Letter} | Negated Unicode character class |
\d , \s , \w | Perl character class |
\D , \S , \W | Negated Perl character class |
Perl character classes (Unicode friendly)
These classes are based on the definitions provided in UTS#18:
Pattern | Description |
---|---|
\d | Digit (\p{Nd} ) |
\D | Not digit |
\s | Whitespace (\p{White_Space} ) |
\S | Not whitespace |
\w | Word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control} ) |
\W | Not word character |
ASCII character classes
These classes are based on the definitions provided in UTS#18:
Pattern | Description |
---|---|
[[:alnum:]] | Alphanumeric ([0-9A-Za-z] ) |
[[:alpha:]] | Alphabetic ([A-Za-z] ) |
[[:ascii:]] | ASCII ([\x00-\x7F] ) |
[[:blank:]] | Blank ([\t ] ) |
[[:cntrl:]] | Control ([\x00-\x1F\x7F] ) |
[[:digit:]] | Digits ([0-9] ) |
[[:graph:]] | Graphical ([!-~] ) |
[[:lower:]] | Lower case ([a-z] ) |
[[:print:]] | Printable ([ -~] ) |
[[:punct:]] | Punctuation ([!-/:-@\[-`{-~] ) |
[[:space:]] | Whitespace ([\t\n\v\f\r ] ) |
[[:upper:]] | Upper case ([A-Z] ) |
[[:word:]] | Word characters ([0-9A-Za-z_] ) |
[[:xdigit:]] | Hex digit ([0-9A-Fa-f] ) |
Performance
This section provides some guidance on speed and resource usage of regex expressions.
Unicode can affect memory usage and search speed
KQL regex provides first class support for Unicode. In many cases, the extra memory required to support Unicode is negligible and doesn’t typically affect search speed.
The following are some examples of Unicode character classes that can affect memory usage and search speed:
Memory usage: The effect of Unicode primarily arises from the use of Unicode character classes. Unicode character classes tend to be larger in size. For example, the
\w
character class matches around 140,000 distinct codepoints by default. This requires more memory and can slow down regex compilation. If ASCII satisfies your requirements, use ASCII classes instead of Unicode classes. The ASCII-only version of \w can be expressed in multiple equivalent ways: [0-9A-Za-z_], (?-u:\w), [[:word:]], [\w&&\p{ascii}].
Search speed: Unicode tends to be handled well, even when using large Unicode character classes. However, some of the faster internal regex engines can’t handle a Unicode aware word boundary assertion. So if you don’t need Unicode-aware word boundary assertions, you might consider using
(?-u:\b)
instead of\b
. The(?-u:\b)
uses an ASCII-only definition of a word character, which can improve search speed.
Literals can accelerate searches
KQL regex has a strong ability to recognize literals within a regex pattern, which can significantly speed up searches. If possible, including literals in your pattern can greatly improve search performance. For example, in the regex \w+@\w+
, first occurrences of @
are matched and then a reverse match is performed for \w+
to find the starting position.
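As a rough sketch of why this helps, the pattern below contains the literal @, so the engine can first look for @ and only then match \w+ on either side; the input value is made up for illustration:
print line = "user123@contoso.com"
| extend mention = extract(@"\w+@\w+", 0, line)   // verbatim string literal; returns "user123@contoso"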
11.4 - Splunk to Kusto map
This article is intended to assist users who are familiar with Splunk in learning the Kusto Query Language to write log queries with Kusto. Direct comparisons are made between the two to highlight key differences and similarities, so you can build on your existing knowledge.
Structure and concepts
The following table compares concepts and data structures between Splunk and Kusto logs:
Concept | Splunk | Kusto | Comment |
---|---|---|---|
deployment unit | cluster | cluster | Kusto allows arbitrary cross-cluster queries. Splunk doesn’t. |
data caches | buckets | caching and retention policies | Controls the period and caching level for the data. This setting directly affects the performance of queries and the cost of the deployment. |
logical partition of data | index | database | Allows logical separation of the data. Both implementations allow unions and joining across these partitions. |
structured event metadata | N/A | table | Splunk doesn’t expose the concept of event metadata to the search language. Kusto logs have the concept of a table, which has columns. Each event instance is mapped to a row. |
record | event | row | Terminology change only. |
record attribute | field | column | In Kusto, this setting is predefined as part of the table structure. In Splunk, each event has its own set of fields. |
types | datatype | datatype | Kusto data types are more explicit because they’re set on the columns. Both have the ability to work dynamically with data types and roughly equivalent set of datatypes, including JSON support. |
query and search | search | query | Concepts essentially are the same between Kusto and Splunk. |
event ingestion time | system time | ingestion_time() | In Splunk, each event gets a system timestamp of the time the event was indexed. In Kusto, you can define a policy called ingestion_time that exposes a system column that can be referenced through the ingestion_time() function. |
Functions
The following table specifies functions in Kusto that are equivalent to Splunk functions.
Splunk | Kusto | Comment |
---|---|---|
strcat | strcat() | (1) |
split | split() | (1) |
if | iff() | (1) |
tonumber | todouble() tolong() toint() | (1) |
upper lower | toupper() tolower() | (1) |
replace | replace_string() , replace_strings() or replace_regex() | (1) Although replace functions take three parameters in both products, the parameters are different. |
substr | substring() | (1) Also note that Splunk uses one-based indices, while Kusto uses zero-based indices. |
tolower | tolower() | (1) |
toupper | toupper() | (1) |
match | matches regex | (2) |
regex | matches regex | In Splunk, regex is an operator. In Kusto, it’s a relational operator. |
searchmatch | == | In Splunk, searchmatch allows searching for the exact string. |
random | rand() rand(n) | Splunk’s function returns a number between 0 and 2^31-1. Kusto’s returns a number between 0.0 and 1.0, or if a parameter is provided, between 0 and n-1. |
now | now() | (1) |
relative_time | totimespan() | (1) In Kusto, Splunk’s equivalent of relative_time(datetimeVal, offsetVal) is datetimeVal + totimespan(offsetVal) .For example, search | eval n=relative_time(now(), "-1d@d") becomes ... | extend myTime = now() - totimespan("1d") . |
(1) In Splunk, the function is invoked by using the eval
operator. In Kusto, it’s used as part of extend
or project
.
(2) In Splunk, the function is invoked by using the eval
operator. In Kusto, it can be used with the where
operator.
Operators
The following sections give examples of how to use different operators in Splunk and Kusto.
Search
In Splunk, you can omit the search
keyword and specify an unquoted string. In Kusto, you must start each query with find
, an unquoted string is a column name, and the lookup value must be a quoted string.
Product | Operator | Example |
---|---|---|
Splunk | search | search Session.Id="c8894ffd-e684-43c9-9125-42adc25cd3fc" earliest=-24h |
Kusto | find | find Session.Id=="c8894ffd-e684-43c9-9125-42adc25cd3fc" and ingestion_time()> ago(24h) |
Filter
Kusto log queries start from a tabular result set in which filter
is applied. In Splunk, filtering is the default operation on the current index. You also can use the where
operator in Splunk, but we don’t recommend it.
Product | Operator | Example |
---|---|---|
Splunk | search | Event.Rule="330009.2" Session.Id="c8894ffd-e684-43c9-9125-42adc25cd3fc" _indextime>-24h |
Kusto | where | Office_Hub_OHubBGTaskError | where Session_Id == "c8894ffd-e684-43c9-9125-42adc25cd3fc" and ingestion_time() > ago(24h) |
Get n events or rows for inspection
Kusto log queries also support take
as an alias to limit
. In Splunk, if the results are ordered, head
returns the first n results. In Kusto, limit
isn’t ordered, but it returns the first n rows that are found.
Product | Operator | Example |
---|---|---|
Splunk | head | Event.Rule=330009.2 | head 100 |
Kusto | limit | Office_Hub_OHubBGTaskError | limit 100 |
Get the first n events or rows ordered by a field or column
For the bottom results, in Splunk, you use tail
. In Kusto, you can specify ordering direction by using asc
.
Product | Operator | Example |
---|---|---|
Splunk | head | Event.Rule="330009.2" | sort Event.Sequence | head 20 |
Kusto | top | Office_Hub_OHubBGTaskError | top 20 by Event_Sequence |
Extend the result set with new fields or columns
Splunk has an eval
function, but it’s not comparable to the eval
operator in Kusto. Both the eval
operator in Splunk and the extend
operator in Kusto support only scalar functions and arithmetic operators.
Product | Operator | Example |
---|---|---|
Splunk | eval | Event.Rule=330009.2 | eval state= if(Data.Exception = "0", "success", "error") |
Kusto | extend | Office_Hub_OHubBGTaskError | extend state = iff(Data_Exception == 0,"success" ,"error") |
Rename
Kusto uses the project-rename
operator to rename a field. In the project-rename
operator, a query can take advantage of any indexes that are prebuilt for a field. Splunk has a rename
operator that does the same.
Product | Operator | Example |
---|---|---|
Splunk | rename | Event.Rule=330009.2 | rename Date.Exception as execption |
Kusto | project-rename | Office_Hub_OHubBGTaskError | project-rename exception = Date_Exception |
Format results and projection
Splunk uses the table
command to select which columns to include in the results. Kusto has a project
operator that does the same and more.
Product | Operator | Example |
---|---|---|
Splunk | table | Event.Rule=330009.2 | table rule, state |
Kusto | project | Office_Hub_OHubBGTaskError | project exception, state |
Splunk uses the fields -
command to select which columns to exclude from the results. Kusto has a project-away
operator that does the same.
Product | Operator | Example |
---|---|---|
Splunk | fields - | Event.Rule=330009.2 | fields - quota, hightest_seller |
Kusto | project-away | Office_Hub_OHubBGTaskError | project-away exception, state |
Aggregation
See the list of summarize aggregations functions that are available.
Splunk operator | Splunk example | Kusto operator | Kusto example |
---|---|---|---|
stats | search (Rule=120502.*) | stats count by OSEnv, Audience | summarize | Office_Hub_OHubBGTaskError | summarize count() by App_Platform, Release_Audience |
eventstats | ... | stats count_i by time, category | eventstats sum(count_i) AS count_total by _time_ | join | T2 | join kind=inner (T1) on _time | project _time, category, count_i, count_total |
Join
join
in Splunk has substantial limitations. The subquery has a limit of 10,000 results (set in the deployment configuration file), and a limited number of join flavors are available.
Product | Operator | Example |
---|---|---|
Splunk | join | Event.Rule=120103* | stats by Client.Id, Data.Alias | join Client.Id max=0 [search earliest=-24h Event.Rule="150310.0" Data.Hresult=-2147221040] |
Kusto | join | cluster("OAriaPPT").database("Office PowerPoint").Office_PowerPoint_PPT_Exceptions | where Data_Hresult== -2147221040 | join kind = inner (Office_System_SystemHealthMetadata | summarize by Client_Id, Data_Alias)on Client_Id |
Sort
The default sort order is ascending. To specify descending order, add a minus sign (-
) before the field name. Kusto also supports defining where to put nulls, either at the beginning or at the end.
Product | Operator | Example |
---|---|---|
Splunk | sort | Event.Rule=120103 | sort -Data.Hresult |
Kusto | order by | Office_Hub_OHubBGTaskError | order by Data_Hresult desc |
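For example, assuming the same table as above, rows with a null Data_Hresult can be pushed to the end of a descending sort:
Office_Hub_OHubBGTaskError
| order by Data_Hresult desc nulls last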
Multivalue expand
The multivalue expand operator is similar in both Splunk and Kusto.
Product | Operator | Example |
---|---|---|
Splunk | mvexpand | mvexpand solutions |
Kusto | mv-expand | mv-expand solutions |
Result facets, interesting fields
In Log Analytics in the Azure portal, only the first column is exposed. All columns are available through the API.
Product | Operator | Example |
---|---|---|
Splunk | fields | Event.Rule=330009.2 | fields App.Version, App.Platform |
Kusto | facet | Office_Excel_BI_PivotTableCreate | facet by App_Branch, App_Version |
Deduplicate
In Kusto, you can use summarize arg_min()
to reverse the order of which record is chosen.
Product | Operator | Example |
---|---|---|
Splunk | dedup | Event.Rule=330009.2 | dedup device_id sortby -batterylife |
Kusto | summarize arg_max() | Office_Excel_BI_PivotTableCreate | summarize arg_max(batterylife, *) by device_id |
Related content
- Walk through a tutorial on the Kusto Query Language.
11.5 - SQL to Kusto query translation
If you’re familiar with SQL and want to learn KQL, translate SQL queries into KQL by prefacing the SQL query with a comment line, --
, and the keyword explain
. The output shows the KQL version of the query, which can help you understand the KQL syntax and concepts.
--
explain
SELECT COUNT_BIG(*) as C FROM StormEvents
Output
Query |
---|
StormEvents<br> | summarize C=count()<br> | project C |
SQL to Kusto cheat sheet
The following table shows sample queries in SQL and their KQL equivalents.
| Category | SQL Query | Kusto Query | Learn more |
|--|--|--|--|
| Select data from table | SELECT * FROM dependencies
| dependencies
| Tabular expression statements |
| – | SELECT name, resultCode FROM dependencies
| dependencies | project name, resultCode
| project |
| – | SELECT TOP 100 * FROM dependencies
| dependencies | take 100
| take |
| Null evaluation | SELECT * FROM dependencies
WHERE resultCode IS NOT NULL
| dependencies
| where isnotnull(resultCode)
| isnotnull() |
| Comparison operators (date) | SELECT * FROM dependencies
WHERE timestamp > getdate()-1
| dependencies
| where timestamp > ago(1d)
| ago() |
| – | SELECT * FROM dependencies
WHERE timestamp BETWEEN ... AND ...
| dependencies
| where timestamp between (datetime(2016-10-01) .. datetime(2016-11-01))
| between |
| Comparison operators (string) | SELECT * FROM dependencies
WHERE type = "Azure blob"
| dependencies
| where type == "Azure blob"
| Logical operators |
| – | -- substring
SELECT * FROM dependencies
WHERE type like "%blob%"
| // substring
dependencies
| where type has "blob"
| has |
| – | -- wildcard
SELECT * FROM dependencies
WHERE type like "Azure%"
| // wildcard
dependencies
| where type startswith "Azure"
// or
dependencies
| where type matches regex "^Azure.*"
| startswith
matches regex |
| Comparison (boolean) | SELECT * FROM dependencies
WHERE !(success)
| dependencies
| where success == False
| Logical operators |
| Grouping, Aggregation | SELECT name, AVG(duration) FROM dependencies
GROUP BY name
| dependencies
| summarize avg(duration) by name
| summarize, avg() |
| Distinct | SELECT DISTINCT name, type FROM dependencies
| dependencies
| summarize by name, type
| summarize, distinct |
| – | SELECT name, COUNT(DISTINCT type)
FROM dependencies
GROUP BY name
| dependencies
| summarize by name, type | summarize count() by name
// or approximate for large sets
dependencies
| summarize dcount(type) by name
| count(), dcount() |
| Column aliases, Extending | SELECT operationName as Name, AVG(duration) as AvgD FROM dependencies
GROUP BY name
| dependencies
| summarize AvgD = avg(duration) by Name=operationName
| Alias statement |
| – | SELECT conference, CONCAT(sessionid, ' ' , session_title) AS session FROM ConferenceSessions
| ConferenceSessions
| extend session=strcat(sessionid, " ", session_title)
| project conference, session
| strcat(), project |
| Ordering | SELECT name, timestamp FROM dependencies
ORDER BY timestamp ASC
| dependencies
| project name, timestamp
| sort by timestamp asc nulls last
| sort |
| Top n by measure | SELECT TOP 100 name, COUNT(*) as Count FROM dependencies
GROUP BY name
ORDER BY Count DESC
| dependencies
| summarize Count = count() by name
| top 100 by Count desc
| top |
| Union | SELECT * FROM dependencies
UNION
SELECT * FROM exceptions
| union dependencies, exceptions
| union |
| – | SELECT * FROM dependencies
WHERE timestamp > ...
UNION
SELECT * FROM exceptions
WHERE timestamp > ...
| dependencies
| where timestamp > ago(1d)
| union
(exceptions
| where timestamp > ago(1d))
| |
| Join | SELECT * FROM dependencies
LEFT OUTER JOIN exceptions
ON dependencies.operation_Id = exceptions.operation_Id
| dependencies
| join kind = leftouter
(exceptions)
on $left.operation_Id == $right.operation_Id
| join |
| Nested queries | SELECT * FROM dependencies
WHERE resultCode ==
(SELECT TOP 1 resultCode FROM dependencies
WHERE resultId = 7
ORDER BY timestamp DESC)
| dependencies
| where resultCode == toscalar(
dependencies
| where resultId == 7
| top 1 by timestamp desc
| project resultCode)
| toscalar |
| Having | SELECT COUNT(\*) FROM dependencies
GROUP BY name
HAVING COUNT(\*) > 3
| dependencies
| summarize Count = count() by name
| where Count > 3
| summarize, where |
Related content
- Use T-SQL to query data
11.6 - Timezone
The following is a list of timezones supported by the Internet Assigned Numbers Authority (IANA) Time Zone Database.
Related functions: datetime_utc_to_local(), datetime_local_to_utc()
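For example, any of the timezone names below can be passed to datetime_utc_to_local() to convert a UTC time (a minimal sketch):
print utc = now()
| extend tokyo = datetime_utc_to_local(utc, "Asia/Tokyo"),
         losAngeles = datetime_utc_to_local(utc, "America/Los_Angeles")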
Timezone |
---|
Africa/Abidjan |
Africa/Accra |
Africa/Addis_Ababa |
Africa/Algiers |
Africa/Asmara |
Africa/Asmera |
Africa/Bamako |
Africa/Bangui |
Africa/Banjul |
Africa/Bissau |
Africa/Blantyre |
Africa/Brazzaville |
Africa/Bujumbura |
Africa/Cairo |
Africa/Casablanca |
Africa/Ceuta |
Africa/Conakry |
Africa/Dakar |
Africa/Dar_es_Salaam |
Africa/Djibouti |
Africa/Douala |
Africa/El_Aaiun |
Africa/Freetown |
Africa/Gaborone |
Africa/Harare |
Africa/Johannesburg |
Africa/Juba |
Africa/Kampala |
Africa/Khartoum |
Africa/Kigali |
Africa/Kinshasa |
Africa/Lagos |
Africa/Libreville |
Africa/Lome |
Africa/Luanda |
Africa/Lubumbashi |
Africa/Lusaka |
Africa/Malabo |
Africa/Maputo |
Africa/Maseru |
Africa/Mbabane |
Africa/Mogadishu |
Africa/Monrovia |
Africa/Nairobi |
Africa/Ndjamena |
Africa/Niamey |
Africa/Nouakchott |
Africa/Ouagadougou |
Africa/Porto-Novo |
Africa/Sao_Tome |
Africa/Timbuktu |
Africa/Tripoli |
Africa/Tunis |
Africa/Windhoek |
America/Adak |
America/Anchorage |
America/Anguilla |
America/Antigua |
America/Araguaina |
America/Argentina/Buenos_Aires |
America/Argentina/Catamarca |
America/Argentina/ComodRivadavia |
America/Argentina/Cordoba |
America/Argentina/Jujuy |
America/Argentina/La_Rioja |
America/Argentina/Mendoza |
America/Argentina/Rio_Gallegos |
America/Argentina/Salta |
America/Argentina/San_Juan |
America/Argentina/San_Luis |
America/Argentina/Tucuman |
America/Argentina/Ushuaia |
America/Aruba |
America/Asuncion |
America/Atikokan |
America/Atka |
America/Bahia |
America/Bahia_Banderas |
America/Barbados |
America/Belem |
America/Belize |
America/Blanc-Sablon |
America/Boa_Vista |
America/Bogota |
America/Boise |
America/Buenos_Aires |
America/Cambridge_Bay |
America/Campo_Grande |
America/Cancun |
America/Caracas |
America/Catamarca |
America/Cayenne |
America/Cayman |
America/Chicago |
America/Chihuahua |
America/Coral_Harbour |
America/Cordoba |
America/Costa_Rica |
America/Creston |
America/Cuiaba |
America/Curacao |
America/Danmarkshavn |
America/Dawson |
America/Dawson_Creek |
America/Denver |
America/Detroit |
America/Dominica |
America/Edmonton |
America/Eirunepe |
America/El_Salvador |
America/Ensenada |
America/Fort_Nelson |
America/Fort_Wayne |
America/Fortaleza |
America/Glace_Bay |
America/Godthab |
America/Goose_Bay |
America/Grand_Turk |
America/Grenada |
America/Guadeloupe |
America/Guatemala |
America/Guayaquil |
America/Guyana |
America/Halifax |
America/Havana |
America/Hermosillo |
America/Indiana/Indianapolis |
America/Indiana/Knox |
America/Indiana/Marengo |
America/Indiana/Petersburg |
America/Indiana/Tell_City |
America/Indiana/Vevay |
America/Indiana/Vincennes |
America/Indiana/Winamac |
America/Indianapolis |
America/Inuvik |
America/Iqaluit |
America/Jamaica |
America/Jujuy |
America/Juneau |
America/Kentucky/Louisville |
America/Kentucky/Monticello |
America/Knox_IN |
America/Kralendijk |
America/La_Paz |
America/Lima |
America/Los_Angeles |
America/Louisville |
America/Lower_Princes |
America/Maceio |
America/Managua |
America/Manaus |
America/Marigot |
America/Martinique |
America/Matamoros |
America/Mazatlan |
America/Mendoza |
America/Menominee |
America/Merida |
America/Metlakatla |
America/Mexico_City |
America/Miquelon |
America/Moncton |
America/Monterrey |
America/Montevideo |
America/Montreal |
America/Montserrat |
America/Nassau |
America/New_York |
America/Nipigon |
America/Nome |
America/Noronha |
America/North_Dakota/Beulah |
America/North_Dakota/Center |
America/North_Dakota/New_Salem |
America/Nuuk |
America/Ojinaga |
America/Panama |
America/Pangnirtung |
America/Paramaribo |
America/Phoenix |
America/Port-au-Prince |
America/Port_of_Spain |
America/Porto_Acre |
America/Porto_Velho |
America/Puerto_Rico |
America/Punta_Arenas |
America/Rainy_River |
America/Rankin_Inlet |
America/Recife |
America/Regina |
America/Resolute |
America/Rio_Branco |
America/Rosario |
America/Santa_Isabel |
America/Santarem |
America/Santiago |
America/Santo_Domingo |
America/Sao_Paulo |
America/Scoresbysund |
America/Shiprock |
America/Sitka |
America/St_Barthelemy |
America/St_Johns |
America/St_Kitts |
America/St_Lucia |
America/St_Thomas |
America/St_Vincent |
America/Swift_Current |
America/Tegucigalpa |
America/Thule |
America/Thunder_Bay |
America/Tijuana |
America/Toronto |
America/Tortola |
America/Vancouver |
America/Virgin |
America/Whitehorse |
America/Winnipeg |
America/Yakutat |
America/Yellowknife |
Antarctica/Casey |
Antarctica/Davis |
Antarctica/DumontDUrville |
Antarctica/Macquarie |
Antarctica/Mawson |
Antarctica/McMurdo |
Antarctica/Palmer |
Antarctica/Rothera |
Antarctica/South_Pole |
Antarctica/Syowa |
Antarctica/Troll |
Antarctica/Vostok |
Arctic/Longyearbyen |
Asia/Aden |
Asia/Almaty |
Asia/Amman |
Asia/Anadyr |
Asia/Aqtau |
Asia/Aqtobe |
Asia/Ashgabat |
Asia/Ashkhabad |
Asia/Atyrau |
Asia/Baghdad |
Asia/Bahrain |
Asia/Baku |
Asia/Bangkok |
Asia/Barnaul |
Asia/Beirut |
Asia/Bishkek |
Asia/Brunei |
Asia/Kolkata |
Asia/Chita |
Asia/Choibalsan |
Asia/Chongqing |
Asia/Colombo |
Asia/Dacca |
Asia/Damascus |
Asia/Dhaka |
Asia/Dili |
Asia/Dubai |
Asia/Dushanbe |
Asia/Famagusta |
Asia/Gaza |
Asia/Harbin |
Asia/Hebron |
Asia/Ho_Chi_Minh_City |
Asia/Hong_Kong |
Asia/Hovd |
Asia/Irkutsk |
Asia/Istanbul |
Asia/Jakarta |
Asia/Jayapura |
Asia/Jerusalem |
Asia/Kabul |
Asia/Kamchatka |
Asia/Karachi |
Asia/Kashgar |
Asia/Kathmandu |
Asia/Katmandu |
Asia/Khandyga |
Asia/Kolkata |
Asia/Krasnoyarsk |
Asia/Kuala_Lumpur |
Asia/Kuching |
Asia/Kuwait |
Asia/Macao Special Administrative Region |
Asia/Magadan |
Asia/Makassar |
Asia/Manila |
Asia/Muscat |
Asia/Nicosia |
Asia/Novokuznetsk |
Asia/Novosibirsk |
Asia/Omsk |
Asia/Oral |
Asia/Phnom_Penh |
Asia/Pontianak |
Asia/Pyongyang |
Asia/Qatar |
Asia/Qostanay |
Asia/Qyzylorda |
Asia/Yangon (Rangoon) |
Asia/Riyadh |
Asia/Sakhalin |
Asia/Samarkand |
Asia/Seoul |
Asia/Shanghai |
Asia/Singapore |
Asia/Srednekolymsk |
Asia/Taipei |
Asia/Tashkent |
Asia/Tbilisi |
Asia/Tehran |
Asia/Tel_Aviv |
Asia/Thimbu |
Asia/Thimphu |
Asia/Tokyo |
Asia/Tomsk |
Asia/Ujung_Pandang |
Asia/Ulaanbaatar |
Asia/Ulan_Bator |
Asia/Urumqi |
Asia/Ust-Nera |
Asia/Vientiane |
Asia/Vladivostok |
Asia/Yakutsk |
Asia/Yangon |
Asia/Yekaterinburg |
Asia/Yerevan |
Atlantic/Azores |
Atlantic/Bermuda |
Atlantic/Canary |
Atlantic/Cape_Verde |
Atlantic/Faeroe |
Atlantic/Faroe |
Atlantic/Jan_Mayen |
Atlantic/Madeira |
Atlantic/Reykjavik |
Atlantic/South_Georgia |
Atlantic/St_Helena |
Atlantic/Stanley |
Australia/ACT |
Australia/Adelaide |
Australia/Brisbane |
Australia/Broken_Hill |
Australia/Canberra |
Australia/Currie |
Australia/Darwin |
Australia/Eucla |
Australia/Hobart |
Australia/LHI |
Australia/Lindeman |
Australia/Lord_Howe |
Australia/Melbourne |
Australia/NSW |
Australia/North |
Australia/Perth |
Australia/Queensland |
Australia/South |
Australia/Sydney |
Australia/Tasmania |
Australia/Victoria |
Australia/West |
Australia/Yancowinna |
Brazil/Acre |
Brazil/DeNoronha |
Brazil/East |
Brazil/West |
CET |
CST6CDT |
Canada/Atlantic |
Canada/Central |
Canada/Eastern |
Canada/Mountain |
Canada/Newfoundland |
Canada/Pacific |
Canada/Saskatchewan |
Canada/Yukon |
Chile/Continental |
Chile/EasterIsland |
Cuba |
EET |
EST |
EST5EDT |
Egypt |
Eire |
Etc/GMT |
Etc/GMT+0 |
Etc/GMT+1 |
Etc/GMT+10 |
Etc/GMT+11 |
Etc/GMT+12 |
Etc/GMT+2 |
Etc/GMT+3 |
Etc/GMT+4 |
Etc/GMT+5 |
Etc/GMT+6 |
Etc/GMT+7 |
Etc/GMT+8 |
Etc/GMT+9 |
Etc/GMT-0 |
Etc/GMT-1 |
Etc/GMT-10 |
Etc/GMT-11 |
Etc/GMT-12 |
Etc/GMT-13 |
Etc/GMT-14 |
Etc/GMT-2 |
Etc/GMT-3 |
Etc/GMT-4 |
Etc/GMT-5 |
Etc/GMT-6 |
Etc/GMT-7 |
Etc/GMT-8 |
Etc/GMT-9 |
Etc/GMT0 |
Etc/Greenwich |
Etc/UCT |
Etc/UTC |
Etc/Universal |
Etc/Zulu |
Europe/Amsterdam |
Europe/Andorra |
Europe/Astrakhan |
Europe/Athens |
Europe/Belfast |
Europe/Belgrade |
Europe/Berlin |
Europe/Bratislava |
Europe/Brussels |
Europe/Bucharest |
Europe/Budapest |
Europe/Busingen |
Europe/Chisinau |
Europe/Copenhagen |
Europe/Dublin |
Europe/Gibraltar |
Europe/Guernsey |
Europe/Helsinki |
Europe/Isle_of_Man |
Europe/Istanbul |
Europe/Jersey |
Europe/Kaliningrad |
Europe/Kyiv |
Europe/Kirov |
Europe/Lisbon |
Europe/Ljubljana |
Europe/London |
Europe/Luxembourg |
Europe/Madrid |
Europe/Malta |
Europe/Mariehamn |
Europe/Minsk |
Europe/Monaco |
Europe/Moscow |
Europe/Nicosia |
Europe/Oslo |
Europe/Paris |
Europe/Podgorica |
Europe/Prague |
Europe/Riga |
Europe/Rome |
Europe/Samara |
Europe/San_Marino |
Europe/Sarajevo |
Europe/Saratov |
Europe/Simferopol |
Europe/Skopje |
Europe/Sofia |
Europe/Stockholm |
Europe/Tallinn |
Europe/Tirane |
Europe/Tiraspol |
Europe/Ulyanovsk |
Europe/Uzhgorod |
Europe/Vaduz |
Europe/Vatican |
Europe/Vienna |
Europe/Vilnius |
Europe/Volgograd |
Europe/Warsaw |
Europe/Zagreb |
Europe/Zaporozhye |
Europe/Zurich |
GB |
GB-Eire |
GMT |
GMT+0 |
GMT-0 |
GMT0 |
Greenwich |
HST |
Hongkong |
Iceland |
Indian/Antananarivo |
Indian/Chagos |
Indian/Christmas |
Indian/Cocos |
Indian/Comoro |
Indian/Kerguelen |
Indian/Mahe |
Indian/Maldives |
Indian/Mauritius |
Indian/Mayotte |
Indian/Reunion |
Iran |
Israel |
Jamaica |
Japan |
Kwajalein |
Libya |
MET |
MST |
MST7MDT |
Mexico/BajaNorte |
Mexico/BajaSur |
Mexico/General |
NZ |
NZ-CHAT |
Navajo |
PRC |
PST8PDT |
Pacific/Apia |
Pacific/Auckland |
Pacific/Bougainville |
Pacific/Chatham |
Pacific/Chuuk |
Pacific/Easter |
Pacific/Efate |
Pacific/Enderbury |
Pacific/Fakaofo |
Pacific/Fiji |
Pacific/Funafuti |
Pacific/Galapagos |
Pacific/Gambier |
Pacific/Guadalcanal |
Pacific/Guam |
Pacific/Honolulu |
Pacific/Johnston |
Pacific/Kanton |
Pacific/Kiritimati |
Pacific/Kosrae |
Pacific/Kwajalein |
Pacific/Majuro |
Pacific/Marquesas |
Pacific/Midway |
Pacific/Nauru |
Pacific/Niue |
Pacific/Norfolk |
Pacific/Noumea |
Pacific/Pago_Pago |
Pacific/Palau |
Pacific/Pitcairn |
Pacific/Pohnpei |
Pacific/Ponape |
Pacific/Port_Moresby |
Pacific/Rarotonga |
Pacific/Saipan |
Pacific/Samoa |
Pacific/Tahiti |
Pacific/Tarawa |
Pacific/Tongatapu |
Pacific/Truk |
Pacific/Wake |
Pacific/Wallis |
Pacific/Yap |
Poland |
Portugal |
ROK |
Singapore |
Türkiye |
UCT |
US/Alaska |
US/Aleutian |
US/Arizona |
US/Central |
US/East-Indiana |
US/Eastern |
US/Hawaii |
US/Indiana-Starke |
US/Michigan |
US/Mountain |
US/Pacific |
US/Samoa |
UTC |
Universal |
W-SU |
WET |
Zulu |
12 - Scalar functions
12.1 - abs()
Calculates the absolute value of the input.
Syntax
abs(x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | int, real, or timespan | ✔️ | The value to make absolute. |
Returns
Absolute value of x.
Example
print abs(-5)
Output
print_0 |
---|
5 |
12.2 - acos()
Calculates the angle whose cosine is the specified number. Inverse operation of cos()
.
Syntax
acos(x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The value used to calculate the arc cosine. |
Returns
The value of the arc cosine of x
. The return value is null
if x
< -1 or x
> 1.
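Example
Because cos(π) = -1, the arc cosine of -1 is π (approximately 3.14159):
print result = acos(-1.0)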
12.3 - ago()
Subtracts the given timespan from the current UTC time.
Like now()
, if you use ago()
multiple times in a single query statement, the current UTC time
being referenced is the same across all uses.
Syntax
ago(
timespan)
Parameters
Name | Type | Required | Description |
---|---|---|---|
timespan | timespan | ✔️ | The interval to subtract from the current UTC clock time now() . For a full list of possible timespan values, see timespan literals. |
Returns
A datetime value equal to the current time minus the timespan.
Example
All rows with a timestamp in the past hour:
T | where Timestamp > ago(1h)
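Because both calls share the same reference time, a query such as the following always returns exactly one hour, regardless of when it runs:
print delta = ago(1h) - ago(2h)
Output
delta |
---|
01:00:00 |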
12.4 - around() function
Creates a bool
value indicating if the first argument is within a range around the center value.
Syntax
around(
value,
center,
delta)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int, long, real, datetime, or timespan | ✔️ | The value to compare to the center. |
center | int, long, real, datetime, or timespan | ✔️ | The center of the range defined as [(center -delta ) .. (center + delta )]. |
delta | int, long, real, datetime, or timespan | ✔️ | The delta value of the range defined as [(center -delta ) .. (center + delta )]. |
Returns
Returns true
if the value is within the range, false
if the value is outside the range.
Returns null
if any of the arguments is null
.
Example: Filtering values around a specific timestamp
The following example filters rows around a specific timestamp.
range dt
from datetime(2021-01-01 01:00)
to datetime(2021-01-01 02:00)
step 1min
| where around(dt, datetime(2021-01-01 01:30), 1min)
Output
dt |
---|
2021-01-01 01:29:00.0000000 |
2021-01-01 01:30:00.0000000 |
2021-01-01 01:31:00.0000000 |
12.5 - array_concat()
Concatenates many dynamic arrays into a single array.
Syntax
array_concat(
arr [,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
arr | dynamic | ✔️ | The arrays to concatenate into a dynamic array. |
Returns
Returns a dynamic array of all input arrays.
Example
The following example shows concatenated arrays.
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend a1 = pack_array(x,y,z), a2 = pack_array(x, y)
| project array_concat(a1, a2)
Output
Column1 |
---|
[1,2,4,1,2] |
[2,4,8,2,4] |
[3,6,12,3,6] |
12.6 - array_iff()
Element-wise iif function on dynamic arrays.
Syntax
array_iff(
condition_array, when_true, when_false)
Parameters
Name | Type | Required | Description |
---|---|---|---|
condition_array | dynamic | ✔️ | An array of boolean or numeric values. |
when_true | dynamic or scalar | ✔️ | An array of values or primitive value. This will be the result when condition_array is true. |
when_false | dynamic or scalar | ✔️ | An array of values or primitive value. This will be the result when condition_array is false. |
Returns
Returns a dynamic array of the values taken either from the when_true or when_false array values, according to the corresponding value of the condition array.
Examples
print condition=dynamic([true,false,true]), if_true=dynamic([1,2,3]), if_false=dynamic([4,5,6])
| extend res= array_iff(condition, if_true, if_false)
Output
condition | if_true | if_false | res |
---|---|---|---|
[true, false, true] | [1, 2, 3] | [4, 5, 6] | [1, 5, 3] |
Numeric condition values
print condition=dynamic([1,0,50]), if_true="yes", if_false="no"
| extend res= array_iff(condition, if_true, if_false)
Output
condition | if_true | if_false | res |
---|---|---|---|
[1, 0, 50] | yes | no | [yes, no, yes] |
Non-numeric and non-boolean condition values
print condition=dynamic(["some string value", datetime("01-01-2022"), null]), if_true=1, if_false=0
| extend res= array_iff(condition, if_true, if_false)
Output
condition | if_true | if_false | res |
---|---|---|---|
[“some string value”,“2022-01-01T00:00:00.0000000Z”,null] | 1 | 0 | [null, null, null] |
Mismatched array lengths
print condition=dynamic([true,true,true]), if_true=dynamic([1,2]), if_false=dynamic([3,4])
| extend res= array_iff(condition, if_true, if_false)
Output
condition | if_true | if_false | res |
---|---|---|---|
[true, true, true] | [1, 2] | [3, 4] | [1, 2, null] |
12.7 - array_index_of()
Searches an array for the specified item, and returns its position.
Syntax
array_index_of(array, value [, start [, length [, occurrence]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to search. |
value | long, int, datetime, timespan, string, guid, or bool | ✔️ | The value to lookup. |
start | int | | The search start position. A negative value will offset the starting search value from the end of the array by abs(start) steps. |
length | int | | The number of values to examine. A value of -1 means unlimited length. |
occurrence | int | | The number of the occurrence. The default is 1. |
Returns
Returns a zero-based index position of lookup. Returns -1 if the value isn’t found in the array. Returns null for irrelevant inputs (occurrence < 0 or length < -1).
Example
The following example shows the position number of specific words within the array.
let arr=dynamic(["this", "is", "an", "example", "an", "example"]);
print
idx1 = array_index_of(arr,"an") // lookup found in input string
, idx2 = array_index_of(arr,"example",1,3) // lookup found in searched range
, idx3 = array_index_of(arr,"example",1,2) // search starts from index 1, but stops after 2 values, so lookup can't be found
, idx4 = array_index_of(arr,"is",2,4) // search starts after occurrence of lookup
, idx5 = array_index_of(arr,"example",2,-1) // lookup found
, idx6 = array_index_of(arr, "an", 1, -1, 2) // second occurrence found in input range
, idx7 = array_index_of(arr, "an", 1, -1, 3) // no third occurrence in input array
, idx8 = array_index_of(arr, "an", -3) // negative start index will look at last 3 elements
, idx9 = array_index_of(arr, "is", -4) // negative start index will look at last 3 elements
Output
idx1 | idx2 | idx3 | idx4 | idx5 | idx6 | idx7 | idx8 | idx9 |
---|---|---|---|---|---|---|---|---|
2 | 3 | -1 | -1 | 3 | 4 | -1 | 4 | -1 |
Related content
Use set_has_element(arr, value) to check whether a value exists in an array. This function will improve the readability of your query. Both functions have the same performance.
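For example, the following illustrative query checks for membership with set_has_element() and returns true:
print exists = set_has_element(dynamic(["this", "is", "an", "example"]), "example")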
12.8 - array_length()
Calculates the number of elements in a dynamic array.
Syntax
array_length(
array)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array for which to calculate length. |
Returns
Returns the number of elements in array, or null
if array isn’t an array.
Examples
The following example shows the number of elements in the array.
print array_length(dynamic([1, 2, 3, "four"]))
Output
print_0 |
---|
4 |
12.9 - array_reverse()
Reverses the order of the elements in a dynamic array.
Syntax
array_reverse(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | dynamic | ✔️ | The array to reverse. |
Returns
Returns an array that contains the same elements as the input array in reverse order.
Example
This example shows an array of words reversed.
print arr=dynamic(["this", "is", "an", "example"])
| project Result=array_reverse(arr)
Output
Result |
---|
[“example”,“an”,“is”,“this”] |
12.10 - array_rotate_left()
Rotates values inside a dynamic
array to the left.
Syntax
array_rotate_left(
array, rotate_count)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to rotate. |
rotate_count | integer | ✔️ | The number of positions that array elements will be rotated to the left. If the value is negative, the elements will be rotated to the right. |
Returns
Dynamic array containing the same elements as the original array with each element rotated according to rotate_count.
Examples
Rotating to the left by two positions:
print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_left(arr, 2)
Output
arr | arr_rotated |
---|---|
[1,2,3,4,5] | [3,4,5,1,2] |
Rotating to the right by two positions by using negative rotate_count value:
print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_left(arr, -2)
Output
arr | arr_rotated |
---|---|
[1,2,3,4,5] | [4,5,1,2,3] |
Related content
- To rotate an array to the right, use array_rotate_right().
- To shift an array to the left, use array_shift_left().
- To shift an array to the right, use array_shift_right()
12.11 - array_rotate_right()
Rotates values inside a dynamic
array to the right.
Syntax
array_rotate_right(
array, rotate_count)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to rotate. |
rotate_count | integer | ✔️ | The number of positions that array elements will be rotated to the right. If the value is negative, the elements will be rotated to the left. |
Returns
Dynamic array containing the same elements as the original array with each element rotated according to rotate_count.
Examples
Rotating to the right by two positions:
print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_right(arr, 2)
Output
arr | arr_rotated |
---|---|
[1,2,3,4,5] | [4,5,1,2,3] |
Rotating to the left by two positions by using a negative rotate_count value:
print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_right(arr, -2)
Output
arr | arr_rotated |
---|---|
[1,2,3,4,5] | [3,4,5,1,2] |
Related content
- To rotate an array to the left, use array_rotate_left().
- To shift an array to the left, use array_shift_left().
- To shift an array to the right, use array_shift_right().
12.12 - array_shift_left()
Shifts the values inside a dynamic array to the left.
Syntax
array_shift_left(
array, shift_count [,
default_value ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to shift. |
shift_count | int | ✔️ | The number of positions that array elements are shifted to the left. If the value is negative, the elements are shifted to the right. |
default_value | scalar | | The value used for an element that was shifted and removed. The default is null or an empty string depending on the type of elements in the array. |
Returns
Returns a dynamic array containing the same number of elements as in the original array. Each element has been shifted according to shift_count. New elements that are added in place of removed elements have a value of default_value.
Examples
Shifting to the left by two positions:
print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_left(arr, 2)
Output
arr | arr_shift |
---|---|
[1,2,3,4,5] | [3,4,5,null,null] |
Shifting to the left by two positions and adding default value:
print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_left(arr, 2, -1)
Output
arr | arr_shift |
---|---|
[1,2,3,4,5] | [3,4,5,-1,-1] |
Shifting to the right by two positions by using negative shift_count value:
print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_left(arr, -2, -1)
Output
arr | arr_shift |
---|---|
[1,2,3,4,5] | [-1,-1,1,2,3] |
Related content
- To shift an array to the right, use array_shift_right().
- To rotate an array to the right, use array_rotate_right().
- To rotate an array to the left, use array_rotate_left().
12.13 - array_shift_right()
Shifts the values inside a dynamic array to the right.
Syntax
array_shift_right(
array, shift_count [,
default_value ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to shift. |
shift_count | int | ✔️ | The number of positions that array elements are shifted to the right. If the value is negative, the elements are shifted to the left. |
default_value | scalar | | The value used for an element that was shifted and removed. The default is null or an empty string depending on the type of elements in the array. |
Returns
Returns a dynamic array containing the same number of elements as in the original array. Each element has been shifted according to shift_count. New elements that are added in place of removed elements have a value of default_value.
Examples
Shifting to the right by two positions:
print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_right(arr, 2)
Output
arr | arr_shift |
---|---|
[1,2,3,4,5] | [null,null,1,2,3] |
Shifting to the right by two positions and adding a default value:
print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_right(arr, 2, -1)
Output
arr | arr_shift |
---|---|
[1,2,3,4,5] | [-1,-1,1,2,3] |
Shifting to the left by two positions by using a negative shift_count value:
print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_right(arr, -2, -1)
Output
arr | arr_shift |
---|---|
[1,2,3,4,5] | [3,4,5,-1,-1] |
Related content
- To shift an array to the left, use array_shift_left().
- To rotate an array to the right, use array_rotate_right().
- To rotate an array to the left, use array_rotate_left().
12.14 - array_slice()
Extracts a slice of a dynamic array.
Syntax
array_slice
(array, start, end)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array from which to extract the slice. |
start | int | ✔️ | The start index of the slice (inclusive). Negative values are converted to array_length +start . |
end | int | ✔️ | The last index of the slice (inclusive). Negative values are converted to array_length +end . |
Returns
Returns a dynamic array of the values in the range [start..end
] from array
.
Examples
The following examples return a slice of the array.
print arr=dynamic([1,2,3])
| extend sliced=array_slice(arr, 1, 2)
Output
arr | sliced |
---|---|
[1,2,3] | [2,3] |
print arr=dynamic([1,2,3,4,5])
| extend sliced=array_slice(arr, 2, -1)
Output
arr | sliced |
---|---|
[1,2,3,4,5] | [3,4,5] |
print arr=dynamic([1,2,3,4,5])
| extend sliced=array_slice(arr, -3, -2)
Output
arr | sliced |
---|---|
[1,2,3,4,5] | [3,4] |
12.15 - array_sort_asc()
Receives one or more arrays. Sorts the first array in ascending order. Orders the remaining arrays to match the reordered first array.
Syntax
array_sort_asc(
array1[, …, arrayN][,
nulls_last])
If nulls_last isn’t provided, a default value of true
is used.
Parameters
Name | Type | Required | Description |
---|---|---|---|
array1…arrayN | dynamic | ✔️ | The array or list of arrays to sort. |
nulls_last | bool | | Determines whether nulls should be last. |
Returns
Returns the same number of arrays as in the input, with the first array sorted in ascending order, and the remaining arrays ordered to match the reordered first array.
null
is returned for every array that differs in length from the first one.
An array that contains elements of different types is sorted in the following order:
- Numeric, datetime, and timespan elements
- String elements
- Guid elements
- All other elements
Examples
The examples in this section show how to use the syntax to help you get started.
Sort two arrays
The following example sorts the initial array, array1
, in ascending order. It then sorts array2
to match the new order of array1
.
let array1 = dynamic([1,3,4,5,2]);
let array2 = dynamic(["a","b","c","d","e"]);
print array_sort_asc(array1,array2)
Output
array1_sorted | array2_sorted |
---|---|
[1,2,3,4,5] | [“a”,“e”,“b”,“c”,“d”] |
Sort substrings
The following example sorts a list of names in ascending order. It saves the list of names to a variable, Names, which is then split into an array and sorted in ascending order. The query returns the names in ascending order.
let Names = "John,Paul,Jane,Kao";
let SortedNames = strcat_array(array_sort_asc(split(Names, ",")), ",");
print result = SortedNames
Output
result |
---|
Jane,John,Kao,Paul |
Combine summarize and array_sort_asc
The following example uses the summarize
operator and the array_sort_asc
function to organize and sort commands by user in chronological order.
datatable(command:string, command_time:datetime, user_id:string)
[
'chmod', datetime(2019-07-15), "user1",
'ls', datetime(2019-07-02), "user1",
'dir', datetime(2019-07-22), "user1",
'mkdir', datetime(2019-07-14), "user1",
'rm', datetime(2019-07-27), "user1",
'pwd', datetime(2019-07-25), "user1",
'rm', datetime(2019-07-23), "user2",
'pwd', datetime(2019-07-25), "user2",
]
| summarize timestamps = make_list(command_time), commands = make_list(command) by user_id
| project user_id, commands_in_chronological_order = array_sort_asc(timestamps, commands)[1]
Output
user_id | commands_in_chronological_order |
---|---|
user1 | [ “ls”, “mkdir”, “chmod”, “dir”, “pwd”, “rm” ] |
user2 | [ “rm”, “pwd” ] |
Control location of null values
By default, null values are put last in the sorted array. However, you can control it explicitly by adding a bool value as the last argument to array_sort_asc().
The following example shows the default behavior:
print result=array_sort_asc(dynamic([null,"blue","yellow","green",null]))
Output
result |
---|
[“blue”,“green”,“yellow”,null,null] |
The following example shows nondefault behavior using the false
parameter, which specifies that nulls are placed at the beginning of the array.
print result=array_sort_asc(dynamic([null,"blue","yellow","green",null]), false)
Output
result |
---|
[null,null,“blue”,“green”,“yellow”] |
12.16 - array_sort_desc()
Receives one or more arrays. Sorts the first array in descending order. Orders the remaining arrays to match the reordered first array.
Syntax
array_sort_desc(array1[, …, arrayN])
array_sort_desc(array1[, …, arrayN], nulls_last)
If nulls_last isn’t provided, a default value of true
is used.
Parameters
Name | Type | Required | Description |
---|---|---|---|
array1…arrayN | dynamic | ✔️ | The array or list of arrays to sort. |
nulls_last | bool | | Determines whether nulls should be last. |
Returns
Returns the same number of arrays as in the input, with the first array sorted in descending order, and the remaining arrays ordered to match the reordered first array.
null
is returned for every array that differs in length from the first one.
An array that contains elements of different types is sorted in the following order:
- Numeric, datetime, and timespan elements
- String elements
- Guid elements
- All other elements
Examples
The examples in this section show how to use the syntax to help you get started.
Sort two arrays
The following example sorts the initial array, array1
, in descending order. It then sorts array2
to match the new order of array1
.
let array1 = dynamic([1,3,4,5,2]);
let array2 = dynamic(["a","b","c","d","e"]);
print array_sort_desc(array1,array2)
Output
array1_sorted | array2_sorted |
---|---|
[5,4,3,2,1] | [“d”,“c”,“b”,“e”,“a”] |
Sort substrings
The following example sorts a list of names in descending order. It saves the list of names to a variable, Names, which is then split into an array and sorted in descending order. The query returns the names in descending order.
let Names = "John,Paul,Jane,Kayo";
let SortedNames = strcat_array(array_sort_desc(split(Names, ",")), ",");
print result = SortedNames
Output
result |
---|
Paul,Kayo,John,Jane |
Combine summarize and array_sort_desc
The following example uses the summarize operator and the array_sort_desc function to organize and sort commands by user in descending chronological order.
datatable(command:string, command_time:datetime, user_id:string)
[
'chmod', datetime(2019-07-15), "user1",
'ls', datetime(2019-07-02), "user1",
'dir', datetime(2019-07-22), "user1",
'mkdir', datetime(2019-07-14), "user1",
'rm', datetime(2019-07-27), "user1",
'pwd', datetime(2019-07-25), "user1",
'rm', datetime(2019-07-23), "user2",
'pwd', datetime(2019-07-25), "user2",
]
| summarize timestamps = make_list(command_time), commands = make_list(command) by user_id
| project user_id, commands_in_chronological_order = array_sort_desc(timestamps, commands)[1]
Output
user_id | commands_in_chronological_order |
---|---|
user1 | [ “rm”, “pwd”, “dir”, “chmod”, “mkdir”, “ls” ] |
user2 | [ “pwd”, “rm” ] |
Control location of null values
By default, null values are put last in the sorted array. However, you can control it explicitly by adding a bool value as the last argument to array_sort_desc().
The following example shows the default behavior:
print result=array_sort_desc(dynamic([null,"blue","yellow","green",null]))
Output
result |
---|
[“yellow”,“green”,“blue”,null,null] |
The following example shows nondefault behavior using the false
parameter, which specifies that nulls are placed at the beginning of the array.
print result=array_sort_desc(dynamic([null,"blue","yellow","green",null]), false)
Output
result |
---|
[null,null,“yellow”,“green”,“blue”] |
12.17 - array_split()
Splits an array into multiple arrays according to the split indices and packs the generated arrays into a dynamic array.
Syntax
array_split
(array, index)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to split. |
index | int or dynamic | ✔️ | An integer or dynamic array of integers used to indicate the location at which to split the array. The start index of arrays is zero. Negative values are converted to array_length + value . |
Returns
Returns a dynamic array containing N+1 arrays with the values in the ranges [0..i1), [i1..i2), ... [iN..array_length) from array, where N is the number of input indices and i1...iN are the indices.
Examples
The following example shows how to split an array.
print arr=dynamic([1,2,3,4,5])
| extend arr_split=array_split(arr, 2)
Output
arr | arr_split |
---|---|
[1,2,3,4,5] | [[1,2],[3,4,5]] |
print arr=dynamic([1,2,3,4,5])
| extend arr_split=array_split(arr, dynamic([1,3]))
Output
arr | arr_split |
---|---|
[1,2,3,4,5] | [[1],[2,3],[4,5]] |
12.18 - array_sum()
Calculates the sum of elements in a dynamic array.
Syntax
array_sum
(array)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | The array to sum. |
Returns
Returns a double type value with the sum of the elements of the array.
Example
The following example shows the sum of an array.
print arr=dynamic([1,2,3,4])
| extend arr_sum=array_sum(arr)
Output
arr | arr_sum |
---|---|
[1,2,3,4] | 10 |
12.19 - asin()
Calculates the angle whose sine is the specified number, or the arc sine. This is the inverse operation of sin()
.
Syntax
asin(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | A real number in range [-1, 1] used to calculate the arc sine. |
Returns
Returns the value of the arc sine of x
. Returns null
if x
< -1 or x
> 1.
Example
print result = asin(0.5)
Output
result |
---|
0.52359877559829882 |
12.20 - assert()
Checks for a condition. If the condition is false, outputs error messages and fails the query.
Syntax
assert(
condition,
message)
Parameters
Name | Type | Required | Description |
---|---|---|---|
condition | bool | ✔️ | The conditional expression to evaluate. The condition must be evaluated to constant during the query analysis phase. |
message | string | ✔️ | The message used if assertion is evaluated to false . |
Returns
Returns true
if the condition is true
.
Raises a semantic error if the condition is evaluated to false
.
Examples
The following query defines a function checkLength() that checks the input string length, and uses assert to validate the input length parameter (it checks that the length is greater than zero).
let checkLength = (len:long, s:string)
{
assert(len > 0, "Length must be greater than zero") and
strlen(s) > len
};
datatable(input:string)
[
'123',
'4567'
]
| where checkLength(len=long(-1), input)
Running this query yields an error:
assert() has failed with message: 'Length must be greater than zero'
Example of running with valid len
input:
let checkLength = (len:long, s:string)
{
assert(len > 0, "Length must be greater than zero") and strlen(s) > len
};
datatable(input:string)
[
'123',
'4567'
]
| where checkLength(len=3, input)
Output
input |
---|
4567 |
The following query will always fail, demonstrating that the assert
function gets evaluated even though the where b
operator returns no data when b
is false
:
let b=false;
print x="Hello"
| where b
| where assert(b, "Assertion failed")
12.21 - atan()
Returns the angle whose tangent is the specified number. This is the inverse operation of tan()
.
Syntax
atan(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The number used to calculate the arc tangent. |
Returns
The value of the arc tangent of x
.
Example
print result = atan(0.5)
Output
result |
---|
0.46364760900080609 |
12.22 - atan2()
Calculates the angle, in radians, between the positive x-axis and the ray from the origin to the point (y, x).
Syntax
atan2(
y,
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
y | real | ✔️ | The Y coordinate. |
x | real | ✔️ | The X coordinate. |
Returns
Returns the angle in radians between the positive x-axis and the ray from the origin to the point (y, x).
Examples
The following example returns the angle measurements in radians.
print atan2_0 = atan2(1,1) // Pi / 4 radians (45 degrees)
| extend atan2_1 = atan2(0,-1) // Pi radians (180 degrees)
| extend atan2_2 = atan2(-1,0) // - Pi / 2 radians (-90 degrees)
Output
atan2_0 | atan2_1 | atan2_2 |
---|---|---|
0.785398163397448 | 3.14159265358979 | -1.5707963267949 |
12.23 - bag_has_key()
Checks whether a dynamic property bag object contains a given key.
Syntax
bag_has_key(
bag,
key)
Parameters
Name | Type | Required | Description |
---|---|---|---|
bag | dynamic | ✔️ | The property bag to search. |
key | string | ✔️ | The key for which to search. Search for a nested key using the JSONPath notation. Array indexing isn’t supported. |
Returns
Returns true or false, depending on whether the key exists in the bag.
Examples
datatable(input: dynamic)
[
dynamic({'key1' : 123, 'key2': 'abc'}),
dynamic({'key1' : 123, 'key3': 'abc'}),
]
| extend result = bag_has_key(input, 'key2')
Output
input | result |
---|---|
{ “key1”: 123, “key2”: “abc” } | true |
{ “key1”: 123, “key3”: “abc” } | false |
Search using a JSONPath key
datatable(input: dynamic)
[
dynamic({'key1': 123, 'key2': {'prop1' : 'abc', 'prop2': 'xyz'}, 'key3': [100, 200]}),
]
| extend result = bag_has_key(input, '$.key2.prop1')
Output
input | result |
---|---|
{ “key1”: 123, “key2”: { “prop1”: “abc”, “prop2”: “xyz” }, “key3”: [ 100, 200 ] } | true |
12.24 - bag_keys()
Enumerates all the root keys in a dynamic property bag object.
Syntax
bag_keys(
object)
Parameters
Name | Type | Required | Description |
---|---|---|---|
object | dynamic | ✔️ | The property bag object for which to enumerate keys. |
Returns
An array of keys; the order is undetermined.
Example
datatable(index:long, d:dynamic) [
1, dynamic({'a':'b', 'c':123}),
2, dynamic({'a':'b', 'c':{'d':123}}),
3, dynamic({'a':'b', 'c':[{'d':123}]}),
4, dynamic(null),
5, dynamic({}),
6, dynamic('a'),
7, dynamic([])
]
| extend keys = bag_keys(d)
Output
index | d | keys |
---|---|---|
1 | { “a”: “b”, “c”: 123 } | [ “a”, “c” ] |
2 | { “a”: “b”, “c”: { “d”: 123 } } | [ “a”, “c” ] |
3 | { “a”: “b”, “c”: [ { “d”: 123 } ] } | [ “a”, “c” ] |
4 | | |
5 | {} | [] |
6 | a | |
7 | [] | |
12.25 - bag_merge()
The function merges multiple dynamic
property bags into a single dynamic
property bag object, consolidating all properties from the input bags.
Syntax
bag_merge(bag1, bag2 [, bag3, ...])
Parameters
Name | Type | Required | Description |
---|---|---|---|
bag1…bagN | dynamic | ✔️ | The property bags to merge. The function accepts between 2 and 64 arguments. |
Returns
A dynamic
property bag containing the merged results of all input property bags. If a key is present in multiple input bags, the value associated with the key from the leftmost argument takes precedence.
Example
print result = bag_merge(
dynamic({'A1':12, 'B1':2, 'C1':3}),
dynamic({'A2':81, 'B2':82, 'A1':1}))
Output
result |
---|
{ “A1”: 12, “B1”: 2, “C1”: 3, “A2”: 81, “B2”: 82 } |
12.26 - bag_pack_columns()
Creates a dynamic property bag object from a list of columns.
Syntax
bag_pack_columns(
column1,
column2,... )
Parameters
Name | Type | Required | Description |
---|---|---|---|
column | scalar | ✔️ | A column to pack. The name of the column is the property name in the property bag. |
Returns
Returns a dynamic
property bag object from the listed columns.
Examples
The following example creates a property bag that includes the Id
and Value
columns:
datatable(Id: string, Value: string, Other: long)
[
"A", "val_a", 1,
"B", "val_b", 2,
"C", "val_c", 3
]
| extend Packed = bag_pack_columns(Id, Value)
Output
Id | Value | Other | Packed |
---|---|---|---|
A | val_a | 1 | { “Id”: “A”, “Value”: “val_a” } |
B | val_b | 2 | { “Id”: “B”, “Value”: “val_b” } |
C | val_c | 3 | { “Id”: “C”, “Value”: “val_c” } |
12.27 - bag_pack()
Creates a dynamic property bag object from a list of keys and values.
Syntax
bag_pack(
key1,
value1,
key2,
value2,... )
Parameters
Name | Type | Required | Description |
---|---|---|---|
key | string | ✔️ | The key name. |
value | any scalar data type | ✔️ | The key value. |
Returns
Returns a dynamic
property bag object from the listed key and value inputs.
Examples
Example 1
The following example creates and returns a property bag from an alternating list of keys and values.
print bag_pack("Level", "Information", "ProcessID", 1234, "Data", bag_pack("url", "www.bing.com"))
Results
print_0 |
---|
{“Level”:“Information”,“ProcessID”:1234,“Data”:{“url”:“www.bing.com”}} |
Example 2
The following example creates a property bag and extracts values from it by using the ‘.’ operator.
datatable (
Source: int,
Destination: int,
Message: string
) [
1234, 100, "AA",
4567, 200, "BB",
1212, 300, "CC"
]
| extend MyBag=bag_pack("Dest", Destination, "Mesg", Message)
| project-away Source, Destination, Message
| extend MyBag_Dest=MyBag.Dest, MyBag_Mesg=MyBag.Mesg
Results
MyBag | MyBag_Dest | MyBag_Mesg |
---|---|---|
{“Dest”:100,“Mesg”:“AA”} | 100 | AA |
{“Dest”:200,“Mesg”:“BB”} | 200 | BB |
{“Dest”:300,“Mesg”:“CC”} | 300 | CC |
Example 3
The following example uses two tables, SmsMessages and MmsMessages, and returns their common columns and a property bag from the other columns. The tables are created ad-hoc as part of the query.
SmsMessages
SourceNumber | TargetNumber | CharsCount |
---|---|---|
555-555-1234 | 555-555-1212 | 46 |
555-555-1234 | 555-555-1213 | 50 |
555-555-1212 | 555-555-1234 | 32 |
MmsMessages
SourceNumber | TargetNumber | AttachmentSize | AttachmentType | AttachmentName |
---|---|---|---|---|
555-555-1212 | 555-555-1213 | 200 | jpeg | Pic1 |
555-555-1234 | 555-555-1212 | 250 | jpeg | Pic2 |
555-555-1234 | 555-555-1213 | 300 | png | Pic3 |
let SmsMessages = datatable (
SourceNumber: string,
TargetNumber: string,
CharsCount: string
) [
"555-555-1234", "555-555-1212", "46",
"555-555-1234", "555-555-1213", "50",
"555-555-1212", "555-555-1234", "32"
];
let MmsMessages = datatable (
SourceNumber: string,
TargetNumber: string,
AttachmentSize: string,
AttachmentType: string,
AttachmentName: string
) [
"555-555-1212", "555-555-1213", "200", "jpeg", "Pic1",
"555-555-1234", "555-555-1212", "250", "jpeg", "Pic2",
"555-555-1234", "555-555-1213", "300", "png", "Pic3"
];
SmsMessages
| join kind=inner MmsMessages on SourceNumber
| extend Packed=bag_pack("CharsCount", CharsCount, "AttachmentSize", AttachmentSize, "AttachmentType", AttachmentType, "AttachmentName", AttachmentName)
| where SourceNumber == "555-555-1234"
| project SourceNumber, TargetNumber, Packed
Results
SourceNumber | TargetNumber | Packed |
---|---|---|
555-555-1234 | 555-555-1213 | {“CharsCount”:“50”,“AttachmentSize”:“250”,“AttachmentType”:“jpeg”,“AttachmentName”:“Pic2”} |
555-555-1234 | 555-555-1212 | {“CharsCount”:“46”,“AttachmentSize”:“250”,“AttachmentType”:“jpeg”,“AttachmentName”:“Pic2”} |
555-555-1234 | 555-555-1213 | {“CharsCount”:“50”,“AttachmentSize”:“300”,“AttachmentType”:“png”,“AttachmentName”:“Pic3”} |
555-555-1234 | 555-555-1212 | {“CharsCount”:“46”,“AttachmentSize”:“300”,“AttachmentType”:“png”,“AttachmentName”:“Pic3”} |
12.28 - bag_remove_keys()
Removes keys and associated values from a dynamic
property bag.
Syntax
bag_remove_keys(
bag,
keys)
Parameters
Name | Type | Required | Description |
---|---|---|---|
bag | dynamic | ✔️ | The property bag from which to remove keys. |
keys | dynamic | ✔️ | List of keys to be removed from the input. The keys are the first level of the property bag. You can specify keys on the nested levels using JSONPath notation. Array indexing isn’t supported. |
Returns
Returns a dynamic
property bag without specified keys and their values.
Examples
datatable(input:dynamic)
[
dynamic({'key1' : 123, 'key2': 'abc'}),
dynamic({'key1' : 'value', 'key3': 42.0}),
]
| extend result=bag_remove_keys(input, dynamic(['key2', 'key4']))
Output
input | result |
---|---|
{ “key1”: 123, “key2”: “abc” } | { “key1”: 123 } |
{ “key1”: “value”, “key3”: 42.0 } | { “key1”: “value”, “key3”: 42.0 } |
Remove inner properties of dynamic values using JSONPath notation
datatable(input:dynamic)
[
dynamic({'key1': 123, 'key2': {'prop1' : 'abc', 'prop2': 'xyz'}, 'key3': [100, 200]}),
]
| extend result=bag_remove_keys(input, dynamic(['$.key2.prop1', 'key3']))
Output
input | result |
---|---|
{ “key1”: 123, “key2”: { “prop1”: “abc”, “prop2”: “xyz” }, “key3”: [ 100, 200 ] } | { “key1”: 123, “key2”: { “prop2”: “xyz” } } |
12.29 - bag_set_key()
bag_set_key() receives a dynamic property-bag, a key, and a value. The function sets the given key in the bag to the given value, overriding any existing value if the key already exists.
Syntax
bag_set_key(
bag,
key,
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
bag | dynamic | ✔️ | The property bag to modify. |
key | string | ✔️ | The key to set. Either a JSON path (you can specify a key on the nested levels using JSONPath notation) or the key name for a root level key. Array indexing or root JSON paths aren’t supported. |
value | any scalar data type | ✔️ | The value to which the key is set. |
Returns
Returns a dynamic
property-bag with specified key-value pairs. If the input bag isn’t a property-bag, a null
value is returned.
Examples
Use a root-level key
datatable(input: dynamic) [
dynamic({'key1': 1, 'key2': 2}),
dynamic({'key1': 1, 'key3': 'abc'}),
]
| extend result = bag_set_key(input, 'key3', 3)
input | result |
---|---|
{ “key1”: 1, “key2”: 2 } | { “key1”: 1, “key2”: 2, “key3”: 3 } |
{ “key1”: 1, “key3”: “abc” } | { “key1”: 1, “key3”: 3 } |
Use a JSONPath key
datatable(input: dynamic)[
dynamic({'key1': 123, 'key2': {'prop1': 123, 'prop2': 'xyz'}}),
dynamic({'key1': 123})
]
| extend result = bag_set_key(input, '$.key2.prop1', 'abc')
input | result |
---|---|
{ “key1”: 123, “key2”: { “prop1”: 123, “prop2”: “xyz” } } | { “key1”: 123, “key2”: { “prop1”: “abc”, “prop2”: “xyz” } } |
{ “key1”: 123 } | { “key1”: 123, “key2”: { “prop1”: “abc” } } |
12.30 - bag_zip()
Creates a dynamic property-bag from two input dynamic arrays. In the resulting property-bag, the values from the first input array are used as the property keys, while the values from the second input array are used as corresponding property values.
Syntax
bag_zip(
KeysArray,
ValuesArray)
Parameters
Name | Type | Required | Description |
---|---|---|---|
KeysArray | dynamic | ✔️ | An array of strings. These strings represent the property names for the resulting property-bag. |
ValuesArray | dynamic | ✔️ | An array whose values will be the property values for the resulting property-bag. |
Returns
Returns a dynamic property-bag.
Examples
In the following example, the array of keys and the array of values are the same length and are zipped together into a dynamic property bag.
let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
dynamic(['a', 'b', 'c']), dynamic([1, '2', 3.4])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
KeysArray | ValuesArray | NewBag |
---|---|---|
[‘a’,‘b’,‘c’] | [1,‘2’,3.4] | {‘a’: 1,‘b’: ‘2’,‘c’: 3.4} |
More keys than values
In the following example, the array of keys is longer than the array of values. The missing values are filled with nulls.
let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
dynamic(['a', 'b', 'c']), dynamic([1, '2'])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
KeysArray | ValuesArray | NewBag |
---|---|---|
[‘a’,‘b’,‘c’] | [1,‘2’] | {‘a’: 1,‘b’: ‘2’,‘c’: null} |
More values than keys
In the following example, the array of values is longer than the array of keys. Values with no matching keys are ignored.
let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
dynamic(['a', 'b']), dynamic([1, '2', 2.5])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
KeysArray | ValuesArray | NewBag |
---|---|---|
[‘a’,‘b’] | [1,‘2’,2.5] | {‘a’: 1,‘b’: ‘2’} |
Non-string keys
In the following example, there are some values in the keys array that aren’t of type string. The non-string values are ignored.
let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
dynamic(['a', 8, 'b']), dynamic([1, '2', 2.5])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
KeysArray | ValuesArray | NewBag |
---|---|---|
[‘a’,8,‘b’] | [1,‘2’,2.5] | {‘a’: 1,‘b’: 2.5} |
Fill values with null
In the following example, the parameter that is supposed to be an array of values isn’t an array, so all values are filled with nulls.
let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
dynamic(['a', 8, 'b']), dynamic(1)
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
KeysArray | ValuesArray | NewBag |
---|---|---|
[‘a’,8,‘b’] | 1 | {‘a’: null,‘b’: null} |
Null property-bag
In the following example, the parameter that is supposed to be an array of keys isn’t an array, so the resulting property-bag is null.
let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
dynamic('a'), dynamic([1, '2', 2.5])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
| extend IsNewBagEmpty=isnull(NewBag)
KeysArray | ValuesArray | NewBag | IsNewBagEmpty |
---|---|---|---|
a | [1,‘2’,2.5] | | TRUE |
12.31 - base64_decode_toarray()
Decodes a base64 string to an array of long values.
Syntax
base64_decode_toarray(
base64_string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
base64_string | string | ✔️ | The value to decode from base64 to an array of long values. |
Returns
Returns an array of long values decoded from a base64 string.
Example
print Quine=base64_decode_toarray("S3VzdG8=")
// 'K', 'u', 's', 't', 'o'
Output
Quine |
---|
[75,117,115,116,111] |
Related content
- To decode base64 strings to a UTF-8 string, see base64_decode_tostring()
- To encode strings to a base64 string, see base64_encode_tostring()
12.32 - base64_decode_toguid()
Decodes a base64 string to a GUID.
Syntax
base64_decode_toguid(
base64_string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
base64_string | string | ✔️ | The value to decode from base64 to a GUID. |
Returns
Returns a GUID decoded from a base64 string.
Example
print Quine = base64_decode_toguid("JpbpECu8dUy7Pv5gbeJXAA==")
Output
Quine |
---|
10e99626-bc2b-754c-bb3e-fe606de25700 |
If you try to decode an invalid base64 string, a null value is returned:
print Empty = base64_decode_toguid("abcd1231")
Related content
To encode a GUID to a base64 string, see base64_encode_fromguid().
12.33 - base64_decode_tostring()
Decodes a base64 string to a UTF-8 string.
Syntax
base64_decode_tostring(
base64_string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
base64_string | string | ✔️ | The value to decode from base64 to UTF-8 string. |
Returns
Returns UTF-8 string decoded from base64 string.
Example
print Quine=base64_decode_tostring("S3VzdG8=")
Output
Quine |
---|
Kusto |
Trying to decode a base64 string that was generated from invalid UTF-8 encoding returns null:
print Empty=base64_decode_tostring("U3RyaW5n0KHR0tGA0L7Rh9C60LA=")
Output
Empty |
---|
Related content
- To decode base64 strings to an array of long values, see base64_decode_toarray()
- To encode strings to base64 string, see base64_encode_tostring()
12.34 - base64_encode_fromarray()
Encodes a base64 string from a bytes array.
Syntax
base64_encode_fromarray(
base64_string_decoded_as_a_byte_array)
Parameters
Name | Type | Required | Description |
---|---|---|---|
base64_string_decoded_as_a_byte_array | dynamic | ✔️ | The bytes (integer) array to be encoded into a base64 string. |
Returns
Returns the base64 string encoded from the bytes array. Note that byte is an integer type.
Examples
let bytes_array = toscalar(print base64_decode_toarray("S3VzdG8="));
print decoded_base64_string = base64_encode_fromarray(bytes_array)
Output
decoded_base64_string |
---|
S3VzdG8= |
Trying to encode a base64 string from an invalid bytes array (one generated from an invalid UTF-8 encoded string) returns null:
let empty_bytes_array = toscalar(print base64_decode_toarray("U3RyaW5n0KHR0tGA0L7Rh9C60LA"));
print empty_string = base64_encode_fromarray(empty_bytes_array)
Output
empty_string |
---|
Related content
- For decoding base64 strings to a UTF-8 string, see base64_decode_tostring()
- For encoding strings to a base64 string see base64_encode_tostring()
- This function is the inverse of base64_decode_toarray()
12.35 - base64_encode_fromguid()
Encodes a GUID to a base64 string.
Syntax
base64_encode_fromguid(
guid)
Parameters
Name | Type | Required | Description |
---|---|---|---|
guid | guid | ✔️ | The value to encode to a base64 string. |
Returns
Returns a base64 string encoded from a GUID.
Example
print Quine = base64_encode_fromguid(toguid("ae3133f2-6e22-49ae-b06a-16e6a9b212eb"))
Output
Quine |
---|
8jMxriJurkmwahbmqbIS6w== |
If you try to encode anything that isn’t a GUID, as in the following example, an error is thrown:
print Empty = base64_encode_fromguid("abcd1231")
Related content
- To decode a base64 string to a GUID, see base64_decode_toguid().
- To create a GUID from a string, see toguid().
12.36 - base64_encode_tostring()
Encodes a string as base64 string.
Syntax
base64_encode_tostring(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The value to encode as a base64 string. |
Returns
Returns string encoded as a base64 string.
Example
print Quine=base64_encode_tostring("Kusto")
Output
Quine |
---|
S3VzdG8= |
Related content
- To decode base64 strings to UTF-8 strings, see base64_decode_tostring().
- To decode base64 strings to an array of long values, see base64_decode_toarray().
12.37 - beta_cdf()
Returns the standard cumulative beta distribution function.
If probability = beta_cdf(
x,…)
, then beta_inv(
probability,…)
= x.
The beta distribution is commonly used to study variation in the percentage of something across samples, such as the fraction of the day people spend watching television.
Syntax
beta_cdf(
x,
alpha,
beta)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | int, long, or real | ✔️ | A value at which to evaluate the function. |
alpha | int, long, or real | ✔️ | A parameter of the distribution. |
beta | int, long, or real | ✔️ | A parameter of the distribution. |
Returns
The cumulative beta distribution function.
Examples
datatable(x:double, alpha:double, beta:double, comment:string)
[
0.9, 10.0, 20.0, "Valid input",
1.5, 10.0, 20.0, "x > 1, yields NaN",
double(-10), 10.0, 20.0, "x < 0, yields NaN",
0.1, double(-1.0), 20.0, "alpha is < 0, yields NaN"
]
| extend b = beta_cdf(x, alpha, beta)
Output
x | alpha | beta | comment | b |
---|---|---|---|---|
0.9 | 10 | 20 | Valid input | 0.999999999999959 |
1.5 | 10 | 20 | x > 1, yields NaN | NaN |
-10 | 10 | 20 | x < 0, yields NaN | NaN |
0.1 | -1 | 20 | alpha is < 0, yields NaN | NaN |
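To illustrate the inverse relationship stated at the top of this section, applying beta_inv() to the output of beta_cdf() should recover the original input, up to floating-point precision. The following illustrative query returns approximately 0.3:
print x = beta_inv(beta_cdf(0.3, 10.0, 20.0), 10.0, 20.0)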
Related content
- For computing the inverse of the beta cumulative probability density function, see beta-inv().
- For computing probability density function, see beta-pdf().
12.38 - beta_inv()
Returns the inverse of the beta cumulative probability density function.
If probability = beta_cdf(
x,…)
, then beta_inv(
probability,…)
= x.
The beta distribution can be used in project planning to model probable completion times given an expected completion time and variability.
Syntax
beta_inv(
probability,
alpha,
beta)
Parameters
Name | Type | Required | Description |
---|---|---|---|
probability | int, long, or real | ✔️ | A probability associated with the beta distribution. |
alpha | int, long, or real | ✔️ | A parameter of the distribution. |
beta | int, long, or real | ✔️ | A parameter of the distribution. |
Returns
The inverse of the beta cumulative probability density function beta_cdf()
Examples
datatable(p:double, alpha:double, beta:double, comment:string)
[
0.1, 10.0, 20.0, "Valid input",
1.5, 10.0, 20.0, "p > 1, yields null",
0.1, double(-1.0), 20.0, "alpha is < 0, yields NaN"
]
| extend b = beta_inv(p, alpha, beta)
Output
p | alpha | beta | comment | b |
---|---|---|---|---|
0.1 | 10 | 20 | Valid input | 0.226415022388749 |
1.5 | 10 | 20 | p > 1, yields null | |
0.1 | -1 | 20 | alpha is < 0, yields NaN | NaN |
Related content
- For computing cumulative beta distribution function, see beta-cdf().
- For computing probability beta density function, see beta-pdf().
12.39 - beta_pdf()
Returns the probability density beta function.
The beta distribution is commonly used to study variation in the percentage of something across samples, such as the fraction of the day people spend watching television.
Syntax
beta_pdf(
x,
alpha,
beta)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | int, long, or real | ✔️ | A value at which to evaluate the function. |
alpha | int, long, or real | ✔️ | A parameter of the distribution. |
beta | int, long, or real | ✔️ | A parameter of the distribution. |
Returns
The probability beta density function.
Examples
datatable(x:double, alpha:double, beta:double, comment:string)
[
0.5, 10.0, 20.0, "Valid input",
1.5, 10.0, 20.0, "x > 1, yields NaN",
double(-10), 10.0, 20.0, "x < 0, yields NaN",
0.1, double(-1.0), 20.0, "alpha is < 0, yields NaN"
]
| extend r = beta_pdf(x, alpha, beta)
Output
x | alpha | beta | comment | r |
---|---|---|---|---|
0.5 | 10 | 20 | Valid input | 0.746176019310951 |
1.5 | 10 | 20 | x > 1, yields NaN | NaN |
-10 | 10 | 20 | x < 0, yields NaN | NaN |
0.1 | -1 | 20 | alpha is < 0, yields NaN | NaN |
Related content
- For computing the inverse of the beta cumulative probability density function, see beta-inv().
- For the standard cumulative beta distribution function, see beta-cdf().
12.40 - bin_at()
Returns the value rounded down to the nearest bin size, which is aligned to a fixed reference point.
In contrast to the bin() function, where the point of alignment is predefined, bin_at() allows you to define a fixed point for alignment. Results can align before or after the fixed point.
Syntax
bin_at(value, bin_size, fixed_point)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int , long , real , timespan , or datetime | ✔️ | The value to round. |
bin_size | int , long , real , or timespan | ✔️ | The size of each bin. |
fixed_point | int , long , real , timespan , or datetime | ✔️ | A constant of the same type as value, which is used as a fixed reference point. |
Returns
The nearest multiple of bin_size below the given value that aligns to the specified fixed_point.
Examples
In the following example, the value is rounded down to the nearest multiple of bin_size that aligns to the fixed_point.
print bin_at(6.5, 2.5, 7)
Output
print_0 |
---|
4.5 |
In the following example, the time interval is binned into daily bins aligned to a 12 hour fixed point. The return value is -12 since a daily bin aligned to 12 hours rounds down to 12 on the previous day.
print bin_at(time(1h), 1d, 12h)
Output
print_0 |
---|
-12:00:00 |
In the following example, daily bins align to noon.
print bin_at(datetime(2017-05-15 10:20:00.0), 1d, datetime(1970-01-01 12:00:00.0))
Output
print_0 |
---|
2017-05-14T12:00:00Z |
In the following example, bins are weekly and align to the start of Sunday, June 4, 2017. The example returns a bin aligned to Sundays.
print bin_at(datetime(2017-05-17 10:20:00.0), 7d, datetime(2017-06-04 00:00:00.0))
Output
print_0 |
---|
2017-05-14T00:00:00Z |
In the following example, the total number of events is grouped into daily bins aligned to the fixed_point date and time. The fixed_point value is included in one of the returned bins.
datatable(Date:datetime, NumOfEvents:int)[
datetime(2018-02-24T15:14),3,
datetime(2018-02-24T15:24),4,
datetime(2018-02-23T16:14),4,
datetime(2018-02-23T17:29),4,
datetime(2018-02-26T15:14),5]
| summarize TotalEvents=sum(NumOfEvents) by bin_at(Date, 1d, datetime(2018-02-24 15:14:00.0000000))
Output
Date | TotalEvents |
---|---|
2018-02-23T15:14:00Z | 8 |
2018-02-24T15:14:00Z | 7 |
2018-02-26T15:14:00Z | 5 |
12.41 - bin_auto()
Rounds values down to a fixed-size bin, with control over the bin size and starting point provided by a query property.
Syntax
bin_auto(value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int, long, real, timespan, or datetime | ✔️ | The value to round into bins. |
To control the bin size and starting point, set the following parameters before using the function.
Name | Type | Required | Description |
---|---|---|---|
query_bin_auto_size | int, long, real, or timespan | ✔️ | Indicates the size of each bin. |
query_bin_auto_at | int, long, real, or timespan | | Indicates a value of value that serves as a “fixed point” for which bin_auto(fixed_point) == fixed_point. The default is 0. |
Returns
The nearest multiple of query_bin_auto_size
below value, shifted so that query_bin_auto_at
will be translated into itself.
Examples
set query_bin_auto_size=1h;
set query_bin_auto_at=datetime(2017-01-01 00:05);
range Timestamp from datetime(2017-01-01 00:05) to datetime(2017-01-01 02:00) step 1m
| summarize count() by bin_auto(Timestamp)
Output
Timestamp | count_ |
---|---|
2017-01-01 00:05:00.0000000 | 60 |
2017-01-01 01:05:00.0000000 | 56 |
12.42 - bin()
Rounds values down to an integer multiple of a given bin size.
Used frequently in combination with summarize by ...
.
If you have a scattered set of values, they’ll be grouped into a smaller set of specific values.
Syntax
bin(
value,
roundTo)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int, long, real, timespan, or datetime | ✔️ | The value to round down. |
roundTo | int, long, real, or timespan | ✔️ | The “bin size” that divides value. |
Returns
The nearest multiple of roundTo below value. Null values, a null bin size, or a negative bin size will result in null.
Examples
Numeric bin
print bin(4.5, 1)
Output
print_0 |
---|
4 |
Timespan bin
print bin(time(16d), 7d)
Output
print_0 |
---|
14:00:00:00 |
Datetime bin
print bin(datetime(1970-05-11 13:45:07), 1d)
Output
print_0 |
---|
1970-05-11T00:00:00Z |
Pad a table with null bins
When some bins have no corresponding rows in the table, we recommend padding the table with those bins. The following query looks at strong wind storm events in California for a week in April. However, there are no events on some of the days.
let Start = datetime('2007-04-07');
let End = Start + 7d;
StormEvents
| where StartTime between (Start .. End)
| where State == "CALIFORNIA" and EventType == "Strong Wind"
| summarize PropertyDamage=sum(DamageProperty) by bin(StartTime, 1d)
Output
StartTime | PropertyDamage |
---|---|
2007-04-08T00:00:00Z | 3000 |
2007-04-11T00:00:00Z | 1000 |
2007-04-12T00:00:00Z | 105000 |
To represent the full week, the following query pads the result table with zero-valued bins for the missing days. Here’s a step-by-step explanation of the process:
- Use the union operator to add more rows to the table.
- The range operator produces a table that has a single row and column.
- The mv-expand operator over the range function creates as many rows as there are bins between StartTime and EndTime.
- Use a PropertyDamage of 0.
- The summarize operator groups together bins from the original table to the table produced by the union expression. This process ensures that the output has one row per bin whose value is either zero or the original count.
let Start = datetime('2007-04-07');
let End = Start + 7d;
StormEvents
| where StartTime between (Start .. End)
| where State == "CALIFORNIA" and EventType == "Strong Wind"
| union (
range x from 1 to 1 step 1
| mv-expand StartTime=range(Start, End, 1d) to typeof(datetime)
| extend PropertyDamage=0
)
| summarize PropertyDamage=sum(DamageProperty) by bin(StartTime, 1d)
Output
StartTime | PropertyDamage |
---|---|
2007-04-07T00:00:00Z | 0 |
2007-04-08T00:00:00Z | 3000 |
2007-04-09T00:00:00Z | 0 |
2007-04-10T00:00:00Z | 0 |
2007-04-11T00:00:00Z | 1000 |
2007-04-12T00:00:00Z | 105000 |
2007-04-13T00:00:00Z | 0 |
2007-04-14T00:00:00Z | 0 |
12.43 - binary_and()
Returns a result of the bitwise AND
operation between two values.
Syntax
binary_and(
value1,
value2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value1 | long | ✔️ | The left-hand value of the bitwise AND operation. |
value2 | long | ✔️ | The right-hand value of the bitwise AND operation. |
Returns
Returns the bitwise AND operation on a pair of numbers: value1 & value2.
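Example
For example, 5 (binary 101) AND 3 (binary 011) keeps only the shared bit:
print result = binary_and(5, 3)
Output
result |
---|
1 |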
12.44 - binary_not()
Returns a bitwise negation of the input value.
Syntax
binary_not(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | long | ✔️ | The value to negate. |
Returns
Returns the bitwise negation of value.
Example
print result = binary_not(100)
Output
result |
---|
-101 |
12.45 - binary_or()
Returns a result of the bitwise or
operation of the two values.
Syntax
binary_or(
value1,
value2 )
Parameters
Name | Type | Required | Description |
---|---|---|---|
value1 | long | ✔️ | The left-hand value of the bitwise OR operation. |
value2 | long | ✔️ | The right-hand value of the bitwise OR operation. |
Returns
Returns the bitwise OR operation on a pair of numbers: value1 | value2.
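Example
For example, 5 (binary 101) OR 2 (binary 010) combines the bits of both values:
print result = binary_or(5, 2)
Output
result |
---|
7 |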
12.46 - binary_shift_left()
Returns binary shift left operation on a pair of numbers.
Syntax
binary_shift_left(
value,
shift)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int | ✔️ | The value to shift left. |
shift | int | ✔️ | The number of bits to shift left. |
Returns
Returns the binary shift left operation on a pair of numbers: value << (shift%64). If shift is negative, a NULL value is returned.
Example
print Result = binary_shift_left(1,2)
Output
Result |
---|
4 |
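Because the shift amount is taken modulo 64, as described in the Returns section, shifting by 64 is equivalent to shifting by 0 and shifting by 65 is equivalent to shifting by 1. Based on that rule, the following illustrative query should return 1 and 2:
print r64 = binary_shift_left(1, 64), r65 = binary_shift_left(1, 65)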
12.47 - binary_shift_right()
Returns binary shift right operation on a pair of numbers.
Syntax
binary_shift_right(
value,
shift)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int | ✔️ | The value to shift right. |
shift | int | ✔️ | The number of bits to shift right. |
Returns
Returns the binary shift right operation on a pair of numbers: value >> (shift%64). If shift is negative, a NULL value is returned.
Examples
print Result = binary_shift_right(1,2)
Output
Result |
---|
0 |
12.48 - binary_xor()
Returns a result of the bitwise xor
operation of the two values.
Syntax
binary_xor(
value1,
value2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value1 | int | ✔️ | The left-side value of the XOR operation. |
value2 | int | ✔️ | The right-side value of the XOR operation. |
Returns
Returns the bitwise XOR operation on a pair of numbers: value1 ^ value2.
Examples
print Result = binary_xor(1,1)
Output
Result |
---|
0 |
print Result = binary_xor(1,2)
Output
Result |
---|
3 |
12.49 - bitset_count_ones()
Returns the number of set bits in the binary representation of a number.
Syntax
bitset_count_ones(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int | ✔️ | The value for which to calculate the number of set bits. |
Returns
Returns the number of set bits in the binary representation of a number.
Example
// 42 = 32+8+2 : b'00101010' == 3 bits set
print ones = bitset_count_ones(42)
Output
ones |
---|
3 |
12.50 - case()
Evaluates a list of predicates and returns the first result expression whose predicate is satisfied.
If none of the predicates return true
, the result of the else
expression is returned.
All predicate
arguments must be expressions that evaluate to a boolean
value.
All then
arguments and the else
argument must be of the same type.
Syntax
case(
predicate_1, then_1,
[predicate_2, then_2, …]
else)
Parameters
Name | Type | Required | Description |
---|---|---|---|
predicate | string | ✔️ | An expression that evaluates to a boolean value. |
then | string | ✔️ | An expression that gets evaluated and its value is returned from the function if predicate is the first predicate that evaluates to true . |
else | string | ✔️ | An expression that gets evaluated and its value is returned from the function if neither of the predicate_i evaluate to true . |
Returns
The value of the first then_i whose predicate_i evaluates to true
, or the value of else if neither of the predicates are satisfied.
Example
range Size from 1 to 15 step 2
| extend bucket = case(Size <= 3, "Small",
Size <= 10, "Medium",
"Large")
Output
Size | bucket |
---|---|
1 | Small |
3 | Small |
5 | Medium |
7 | Medium |
9 | Medium |
11 | Large |
13 | Large |
15 | Large |
12.51 - ceiling()
Calculates the smallest integer greater than, or equal to, the specified numeric expression.
Syntax
ceiling(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | int, long, or real | ✔️ | The value to round up. |
Returns
The smallest integer greater than, or equal to, the specified numeric expression.
Examples
print c1 = ceiling(-1.1), c2 = ceiling(0), c3 = ceiling(0.9)
Output
c1 | c2 | c3 |
---|---|---|
-1 | 0 | 1 |
12.52 - coalesce()
Evaluates a list of expressions and returns the first non-null (or non-empty for string) expression.
Syntax
coalesce(
arg,
arg_2,[
arg_3,...])
Parameters
Name | Type | Required | Description |
---|---|---|---|
arg | scalar | ✔️ | The expression to be evaluated. |
Returns
The value of the first arg whose value isn’t null (or not-empty for string expressions).
Example
print result=coalesce(tolong("not a number"), tolong("42"), 33)
Output
result |
---|
42 |
12.53 - column_ifexists()
Displays the column, if the column exists. Otherwise, it returns the default column.
Syntax
column_ifexists(
columnName,
defaultValue)
Parameters
Name | Type | Required | Description |
---|---|---|---|
columnName | string | ✔️ | The name of the column to return. |
defaultValue | scalar | ✔️ | The default column to return if columnName doesn’t exist in the table. This value can be any scalar expression. For example, a reference to another column. |
Returns
If columnName exists, then returns the column. Otherwise, it returns the defaultValue column.
Example
This example returns the default State column, because a column named Capital doesn’t exist in the StormEvents table.
StormEvents | project column_ifexists("Capital", State)
Output
This output shows the first 10 rows of the default State column.
State |
---|
ATLANTIC SOUTH |
FLORIDA |
FLORIDA |
GEORGIA |
MISSISSIPPI |
MISSISSIPPI |
MISSISSIPPI |
MISSISSIPPI |
AMERICAN SAMOA |
KENTUCKY |
… |
12.54 - convert_angle()
Convert an angle value from one unit to another.
Syntax
convert_angle(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- Arcminute
- Arcsecond
- Centiradian
- Deciradian
- Degree
- Gradian
- Microdegree
- Microradian
- Millidegree
- Milliradian
- Nanodegree
- Nanoradian
- NatoMil
- Radian
- Revolution
- Tilt
Returns
Returns the input value converted from one angle unit to another. Invalid units return null.
Example
print result = convert_angle(1.2, 'Degree', 'Arcminute')
Output
result |
---|
72 |
12.55 - convert_energy()
Convert an energy value from one unit to another.
Syntax
convert_energy(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- BritishThermalUnit
- Calorie
- DecathermEc
- DecathermImperial
- DecathermUs
- ElectronVolt
- Erg
- FootPound
- GigabritishThermalUnit
- GigaelectronVolt
- Gigajoule
- GigawattDay
- GigawattHour
- HorsepowerHour
- Joule
- KilobritishThermalUnit
- Kilocalorie
- KiloelectronVolt
- Kilojoule
- KilowattDay
- KilowattHour
- MegabritishThermalUnit
- Megacalorie
- MegaelectronVolt
- Megajoule
- MegawattDay
- MegawattHour
- Millijoule
- TeraelectronVolt
- TerawattDay
- TerawattHour
- ThermEc
- ThermImperial
- ThermUs
- WattDay
- WattHour
Returns
Returns the input value converted from one energy unit to another. Invalid units return null.
Example
print result = convert_energy(1.2, 'Joule', 'BritishThermalUnit')
Output
result |
---|
0.00113738054437598 |
12.56 - convert_force()
Convert a force value from one unit to another.
Syntax
convert_force(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- Decanewton
- Dyn
- KilogramForce
- Kilonewton
- KiloPond
- KilopoundForce
- Meganewton
- Micronewton
- Millinewton
- Newton
- OunceForce
- Poundal
- PoundForce
- ShortTonForce
- TonneForce
Returns
Returns the input value converted from one force unit to another. Invalid units return null.
Example
print result = convert_force(1.2, 'Newton', 'Decanewton')
Output
result |
---|
0.12 |
12.57 - convert_length()
Convert a length value from one unit to another.
Syntax
convert_length(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- Angstrom
- AstronomicalUnit
- Centimeter
- Chain
- DataMile
- Decameter
- Decimeter
- DtpPica
- DtpPoint
- Fathom
- Foot
- Hand
- Hectometer
- Inch
- KilolightYear
- Kilometer
- Kiloparsec
- LightYear
- MegalightYear
- Megaparsec
- Meter
- Microinch
- Micrometer
- Mil
- Mile
- Millimeter
- Nanometer
- NauticalMile
- Parsec
- PrinterPica
- PrinterPoint
- Shackle
- SolarRadius
- Twip
- UsSurveyFoot
- Yard
Returns
Returns the input value converted from one length unit to another. Invalid units return null.
Example
print result = convert_length(1.2, 'Meter', 'Foot')
Output
result |
---|
3.93700787401575 |
12.58 - convert_mass()
Convert a mass value from one unit to another.
Syntax
convert_mass(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- Centigram
- Decagram
- Decigram
- EarthMass
- Grain
- Gram
- Hectogram
- Kilogram
- Kilopound
- Kilotonne
- LongHundredweight
- LongTon
- Megapound
- Megatonne
- Microgram
- Milligram
- Nanogram
- Ounce
- Pound
- ShortHundredweight
- ShortTon
- Slug
- SolarMass
- Stone
- Tonne
Returns
Returns the input value converted from one mass unit to another. Invalid units return null.
Example
print result = convert_mass(1.2, 'Kilogram', 'Pound')
Output
result |
---|
2.64554714621853 |
12.59 - convert_speed()
Convert a speed value from one unit to another.
Syntax
convert_speed(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- CentimeterPerHour
- CentimeterPerMinute
- CentimeterPerSecond
- DecimeterPerMinute
- DecimeterPerSecond
- FootPerHour
- FootPerMinute
- FootPerSecond
- InchPerHour
- InchPerMinute
- InchPerSecond
- KilometerPerHour
- KilometerPerMinute
- KilometerPerSecond
- Knot
- MeterPerHour
- MeterPerMinute
- MeterPerSecond
- MicrometerPerMinute
- MicrometerPerSecond
- MilePerHour
- MillimeterPerHour
- MillimeterPerMinute
- MillimeterPerSecond
- NanometerPerMinute
- NanometerPerSecond
- UsSurveyFootPerHour
- UsSurveyFootPerMinute
- UsSurveyFootPerSecond
- YardPerHour
- YardPerMinute
- YardPerSecond
Returns
Returns the input value converted from one speed unit to another. Invalid units return null.
Example
print result = convert_speed(1.2, 'MeterPerSecond', 'CentimeterPerHour')
Output
result |
---|
432000 |
12.60 - convert_temperature()
Convert a temperature value from one unit to another.
Syntax
convert_temperature(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- DegreeCelsius
- DegreeDelisle
- DegreeFahrenheit
- DegreeNewton
- DegreeRankine
- DegreeReaumur
- DegreeRoemer
- Kelvin
- MillidegreeCelsius
- SolarTemperature
Returns
Returns the input value converted from one temperature unit to another. Invalid units return null.
Example
print result = convert_temperature(1.2, 'Kelvin', 'DegreeCelsius')
Output
result |
---|
-271.95 |
12.61 - convert_volume()
Convert a volume value from one unit to another.
Syntax
convert_volume(
value,
from,
to)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | real | ✔️ | The value to be converted. |
from | string | ✔️ | The unit to convert from. For possible values, see Conversion units. |
to | string | ✔️ | The unit to convert to. For possible values, see Conversion units. |
Conversion units
- AcreFoot
- AuTablespoon
- BoardFoot
- Centiliter
- CubicCentimeter
- CubicDecimeter
- CubicFoot
- CubicHectometer
- CubicInch
- CubicKilometer
- CubicMeter
- CubicMicrometer
- CubicMile
- CubicMillimeter
- CubicYard
- Decaliter
- DecausGallon
- Deciliter
- DeciusGallon
- HectocubicFoot
- HectocubicMeter
- Hectoliter
- HectousGallon
- ImperialBeerBarrel
- ImperialGallon
- ImperialOunce
- ImperialPint
- KilocubicFoot
- KilocubicMeter
- KiloimperialGallon
- Kiloliter
- KilousGallon
- Liter
- MegacubicFoot
- MegaimperialGallon
- Megaliter
- MegausGallon
- MetricCup
- MetricTeaspoon
- Microliter
- Milliliter
- OilBarrel
- UkTablespoon
- UsBeerBarrel
- UsCustomaryCup
- UsGallon
- UsLegalCup
- UsOunce
- UsPint
- UsQuart
- UsTablespoon
- UsTeaspoon
Returns
Returns the input value converted from one volume unit to another. Invalid units return null.
Example
print result = convert_volume(1.2, 'CubicMeter', 'AcreFoot')
Output
result |
---|
0.0009728568 |
12.62 - cos()
Returns the cosine function value of the specified angle. The angle is specified in radians.
Syntax
cos(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The value in radians for which to calculate the cosine. |
Returns
The cosine of number of radians.
Example
print cos(1)
Output
result |
---|
0.54030230586813977 |
12.63 - cot()
Calculates the trigonometric cotangent of the specified angle, in radians.
Syntax
cot(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The value for which to calculate the cotangent. |
Returns
The cotangent function value for number.
Example
print cot(1)
Output
result |
---|
0.64209261593433065 |
12.64 - countof()
Counts occurrences of a substring in a string. Plain string matches may overlap; regex matches don’t.
Syntax
countof(
source,
search [,
kind])
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The value to search. |
search | string | ✔️ | The value or regular expression to match inside source. |
kind | string | The value normal or regex . The default is normal . |
Returns
The number of times that the search value can be matched in the source string. Plain string matches may overlap; regex matches don’t.
Examples
Function call | Result |
---|---|
countof("aaa", "a") | 3 |
countof("aaaa", "aa") | 3 (not 2!) |
countof("ababa", "ab", "normal") | 2 |
countof("ababa", "aba") | 2 |
countof("ababa", "aba", "regex") | 1 |
countof("abcabc", "a.c", "regex") | 2 |
12.65 - current_cluster_endpoint()
Returns the network endpoint (DNS name) of the current cluster or Eventhouse being queried.
Syntax
current_cluster_endpoint()
Returns
The network endpoint (DNS name) of the current cluster or Eventhouse being queried, as a value of type string.
Example
print strcat("This query executed on: ", current_cluster_endpoint())
12.66 - current_database()
Returns the name of the database in scope (database that all query entities are resolved against if no other database is specified).
Syntax
current_database()
Returns
The name of the database in scope as a value of type string.
Example
print strcat("Database in scope: ", current_database())
12.67 - current_principal_details()
Returns details of the principal running the query.
Syntax
current_principal_details()
Returns
The details of the current principal as a dynamic. The following table describes the returned fields.
Field | Description |
---|---|
UserPrincipalName | The sign-in identifier for users. For more information, see UPN. |
IdentityProvider | The source that validates the identity of the principal. |
Authority | The Microsoft Entra tenant ID. |
Mfa | Indicates the use of multifactor authentication. For more information, see Access token claims reference. |
Type | The category of the principal: aaduser , aadapp , or aadgroup . |
DisplayName | The user-friendly name for the principal that is displayed in the UI. |
ObjectId | The Microsoft Entra object ID for the principal. |
FQN | The Fully Qualified Name (FQN) of the principal. Valuable for security role management commands. For more information, see Referencing security principals. |
Country | The user’s country or region. This property is returned if the information is present. The value is a standard two-letter country or region code, for example, FR, JP, and SZ. |
TenantCountry | The resource tenant’s country or region, set at a tenant level by an admin. This property is returned if the information is present. The value is a standard two-letter country or region code, for example, FR, JP, and SZ. |
TenantRegion | The region of the resource tenant. This property is returned if the information is present. The value is a standard two-letter country or region code, for example, FR, JP, and SZ. |
Example
print details=current_principal_details()
Example output
details |
---|
{ "Country": "DE", "TenantCountry": "US", "TenantRegion": "WW", "UserPrincipalName": "user@fabrikam.com", "IdentityProvider": "https://sts.windows.net", "Authority": "aaaabbbb-0000-cccc-1111-dddd2222eeee", "Mfa": "True", "Type": "AadUser", "DisplayName": "James Smith (upn: user@fabrikam.com)", "ObjectId": "aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb", "FQN": null, "Notes": null } |
12.68 - current_principal_is_member_of()
Checks group membership or principal identity of the current principal running the query.
Syntax
current_principal_is_member_of(
group)
Parameters
Name | Type | Required | Description |
---|---|---|---|
group | dynamic | ✔️ | An array of string literals in which each literal represents a Microsoft Entra principal. See examples for Microsoft Entra principals. |
Returns
The function returns true if the current principal running the query is successfully matched for at least one input argument. If not, the function returns false.
Examples
print result=current_principal_is_member_of(
'aaduser=user1@fabrikam.com',
'aadgroup=group1@fabrikam.com',
'aadapp=66ad1332-3a94-4a69-9fa2-17732f093664;72f988bf-86f1-41af-91ab-2d7cd011db47'
)
Output
result |
---|
false |
Using dynamic array instead of multiple arguments:
print result=current_principal_is_member_of(
dynamic([
'aaduser=user1@fabrikam.com',
'aadgroup=group1@fabrikam.com',
'aadapp=66ad1332-3a94-4a69-9fa2-17732f093664;72f988bf-86f1-41af-91ab-2d7cd011db47'
]))
Output
result |
---|
false |
12.69 - current_principal()
Returns the current principal name that runs the query.
Syntax
current_principal()
Returns
The current principal fully qualified name (FQN) as a string.
The string format is: PrincipalType=PrincipalId;TenantId
Example
print fqn=current_principal()
Example output
fqn |
---|
aaduser=346e950e-4a62-42bf-96f5-4cf4eac3f11e;72f988bf-86f1-41af-91ab-2d7cd011db47 |
12.70 - cursor_after()
A predicate run over the records of a table to compare their ingestion time against a database cursor. Requires that the table has the IngestionTime policy enabled.
Syntax
cursor_after(
RHS)
Parameters
Name | Type | Required | Description |
---|---|---|---|
RHS | string | ✔️ | Either an empty string literal or a valid database cursor value. |
Returns
A scalar value of type bool that indicates whether the record was ingested after the database cursor RHS (true) or not (false).
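Example
The following sketch uses a hypothetical cursor value (normally captured earlier with cursor_current()) to count only the records ingested after that point:
T
| where cursor_after('636040929866477946') // hypothetical cursor value
| count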
Related content
12.71 - cursor_before_or_at()
A predicate function run over the records of a table to compare their ingestion time against the database cursor time. Requires that the table has the IngestionTime policy enabled.
Syntax
cursor_before_or_at(
RHS)
Parameters
Name | Type | Required | Description |
---|---|---|---|
RHS | string | ✔️ | Either an empty string literal or a valid database cursor value. |
Returns
A scalar value of type bool that indicates whether the record was ingested before or at the database cursor RHS (true) or not (false).
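Example
The following sketch uses a hypothetical cursor value (normally captured earlier with cursor_current()) to count only the records ingested before or at that point:
T
| where cursor_before_or_at('636040929866477946') // hypothetical cursor value
| count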
Related content
12.72 - cursor_current()
Retrieves the current value of the cursor of the database in scope.
Syntax
cursor_current()
Returns
Returns a single value of type string that encodes the current value of the cursor of the database in scope.
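Example
The following query prints the current database cursor. The returned value is an opaque string that can later be passed to cursor_after() or cursor_before_or_at():
print cursor=cursor_current()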
Related content
12.73 - datetime_add()
Calculates a new datetime from a specified period multiplied by a specified amount, added to, or subtracted from a specified datetime.
Syntax
datetime_add(
period,
amount,
datetime)
Parameters
Name | Type | Required | Description |
---|---|---|---|
period | string | ✔️ | The length of time by which to increment. |
amount | int | ✔️ | The number of periods to add to or subtract from datetime. |
datetime | datetime | ✔️ | The date to increment by the result of the period x amount calculation. |
Possible values of period:
- Year
- Quarter
- Month
- Week
- Day
- Hour
- Minute
- Second
- Millisecond
- Microsecond
- Nanosecond
Returns
A datetime after a certain time/date interval has been added.
Examples
Period
print year = datetime_add('year',1,make_datetime(2017,1,1)),
quarter = datetime_add('quarter',1,make_datetime(2017,1,1)),
month = datetime_add('month',1,make_datetime(2017,1,1)),
week = datetime_add('week',1,make_datetime(2017,1,1)),
day = datetime_add('day',1,make_datetime(2017,1,1)),
hour = datetime_add('hour',1,make_datetime(2017,1,1)),
minute = datetime_add('minute',1,make_datetime(2017,1,1)),
second = datetime_add('second',1,make_datetime(2017,1,1))
Output
year | quarter | month | week | day | hour | minute | second |
---|---|---|---|---|---|---|---|
2018-01-01 00:00:00.0000000 | 2017-04-01 00:00:00.0000000 | 2017-02-01 00:00:00.0000000 | 2017-01-08 00:00:00.0000000 | 2017-01-02 00:00:00.0000000 | 2017-01-01 01:00:00.0000000 | 2017-01-01 00:01:00.0000000 | 2017-01-01 00:00:01.0000000 |
Amount
print year = datetime_add('year',-5,make_datetime(2017,1,1)),
quarter = datetime_add('quarter',12,make_datetime(2017,1,1)),
month = datetime_add('month',-15,make_datetime(2017,1,1)),
week = datetime_add('week',100,make_datetime(2017,1,1))
Output
year | quarter | month | week |
---|---|---|---|
2012-01-01T00:00:00Z | 2020-01-01T00:00:00Z | 2015-10-01T00:00:00Z | 2018-12-02T00:00:00Z |
12.74 - datetime_diff()
Calculates the number of the specified periods between two datetime values.
Syntax
datetime_diff(
period,
datetime1,
datetime2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
period | string | ✔️ | The measurement of time used to calculate the return value. See possible values. |
datetime1 | datetime | ✔️ | The left-hand side of the subtraction equation. |
datetime2 | datetime | ✔️ | The right-hand side of the subtraction equation. |
Possible values of period
These values are case insensitive:
- Year
- Quarter
- Month
- Week
- Day
- Hour
- Minute
- Second
- Millisecond
- Microsecond
- Nanosecond
Returns
An integer that represents the number of periods in the result of subtraction (datetime1 - datetime2).
Example
print
year = datetime_diff('year',datetime(2017-01-01),datetime(2000-12-31)),
quarter = datetime_diff('quarter',datetime(2017-07-01),datetime(2017-03-30)),
month = datetime_diff('month',datetime(2017-01-01),datetime(2015-12-30)),
week = datetime_diff('week',datetime(2017-10-29 00:00),datetime(2017-09-30 23:59)),
day = datetime_diff('day',datetime(2017-10-29 00:00),datetime(2017-09-30 23:59)),
hour = datetime_diff('hour',datetime(2017-10-31 01:00),datetime(2017-10-30 23:59)),
minute = datetime_diff('minute',datetime(2017-10-30 23:05:01),datetime(2017-10-30 23:00:59)),
second = datetime_diff('second',datetime(2017-10-30 23:00:10.100),datetime(2017-10-30 23:00:00.900)),
millisecond = datetime_diff('millisecond',datetime(2017-10-30 23:00:00.200100),datetime(2017-10-30 23:00:00.100900)),
microsecond = datetime_diff('microsecond',datetime(2017-10-30 23:00:00.1009001),datetime(2017-10-30 23:00:00.1008009)),
nanosecond = datetime_diff('nanosecond',datetime(2017-10-30 23:00:00.0000000),datetime(2017-10-30 23:00:00.0000007))
Output
year | quarter | month | week | day | hour | minute | second | millisecond | microsecond | nanosecond |
---|---|---|---|---|---|---|---|---|---|---|
17 | 2 | 13 | 5 | 29 | 2 | 5 | 10 | 100 | 100 | -700 |
12.75 - datetime_list_timezones()
Returns a list of supported timezones.
Syntax
datetime_list_timezones()
Parameters
None, the function doesn’t have any parameters.
Returns
A list of timezones supported by the Internet Assigned Numbers Authority (IANA) Time Zone Database.
Example
print datetime_list_timezones()
Output
The full list of supported IANA timezone names.
Related content
- To convert from UTC to local, see datetime_utc_to_local()
- To convert a datetime from local to UTC, see datetime_local_to_utc()
- Timezones
- format_datetime()
12.76 - datetime_local_to_utc()
Converts local datetime to UTC datetime using a time-zone specification.
Syntax
datetime_local_to_utc(
from,
timezone)
Parameters
Name | Type | Required | Description |
---|---|---|---|
from | datetime | ✔️ | The local datetime to convert. |
timezone | string | ✔️ | The timezone of the desired datetime. The value must be one of the supported timezones. |
Returns
A UTC datetime that corresponds to the local datetime in the specified timezone.
Example
datatable(local_dt: datetime, tz: string)
[ datetime(2020-02-02 20:02:20), 'US/Pacific',
datetime(2020-02-02 20:02:20), 'America/Chicago',
datetime(2020-02-02 20:02:20), 'Europe/Paris']
| extend utc_dt = datetime_local_to_utc(local_dt, tz)
Output
local_dt | tz | utc_dt |
---|---|---|
2020-02-02 20:02:20.0000000 | Europe/Paris | 2020-02-02 19:02:20.0000000 |
2020-02-02 20:02:20.0000000 | America/Chicago | 2020-02-03 02:02:20.0000000 |
2020-02-02 20:02:20.0000000 | US/Pacific | 2020-02-03 04:02:20.0000000 |
range Local from datetime(2022-03-27 01:00:00.0000000) to datetime(2022-03-27 04:00:00.0000000) step 1h
| extend UTC=datetime_local_to_utc(Local, 'Europe/Brussels')
| extend BackToLocal=datetime_utc_to_local(UTC, 'Europe/Brussels')
| extend diff=Local-BackToLocal
Output
Local | UTC | BackToLocal | diff |
---|---|---|---|
2022-03-27 02:00:00.0000000 | 2022-03-27 00:00:00.0000000 | 2022-03-27 01:00:00.0000000 | 01:00:00 |
2022-03-27 01:00:00.0000000 | 2022-03-27 00:00:00.0000000 | 2022-03-27 01:00:00.0000000 | 00:00:00 |
2022-03-27 03:00:00.0000000 | 2022-03-27 01:00:00.0000000 | 2022-03-27 03:00:00.0000000 | 00:00:00 |
2022-03-27 04:00:00.0000000 | 2022-03-27 02:00:00.0000000 | 2022-03-27 04:00:00.0000000 | 00:00:00 |
Related content
- To convert from UTC to local, see datetime_utc_to_local()
- Timezones
- List of supported timezones
- format_datetime()
12.77 - datetime_part()
Extracts the requested date part as an integer value.
Syntax
datetime_part(
part,
datetime)
Parameters
Name | Type | Required | Description |
---|---|---|---|
part | string | ✔️ | Measurement of time to extract from date. See possible values. |
date | datetime | ✔️ | The full date from which to extract part. |
Possible values of part
- Year
- Quarter
- Month
- week_of_year
- Day
- DayOfYear
- Hour
- Minute
- Second
- Millisecond
- Microsecond
- Nanosecond
Returns
An integer representing the extracted part.
Example
let dt = datetime(2017-10-30 01:02:03.7654321);
print
year = datetime_part("year", dt),
quarter = datetime_part("quarter", dt),
month = datetime_part("month", dt),
weekOfYear = datetime_part("week_of_year", dt),
day = datetime_part("day", dt),
dayOfYear = datetime_part("dayOfYear", dt),
hour = datetime_part("hour", dt),
minute = datetime_part("minute", dt),
second = datetime_part("second", dt),
millisecond = datetime_part("millisecond", dt),
microsecond = datetime_part("microsecond", dt),
nanosecond = datetime_part("nanosecond", dt)
Output
year | quarter | month | weekOfYear | day | dayOfYear | hour | minute | second | millisecond | microsecond | nanosecond |
---|---|---|---|---|---|---|---|---|---|---|---|
2017 | 4 | 10 | 44 | 30 | 303 | 1 | 2 | 3 | 765 | 765432 | 765432100 |
12.78 - datetime_utc_to_local()
Converts UTC datetime to local datetime using a time-zone specification.
Syntax
datetime_utc_to_local(
from,
timezone)
Parameters
Name | Type | Required | Description |
---|---|---|---|
from | datetime | ✔️ | The UTC datetime to convert. |
timezone | string | ✔️ | The timezone to convert to. This value must be one of the supported timezones. |
Returns
A local datetime in the timezone that corresponds to the UTC datetime.
Example
print dt=now()
| extend pacific_dt = datetime_utc_to_local(dt, 'US/Pacific'), canberra_dt = datetime_utc_to_local(dt, 'Australia/Canberra')
| extend diff = pacific_dt - canberra_dt
Output
dt | pacific_dt | canberra_dt | diff |
---|---|---|---|
2022-07-11 22:18:48.4678620 | 2022-07-11 15:18:48.4678620 | 2022-07-12 08:18:48.4678620 | -17:00:00 |
Related content
- To convert a datetime from local to UTC, see datetime_local_to_utc()
- Timezones
- List of supported timezones
- format_datetime()
12.79 - dayofmonth()
Returns an integer representing the day number of the given datetime.
Syntax
dayofmonth(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The datetime used to extract the day number. |
Returns
An integer representing the day number of the given datetime.
Example
dayofmonth(datetime(2015-12-14))
Output
result |
---|
14 |
12.80 - dayofweek()
Returns the number of days since the preceding Sunday, as a timespan.
To convert the timespan to an int, see Convert timespan to integer.
Syntax
dayofweek(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The datetime for which to determine the day of week. |
Returns
The timespan since midnight at the beginning of the preceding Sunday, rounded down to an integer number of days.
Examples
The following example returns 0, indicating that the specified datetime is a Sunday.
print
Timespan = dayofweek(datetime(1947-11-30 10:00:05))
Output
Timespan |
---|
00:00:00 |
The following example returns 1, indicating that the specified datetime is a Monday.
print
Timespan = dayofweek(datetime(1970-05-11))
Output
Timespan |
---|
1.00:00:00 |
Convert timespan to integer
The following example returns the number of days both as a timespan and as data type int.
let dow=dayofweek(datetime(1970-5-12));
print Timespan = dow, Integer = toint(dow/1d)
Output
Timespan | Integer |
---|---|
2.00:00:00 | 2 |
Related content
12.81 - dayofyear()
Returns an integer representing the day number of the given year.
Syntax
dayofyear(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The datetime for which to determine the day number. |
Returns
The day number of the given year.
Example
dayofyear(datetime(2015-12-14))
Output
result |
---|
348 |
12.82 - dcount_hll()
Calculates the distinct count from results generated by hll or hll_merge.
Read about the underlying algorithm (HyperLogLog) and estimation accuracy.
Syntax
dcount_hll(
hll)
Parameters
Name | Type | Required | Description |
---|---|---|---|
hll | string | ✔️ | An expression generated by hll or hll_merge to be used to find the distinct count. |
Returns
Returns the distinct count of each value in hll.
Example
The following example shows the distinct count of the merged hll results.
StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
| project dcount_hll(hllMerged)
Output
dcount_hll_hllMerged |
---|
315 |
Estimation accuracy
12.83 - degrees()
Converts an angle value in radians into a value in degrees, using the formula degrees = (180 / PI) * angle_in_radians.
Syntax
degrees(
radians)
Parameters
Name | Type | Required | Description |
---|---|---|---|
radians | real | ✔️ | The angle in radians to convert to degrees. |
Returns
The corresponding angle in degrees for an angle specified in radians.
Examples
print degrees0 = degrees(pi()/4), degrees1 = degrees(pi()*1.5), degrees2 = degrees(0)
Output
degrees0 | degrees1 | degrees2 |
---|---|---|
45 | 270 | 0 |
12.84 - dynamic_to_json()
Converts a scalar value of type dynamic to a canonical string representation.
Syntax
dynamic_to_json(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression to convert to string representation. |
Returns
Returns a canonical representation of the input as a value of type string, according to the following rules:
- If the input is a scalar value of type other than dynamic, the output is the application of tostring() to that value.
- If the input is an array of values, the output is composed of the characters [, ,, and ] interspersed with the canonical representation described here of each array element.
- If the input is a property bag, the output is composed of the characters {, ,, and } interspersed with the colon (:)-delimited name/value pairs of the properties. The pairs are sorted by the names, and the values are in the canonical representation described here of each array element.
Example
let bag1 = dynamic_to_json(
dynamic({
'Y10':dynamic({}),
'X8': dynamic({
'c3':1,
'd8':5,
'a4':6
}),
'D1':114,
'A1':12,
'B1':2,
'C1':3,
'A14':[15, 13, 18]
}));
let bag2 = dynamic_to_json(
dynamic({
'X8': dynamic({
'a4':6,
'c3':1,
'd8':5
}),
'A14':[15, 13, 18],
'C1':3,
'B1':2,
'Y10': dynamic({}),
'A1':12, 'D1':114
}));
print AreEqual=bag1 == bag2, Result=bag1
Output
AreEqual | Result |
---|---|
true | {"A1":12,"A14":[15,13,18],"B1":2,"C1":3,"D1":114,"X8":{"a4":6,"c3":1,"d8":5},"Y10":{}} |
12.85 - endofday()
Returns the end of the day containing the date, shifted by an offset, if provided.
Syntax
endofday(
date [, offset])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date to find the end of. |
offset | int | The number of offset days from date. Default is 0. |
Returns
A datetime representing the end of the day for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project dayEnd = endofday(datetime(2017-01-01 10:10:17), offset)
Output
dayEnd |
---|
2016-12-31 23:59:59.9999999 |
2017-01-01 23:59:59.9999999 |
2017-01-02 23:59:59.9999999 |
12.86 - endofmonth()
Returns the end of the month containing the date, shifted by an offset, if provided.
Syntax
endofmonth(
date [, offset])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date used to find the end of the month. |
offset | int | The number of offset months from date. Default is 0. |
Returns
A datetime representing the end of the month for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project monthEnd = endofmonth(datetime(2017-01-01 10:10:17), offset)
Output
monthEnd |
---|
2016-12-31 23:59:59.9999999 |
2017-01-31 23:59:59.9999999 |
2017-02-28 23:59:59.9999999 |
12.87 - endofweek()
Returns the end of the week containing the date, shifted by an offset, if provided.
Last day of the week is considered to be a Saturday.
Syntax
endofweek(
date [, offset])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date used to find the end of the week. |
offset | int | The number of offset weeks from date. Default is 0. |
Returns
A datetime representing the end of the week for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project weekEnd = endofweek(datetime(2017-01-01 10:10:17), offset)
Output
weekEnd |
---|
2016-12-31 23:59:59.9999999 |
2017-01-07 23:59:59.9999999 |
2017-01-14 23:59:59.9999999 |
12.88 - endofyear()
Returns the end of the year containing the date, shifted by an offset, if provided.
Syntax
endofyear(
date [, offset])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date used to find the end of the year. |
offset | int | The number of offset years from date. Default is 0. |
Returns
A datetime representing the end of the year for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project yearEnd = endofyear(datetime(2017-01-01 10:10:17), offset)
Output
yearEnd |
---|
2016-12-31 23:59:59.9999999 |
2017-12-31 23:59:59.9999999 |
2018-12-31 23:59:59.9999999 |
12.89 - erf()
Returns the error function of the input.
Syntax
erf(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The value for which to calculate the function. |
Returns
Error function of x.
Example
range x from -3 to 3 step 1
| extend erf_x = erf(x)
Output
x | erf_x |
---|---|
-3 | -0.999977909503001 |
-2 | -0.995322265018953 |
-1 | -0.842700792949715 |
0 | 0 |
1 | 0.842700792949715 |
2 | 0.995322265018953 |
3 | 0.999977909503001 |
12.90 - erfc()
Returns the complementary error function of the input.
Syntax
erfc(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The value for which to calculate the function. |
Returns
Complementary error function of x.
Example
range x from -3 to 3 step 1
| extend erf_x = erfc(x)
Output
x | erf_x |
---|---|
-3 | 1.999977909503001 |
-2 | 1.995322265018953 |
-1 | 1.842700792949715 |
0 | 1 |
1 | 0.157299207050285 |
2 | 0.00467773498104727 |
3 | 2.20904969985854E-05 |
12.91 - estimate_data_size()
Returns an estimated data size in bytes of the selected columns of the tabular expression.
Syntax
estimate_data_size(
columns)
Parameters
Name | Type | Required | Description |
---|---|---|---|
columns | string | ✔️ | One or more comma-separated column references in the source tabular expression to use for data size estimation. To include all columns, use the wildcard (* ) character. |
Returns
The estimated data size in bytes of the referenced columns. Estimation is based on data types and actual values.
For example, the data size for the string '{"a":"bcd"}'
is smaller than the dynamic value dynamic({"a":"bcd"})
because the latter’s internal representation is more complex than that of a string.
Example
The following example calculates the total data size using estimate_data_size()
.
range x from 1 to 10 step 1 // x (long) is 8
| extend Text = '1234567890' // Text length is 10
| summarize Total=sum(estimate_data_size(*)) // (8+10)x10 = 180
Output
Total |
---|
180 |
Related content
12.92 - exp()
The base-e exponential function of x, which is e raised to the power x: e^x.
Syntax
exp(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The value of the exponent. |
Returns
The exponential value of x.
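Example
For illustration, exp(1) returns Euler’s number e, approximately 2.71828:
print result = exp(1)
Output
result |
---|
2.7182818284590451 |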
Related content
12.93 - exp10()
The base-10 exponential function of x, which is 10 raised to the power x: 10^x.
Syntax
exp10(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The value of the exponent. |
Returns
The exponential value of x.
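Example
For illustration, 10 raised to the power 3 is 1000:
print result = exp10(3)
Output
result |
---|
1000 |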
Related content
12.94 - exp2()
The base-2 exponential function of x, which is 2 raised to the power x: 2^x.
Syntax
exp2(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The value of the exponent. |
Returns
The exponential value of x.
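Example
For illustration, 2 raised to the power 10 is 1024:
print result = exp2(10)
Output
result |
---|
1024 |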
Related content
12.95 - extent_id()
Returns a unique identifier that identifies the data shard (“extent”) that the current record resides in at the time the query was run.
Applying this function to calculated data that isn’t attached to a data shard returns an empty guid (all zeros).
Syntax
extent_id()
Returns
A value of type guid that identifies the current record’s data shard at the time the query was run, or an empty guid (all zeros).
Example
The following example shows how to get a list of all the data shards that currently have records from an hour ago with a specific value for the column ActivityId. It demonstrates that some query operators (here, the where operator, and also extend and project) preserve the information about the data shard hosting the record.
T
| where Timestamp > ago(1h)
| where ActivityId == 'dd0595d4-183e-494e-b88e-54c52fe90e5a'
| extend eid=extent_id()
| summarize by eid
12.96 - extent_tags()
Returns a dynamic array with the extent tags of the extent that the current record is in.
If you apply this function to calculated data that isn’t attached to a data shard, it returns an empty value.
Syntax
extent_tags()
Returns
A value of type dynamic that is an array holding the current record’s extent tags, or an empty value.
Examples
Some query operators preserve the information about the data shard hosting the record. These operators include where, extend, and project.
The following example shows how to get a list of the tags of all the data shards that have records from an hour ago, with a specific value for the column ActivityId.
T
| where Timestamp > ago(1h)
| where ActivityId == 'dd0595d4-183e-494e-b88e-54c52fe90e5a'
| extend tags = extent_tags()
| summarize by tostring(tags)
The following example shows how to obtain a count of all records from the last hour, which are stored in extents tagged with the tag MyTag (and potentially other tags), but not tagged with the tag drop-by:MyOtherTag.
T
| where Timestamp > ago(1h)
| extend Tags = extent_tags()
| where Tags has_cs 'MyTag' and Tags !has_cs 'drop-by:MyOtherTag'
| count
12.97 - extract_all()
Get all matches for a regular expression from a source string. Optionally, retrieve a subset of matching groups.
print extract_all(@"(\d+)", "a set of numbers: 123, 567 and 789") // results with the dynamic array ["123", "567", "789"]
Syntax
extract_all(
regex,
[captureGroups,
] source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
regex | string | ✔️ | A regular expression containing between one and 16 capture groups. |
captureGroups | dynamic | An array that indicates the capture groups to extract. Valid values are from 1 to the number of capturing groups in the regular expression. Named capture groups are allowed as well. See examples. | |
source | string | ✔️ | The string to search. |
Returns
- If regex finds a match in source: Returns a dynamic array including all matches against the indicated capture groups captureGroups, or all capturing groups in the regex.
- If the number of captureGroups is 1: The returned array has a single dimension of matched values.
- If the number of captureGroups is more than 1: The returned array is a two-dimensional collection of multi-value matches per captureGroups selection, or all capture groups present in the regex if captureGroups is omitted.
- If there’s no match: null.
Examples
Extract a single capture group
The following query returns hex-byte representation (two hex-digits) of the GUID.
print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"([\da-f]{2})", Id)
Output
ID | guid_bytes |
---|---|
82b8be2d-dfa7-4bd1-8f63-24ad26d31449 | ["82","b8","be","2d","df","a7","4b","d1","8f","63","24","ad","26","d3","14","49"] |
Extract several capture groups
The following query uses a regular expression with three capturing groups to split each GUID part into first letter, last letter, and whatever is in the middle.
print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"(\w)(\w+)(\w)", Id)
Output
ID | guid_bytes |
---|---|
82b8be2d-dfa7-4bd1-8f63-24ad26d31449 | [["8","2b8be2","d"],["d","fa","7"],["4","bd","1"],["8","f6","3"],["2","4ad26d3144","9"]] |
Extract a subset of capture groups
The following query selects a subset of capturing groups.
The regular expression matches the first letter, last letter, and all the rest.
The captureGroups parameter is used to select only the first and the last parts.
print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"(\w)(\w+)(\w)", dynamic([1,3]), Id)
Output
ID | guid_bytes |
---|---|
82b8be2d-dfa7-4bd1-8f63-24ad26d31449 | [["8","d"],["d","7"],["4","1"],["8","3"],["2","9"]] |
Using named capture groups
The captureGroups in the following query uses both capture group indexes and named capture group references to fetch matching values.
print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"(?P<first>\w)(?P<middle>\w+)(?P<last>\w)", dynamic(['first',2,'last']), Id)
Output
ID | guid_bytes |
---|---|
82b8be2d-dfa7-4bd1-8f63-24ad26d31449 | [["8","2b8be2","d"],["d","fa","7"],["4","bd","1"],["8","f6","3"],["2","4ad26d3144","9"]] |
Related content
12.98 - extract_json()
Get a specified element out of a JSON text using a path expression.
Optionally convert the extracted string to a specific type.
Syntax
extract_json(
jsonPath,
dataSource,
type)
Parameters
Name | Type | Required | Description |
---|---|---|---|
jsonPath | string | ✔️ | A JSONPath that defines an accessor into the JSON document. |
dataSource | string | ✔️ | A JSON document. |
type | string | An optional type literal. If provided, the extracted value is converted to this type. For example, typeof(long) will convert the extracted value to a long . |
Performance tips
- Apply where-clauses before using extract_json().
- Consider using a regular expression match with extract instead. This can run much faster, and is effective if the JSON is produced from a template.
- Use parse_json() if you need to extract more than one value from the JSON.
- Consider having the JSON parsed at ingestion by declaring the type of the column to be dynamic.
Returns
This function performs a JSONPath query into dataSource, which contains a valid JSON string, optionally converting that value to another type depending on the third argument.
Example
let json = '{"name": "John", "age": 30, "city": "New York"}';
print extract_json("$.name", json, typeof(string));
Output
print_0 |
---|
John |
Related content
12.99 - extract()
Get a match for a regular expression from a source string.
Optionally, convert the extracted substring to the indicated type.
Syntax
extract(
regex,
captureGroup,
source [,
typeLiteral])
Parameters
Name | Type | Required | Description |
---|---|---|---|
regex | string | ✔️ | A regular expression. |
captureGroup | int | ✔️ | The capture group to extract. 0 stands for the entire match, 1 for the value matched by the first '('parenthesis')' in the regular expression, and 2 or more for subsequent parentheses. |
source | string | ✔️ | The string to search. |
typeLiteral | string | If provided, the extracted substring is converted to this type. For example, typeof(long) . |
Returns
If regex finds a match in source: the substring matched against the indicated capture group captureGroup, optionally converted to typeLiteral.
If there’s no match, or the type conversion fails: null.
Examples
Extract month from datetime string
The following query extracts the month from the string Dates
and returns a table with the date string and the month.
let Dates = datatable(DateString: string)
[
"15-12-2024",
"21-07-2023",
"10-03-2022"
];
Dates
| extend Month = extract(@"-(\d{2})-", 1, DateString, typeof(int))
| project DateString, Month
Output
DateString | Month |
---|---|
15-12-2024 | 12 |
21-07-2023 | 7 |
10-03-2022 | 3 |
Extract username from a string
The following example returns the username from the string. The regular expression ([^,]+)
matches the text following "User: " up to the next comma, effectively extracting the username.
let Text = "User: JohnDoe, Email: johndoe@example.com, Age: 29";
print UserName = extract("User: ([^,]+)", 1, Text)
Output
UserName |
---|
JohnDoe |
Related content
12.100 - format_bytes()
Formats a number as a string representing data size in bytes.
Syntax
format_bytes(
size [,
precision [,
units]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
size | real | ✔️ | The value to be formatted as data size in bytes. |
precision | int | The number of digits the value will be rounded to after the decimal point. The default is 0. | |
units | string | The units of the target data size: Bytes , KB , MB , GB , TB , PB , or EB . If this parameter is empty, the units will be auto-selected based on input value. |
Returns
A string of size formatted as data size in bytes.
Examples
print
v1 = format_bytes(564),
v2 = format_bytes(10332, 1),
v3 = format_bytes(20010332),
v4 = format_bytes(20010332, 2),
v5 = format_bytes(20010332, 0, "KB")
Output
v1 | v2 | v3 | v4 | v5 |
---|---|---|---|---|
564 Bytes | 10.1 KB | 19 MB | 19.08 MB | 19541 KB |
Related content
12.101 - format_datetime()
Formats a datetime according to the provided format.
Syntax
format_datetime(
date ,
format)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The value to format. |
format | string | ✔️ | The output format comprised of one or more of the supported format elements. |
Supported format elements
The format parameter should include one or more of the following elements:
Format specifier | Description | Examples |
---|---|---|
d | The day of the month, from 1 through 31. | 2009-06-01T13:45:30 -> 1, 2009-06-15T13:45:30 -> 15 |
dd | The day of the month, from 01 through 31. | 2009-06-01T13:45:30 -> 01, 2009-06-15T13:45:30 -> 15 |
f | The tenths of a second in a date and time value. | 2009-06-15T13:45:30.6170000 -> 6, 2009-06-15T13:45:30.05 -> 0 |
ff | The hundredths of a second in a date and time value. | 2009-06-15T13:45:30.6170000 -> 61, 2009-06-15T13:45:30.0050000 -> 00 |
fff | The milliseconds in a date and time value. | 6/15/2009 13:45:30.617 -> 617, 6/15/2009 13:45:30.0005 -> 000 |
ffff | The ten thousandths of a second in a date and time value. | 2009-06-15T13:45:30.6175000 -> 6175, 2009-06-15T13:45:30.0000500 -> 0000 |
fffff | The hundred thousandths of a second in a date and time value. | 2009-06-15T13:45:30.6175400 -> 61754, 2009-06-15T13:45:30.000005 -> 00000 |
ffffff | The millionths of a second in a date and time value. | 2009-06-15T13:45:30.6175420 -> 617542, 2009-06-15T13:45:30.0000005 -> 000000 |
fffffff | The ten millionths of a second in a date and time value. | 2009-06-15T13:45:30.6175425 -> 6175425, 2009-06-15T13:45:30.0001150 -> 0001150 |
F | If non-zero, the tenths of a second in a date and time value. | 2009-06-15T13:45:30.6170000 -> 6, 2009-06-15T13:45:30.0500000 -> (no output) |
FF | If non-zero, the hundredths of a second in a date and time value. | 2009-06-15T13:45:30.6170000 -> 61, 2009-06-15T13:45:30.0050000 -> (no output) |
FFF | If non-zero, the milliseconds in a date and time value. | 2009-06-15T13:45:30.6170000 -> 617, 2009-06-15T13:45:30.0005000 -> (no output) |
FFFF | If non-zero, the ten thousandths of a second in a date and time value. | 2009-06-15T13:45:30.5275000 -> 5275, 2009-06-15T13:45:30.0000500 -> (no output) |
FFFFF | If non-zero, the hundred thousandths of a second in a date and time value. | 2009-06-15T13:45:30.6175400 -> 61754, 2009-06-15T13:45:30.0000050 -> (no output) |
FFFFFF | If non-zero, the millionths of a second in a date and time value. | 2009-06-15T13:45:30.6175420 -> 617542, 2009-06-15T13:45:30.0000005 -> (no output) |
FFFFFFF | If non-zero, the ten millionths of a second in a date and time value. | 2009-06-15T13:45:30.6175425 -> 6175425, 2009-06-15T13:45:30.0001150 -> 000115 |
h | The hour, using a 12-hour clock from 1 to 12. | 2009-06-15T01:45:30 -> 1, 2009-06-15T13:45:30 -> 1 |
hh | The hour, using a 12-hour clock from 01 to 12. | 2009-06-15T01:45:30 -> 01, 2009-06-15T13:45:30 -> 01 |
H | The hour, using a 24-hour clock from 0 to 23. | 2009-06-15T01:45:30 -> 1, 2009-06-15T13:45:30 -> 13 |
HH | The hour, using a 24-hour clock from 00 to 23. | 2009-06-15T01:45:30 -> 01, 2009-06-15T13:45:30 -> 13 |
m | The minute, from 0 through 59. | 2009-06-15T01:09:30 -> 9, 2009-06-15T13:29:30 -> 29 |
mm | The minute, from 00 through 59. | 2009-06-15T01:09:30 -> 09, 2009-06-15T01:45:30 -> 45 |
M | The month, from 1 through 12. | 2009-06-15T13:45:30 -> 6 |
MM | The month, from 01 through 12. | 2009-06-15T13:45:30 -> 06 |
s | The second, from 0 through 59. | 2009-06-15T13:45:09 -> 9 |
ss | The second, from 00 through 59. | 2009-06-15T13:45:09 -> 09 |
y | The year, from 0 to 99. | 0001-01-01T00:00:00 -> 1, 0900-01-01T00:00:00 -> 0, 1900-01-01T00:00:00 -> 0, 2009-06-15T13:45:30 -> 9, 2019-06-15T13:45:30 -> 19 |
yy | The year, from 00 to 99. | 0001-01-01T00:00:00 -> 01, 0900-01-01T00:00:00 -> 00, 1900-01-01T00:00:00 -> 00, 2019-06-15T13:45:30 -> 19 |
yyyy | The year as a four-digit number. | 0001-01-01T00:00:00 -> 0001, 0900-01-01T00:00:00 -> 0900, 1900-01-01T00:00:00 -> 1900, 2009-06-15T13:45:30 -> 2009 |
tt | AM / PM hours | 2009-06-15T13:45:09 -> PM |
Supported delimiters
The format specifier can include the following delimiters:
Delimiter | Comment |
---|---|
' ' | Space |
'/' | |
'-' | Dash |
':' | |
',' | |
'.' | |
'_' | |
'[' | |
']' |
Returns
A string with date formatted as specified by format.
Examples
The following three examples return differently formatted datetimes.
let dt = datetime(2017-01-29 09:00:05);
print
v1=format_datetime(dt,'yy-MM-dd [HH:mm:ss]')
Output
v1 |
---|
17-01-29 [09:00:05] |
let dt = datetime(2017-01-29 09:00:05);
print
v2=format_datetime(dt, 'yyyy-M-dd [H:mm:ss]')
Output
v2 |
---|
2017-1-29 [9:00:05] |
let dt = datetime(2017-01-29 09:00:05);
print
v3=format_datetime(dt, 'yy-MM-dd [hh:mm:ss tt]')
Output
v3 |
---|
17-01-29 [09:00:05 AM] |
Related content
- To convert from UTC to local, see datetime_utc_to_local()
- To convert a datetime from local to UTC, see datetime_local_to_utc()
12.102 - format_ipv4_mask()
Parses the input with a netmask and returns a string representing the IPv4 address in CIDR notation.
Syntax
format_ipv4_mask(
ip [,
prefix])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | The IPv4 address as CIDR notation. The format may be a string or number representation in big-endian order. |
prefix | int | An integer from 0 to 32 representing the number of most-significant bits that are taken into account. If unspecified, all 32 bits are used. |
Returns
If conversion is successful, the result will be a string representing IPv4 address as CIDR notation. If conversion isn’t successful, the result will be an empty string.
Examples
datatable(address:string, mask:long)
[
'192.168.1.1', 24,
'192.168.1.1', 32,
'192.168.1.1/24', 32,
'192.168.1.1/24', long(-1),
]
| extend result = format_ipv4(address, mask),
result_mask = format_ipv4_mask(address, mask)
Output
address | mask | result | result_mask |
---|---|---|---|
192.168.1.1 | 24 | 192.168.1.0 | 192.168.1.0/24 |
192.168.1.1 | 32 | 192.168.1.1 | 192.168.1.1/32 |
192.168.1.1/24 | 32 | 192.168.1.0 | 192.168.1.0/24 |
192.168.1.1/24 | -1 | | |
Related content
- For IPv4 address formatting without CIDR notation, see format_ipv4().
- For a list of functions related to IP addresses, see IPv4 and IPv6 functions.
12.103 - format_ipv4()
Parses the input with a netmask and returns a string representing the IPv4 address.
Syntax
format_ipv4(
ip [,
prefix])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | The IPv4 address. The format may be a string or number representation in big-endian order. |
prefix | int | An integer from 0 to 32 representing the number of most-significant bits that are taken into account. If unspecified, all 32 bits are used. |
Returns
If conversion is successful, the result will be a string representing IPv4 address. If conversion isn’t successful, the result will be an empty string.
Examples
datatable(address:string, mask:long)
[
'192.168.1.1', 24,
'192.168.1.1', 32,
'192.168.1.1/24', 32,
'192.168.1.1/24', long(-1),
]
| extend result = format_ipv4(address, mask),
result_mask = format_ipv4_mask(address, mask)
Output
address | mask | result | result_mask |
---|---|---|---|
192.168.1.1 | 24 | 192.168.1.0 | 192.168.1.0/24 |
192.168.1.1 | 32 | 192.168.1.1 | 192.168.1.1/32 |
192.168.1.1/24 | 32 | 192.168.1.0 | 192.168.1.0/24 |
192.168.1.1/24 | -1 | | |
Related content
- For IPv4 address formatting including CIDR notation, see format_ipv4_mask().
- For a list of functions related to IP addresses, see IPv4 and IPv6 functions.
12.104 - format_timespan()
Formats a timespan according to the provided format.
Syntax
format_timespan(
timespan ,
format)
Parameters
Name | Type | Required | Description |
---|---|---|---|
timespan | timespan | ✔️ | The value to format. |
format | string | ✔️ | The output format comprised of one or more of the supported format elements. |
Supported format elements
Format specifier | Description | Examples |
---|---|---|
d-dddddddd | The number of whole days in the time interval. Padded with zeros if needed. | 15.13:45:30: d -> 15, dd -> 15, ddd -> 015 |
f | The tenths of a second in the time interval. | 15.13:45:30.6170000 -> 6, 15.13:45:30.05 -> 0 |
ff | The hundredths of a second in the time interval. | 15.13:45:30.6170000 -> 61, 15.13:45:30.0050000 -> 00 |
fff | The milliseconds in the time interval. | 6/15/2009 13:45:30.617 -> 617, 6/15/2009 13:45:30.0005 -> 000 |
ffff | The ten thousandths of a second in the time interval. | 15.13:45:30.6175000 -> 6175, 15.13:45:30.0000500 -> 0000 |
fffff | The hundred thousandths of a second in the time interval. | 15.13:45:30.6175400 -> 61754, 15.13:45:30.000005 -> 00000 |
ffffff | The millionths of a second in the time interval. | 15.13:45:30.6175420 -> 617542, 15.13:45:30.0000005 -> 000000 |
fffffff | The ten millionths of a second in the time interval. | 15.13:45:30.6175425 -> 6175425, 15.13:45:30.0001150 -> 0001150 |
F | If non-zero, the tenths of a second in the time interval. | 15.13:45:30.6170000 -> 6, 15.13:45:30.0500000 -> (no output) |
FF | If non-zero, the hundredths of a second in the time interval. | 15.13:45:30.6170000 -> 61, 15.13:45:30.0050000 -> (no output) |
FFF | If non-zero, the milliseconds in the time interval. | 15.13:45:30.6170000 -> 617, 15.13:45:30.0005000 -> (no output) |
FFFF | If non-zero, the ten thousandths of a second in the time interval. | 15.13:45:30.5275000 -> 5275, 15.13:45:30.0000500 -> (no output) |
FFFFF | If non-zero, the hundred thousandths of a second in the time interval. | 15.13:45:30.6175400 -> 61754, 15.13:45:30.0000050 -> (no output) |
FFFFFF | If non-zero, the millionths of a second in the time interval. | 15.13:45:30.6175420 -> 617542, 15.13:45:30.0000005 -> (no output) |
FFFFFFF | If non-zero, the ten millionths of a second in the time interval. | 15.13:45:30.6175425 -> 6175425, 15.13:45:30.0001150 -> 000115 |
H | The hour, using a 24-hour clock from 0 to 23. | 15.01:45:30 -> 1, 15.13:45:30 -> 13 |
HH | The hour, using a 24-hour clock from 00 to 23. | 15.01:45:30 -> 01, 15.13:45:30 -> 13 |
m | The number of whole minutes in the time interval that aren’t included as part of hours or days. Single-digit minutes don’t have a leading zero. | 15.01:09:30 -> 9, 15.13:29:30 -> 29 |
mm | The number of whole minutes in the time interval that aren’t included as part of hours or days. Single-digit minutes have a leading zero. | 15.01:09:30 -> 09, 15.01:45:30 -> 45 |
s | The number of whole seconds in the time interval that aren’t included as part of hours, days, or minutes. Single-digit seconds don’t have a leading zero. | 15.13:45:09 -> 9 |
ss | The number of whole seconds in the time interval that aren’t included as part of hours, days, or minutes. Single-digit seconds have a leading zero. | 15.13:45:09 -> 09 |
Supported delimiters
The format specifier can include the following delimiters:
Delimiter | Comment |
---|---|
' ' | Space |
'/' | |
'-' | Dash |
':' | |
',' | |
'.' | |
'_' | |
'[' | |
']' |
Returns
A string with timespan formatted as specified by format.
Examples
let t = time(29.09:00:05.12345);
print
v1=format_timespan(t, 'dd.hh:mm:ss:FF'),
v2=format_timespan(t, 'ddd.h:mm:ss [fffffff]')
Output
v1 | v2 |
---|---|
29.09:00:05:12 | 029.9:00:05 [1234500] |
12.105 - gamma()
Computes the gamma function for the provided number.
Syntax
gamma(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The number used to calculate the gamma function. |
Returns
Gamma function of number.
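Example
For positive integers n, gamma(n) equals (n-1)!, so gamma(5) returns 4! = 24:
print result = gamma(5)
Output
result |
---|
24 |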
Related content
For computing log-gamma function, see loggamma().
12.106 - geo_info_from_ip_address()
Retrieves geolocation information about IPv4 or IPv6 addresses.
Syntax
geo_info_from_ip_address(
IpAddress )
Parameters
Name | Type | Required | Description |
---|---|---|---|
IpAddress | string | ✔️ | IPv4 or IPv6 address to retrieve geolocation information about. |
Returns
A dynamic object containing information about the whereabouts of the IP address, if available. The object contains the following fields:
Name | Type | Description |
---|---|---|
country | string | Country name |
state | string | State (subdivision) name |
city | string | City name |
latitude | real | Latitude coordinate |
longitude | real | Longitude coordinate |
Examples
print ip_location=geo_info_from_ip_address('20.53.203.50')
Output
ip_location |
---|
{"country": "Australia", "state": "New South Wales", "city": "Sydney", "latitude": -33.8715, "longitude": 151.2006} |
print ip_location=geo_info_from_ip_address('2a03:2880:f12c:83:face:b00c::25de')
Output
ip_location |
---|
{"country": "United States", "state": "Florida", "city": "Boca Raton", "latitude": 26.3594, "longitude": -80.0771} |
12.107 - gettype()
Returns the runtime type of its single argument.
The runtime type may be different from the nominal (static) type for expressions whose nominal type is dynamic; in such cases, gettype() can be useful to reveal the type of the actual value (how the value is encoded in memory).
Syntax
gettype(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value for which to find the type. |
Returns
A string representing the runtime type of value.
Examples
Expression | Returns |
---|---|
gettype("a") | string |
gettype(111) | long |
gettype(1==1) | bool |
gettype(now()) | datetime |
gettype(1s) | timespan |
gettype(parse_json('1')) | int |
gettype(parse_json(' "abc" ')) | string |
gettype(parse_json(' {"abc":1} ')) | dictionary |
gettype(parse_json(' [1, 2, 3] ')) | array |
gettype(123.45) | real |
gettype(guid(12e8b78d-55b4-46ae-b068-26d7a0080254)) | guid |
gettype(parse_json('')) | null |
12.108 - getyear()
Returns the year part of the datetime argument.
Syntax
getyear(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to get the year. |
Returns
The year that contains the given date.
Example
print year = getyear(datetime(2015-10-12))
year |
---|
2015 |
12.109 - gzip_compress_to_base64_string()
Performs gzip compression and encodes the result to base64.
Syntax
gzip_compress_to_base64_string(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The value to be compressed and base64 encoded. The function accepts only one argument. |
Returns
- Returns a string that represents the gzip-compressed and base64-encoded original string.
- Returns an empty result if compression or encoding failed.
Example
print res = gzip_compress_to_base64_string("1234567890qwertyuiop")
res |
---|
H4sIAAAAAAAA/wEUAOv/MTIzNDU2Nzg5MHF3ZXJ0eXVpb3A6m7f2FAAAAA== |
12.110 - gzip_decompress_from_base64_string()
Decodes the input string from base64 and performs gzip decompression.
Syntax
gzip_decompress_from_base64_string(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The value that was compressed with gzip and then base64-encoded. The function accepts only one argument. |
Returns
- Returns a UTF-8 string that represents the original string.
- Returns an empty result if decompression or decoding failed. For example, invalid gzip-compressed and base64-encoded strings return an empty output.
Examples
Valid input
print res=gzip_decompress_from_base64_string("H4sIAAAAAAAA/wEUAOv/MTIzNDU2Nzg5MHF3ZXJ0eXVpb3A6m7f2FAAAAA==")
res |
---|
1234567890qwertyuiop |
Invalid input
print res=gzip_decompress_from_base64_string("x0x0x0")
res |
---|
12.111 - has_any_ipv4_prefix()
Returns a boolean value indicating whether one of the specified IPv4 address prefixes appears in a text.
IP addresses in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:
- “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
- “05:04:54 127.0.0.1 GET /favicon.ico 404”
Syntax
has_any_ipv4_prefix(
source ,
ip_address_prefix [,
ip_address_prefix_2,
…] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The value to search. |
ip_address_prefix | string or dynamic | ✔️ | An IP address prefix, or an array of IP address prefixes, for which to search. A valid IP address prefix is either a complete IPv4 address, such as 192.168.1.11 , or its prefix ending with a dot, such as 192. , 192.168. or 192.168.1. . |
Returns
true if one of the specified IP address prefixes is a valid IPv4 address prefix, and it was found in source. Otherwise, the function returns false.
Examples
IP addresses as list of strings
print result=has_any_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.', '192.168.') // true
result |
---|
true |
IP addresses as dynamic array
print result=has_any_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', dynamic(["127.0.", "192.168."]))
result |
---|
true |
Invalid IPv4 prefix
print result=has_any_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0')
result |
---|
false |
Improperly delimited IP address
print result=has_any_ipv4_prefix('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.', '192.')
result |
---|
false |
12.112 - has_any_ipv4()
Returns a value indicating whether one of the specified IPv4 addresses appears in a text.
IP addresses in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:
- “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
- “05:04:54 127.0.0.1 GET /favicon.ico 404”
Syntax
has_any_ipv4(
source ,
ip_address [,
ip_address_2,
…] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The value to search. |
ip_address | string or dynamic | ✔️ | An IP address, or an array of IP addresses, for which to search. |
Returns
true if one of the specified IP addresses is a valid IPv4 address, and it was found in source. Otherwise, the function returns false.
Examples
IP addresses as list of strings
print result=has_any_ipv4('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.0.1', '127.0.0.2')
result |
---|
true |
IP addresses as dynamic array
print result=has_any_ipv4('05:04:54 127.0.0.1 GET /favicon.ico 404', dynamic(['127.0.0.1', '127.0.0.2']))
result |
---|
true |
Invalid IPv4 address
print result=has_any_ipv4('05:04:54 127.0.0.256 GET /favicon.ico 404', dynamic(["127.0.0.256", "192.168.1.1"]))
result |
---|
false |
Improperly delimited IP address
print result=has_any_ipv4('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.0.1', '192.168.1.1') // false, improperly delimited IP address
result |
---|
false |
12.113 - has_ipv4_prefix()
Returns a value indicating whether a specified IPv4 address prefix appears in a text.
A valid IP address prefix is either a complete IPv4 address (192.168.1.11
) or its prefix ending with a dot (192.
, 192.168.
or 192.168.1.
).
IP addresses in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:
- “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
- “05:04:54 127.0.0.1 GET /favicon.ico 404”
Syntax
has_ipv4_prefix(
source ,
ip_address_prefix )
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The text to search. |
ip_address_prefix | string | ✔️ | The IP address prefix for which to search. |
Returns
true if the ip_address_prefix is a valid IPv4 address prefix, and it was found in source. Otherwise, the function returns false.
Examples
Properly formatted IPv4 prefix
print result=has_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.')
result |
---|
true |
Invalid IPv4 prefix
print result=has_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0')
result |
---|
false |
Invalid IPv4 address
print result=has_ipv4_prefix('05:04:54 127.0.0.256 GET /favicon.ico 404', '127.0.')
result |
---|
false |
Improperly delimited IPv4 address
print result=has_ipv4_prefix('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.')
result |
---|
false |
12.114 - has_ipv4()
Returns a value indicating whether a specified IPv4 address appears in a text.
IP addresses in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:
- “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
- “05:04:54 127.0.0.1 GET /favicon.ico 404”
Syntax
has_ipv4(
source ,
ip_address )
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The text to search. |
ip_address | string | ✔️ | The value containing the IP address for which to search. |
Returns
true if the ip_address is a valid IPv4 address, and it was found in source. Otherwise, the function returns false.
Examples
Properly formatted IP address
print result=has_ipv4('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.0.1')
Output
result |
---|
true |
Invalid IP address
print result=has_ipv4('05:04:54 127.0.0.256 GET /favicon.ico 404', '127.0.0.256')
Output
result |
---|
false |
Improperly delimited IP
print result=has_ipv4('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.0.1')
Output
result |
---|
false |
12.115 - hash_combine()
Combines hash values of two or more hashes.
Syntax
hash_combine(
h1 ,
h2 [,
h3 …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
h1, h2, … hN | long | ✔️ | The hash values to combine. |
Returns
The combined hash value of the given scalars.
Examples
print value1 = "Hello", value2 = "World"
| extend h1 = hash(value1), h2=hash(value2)
| extend combined = hash_combine(h1, h2)
Output
value1 | value2 | h1 | h2 | combined |
---|---|---|---|---|
Hello | World | 753694413698530628 | 1846988464401551951 | -1440138333540407281 |
12.116 - hash_many()
Returns a combined hash value of multiple values.
Syntax
hash_many(
s1 ,
s2 [,
s3 …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
s1, s2, …, sN | scalar | ✔️ | The values to hash together. |
Returns
The hash() function is applied to each of the specified scalars. The resulting hashes are combined into a single hash and returned.
Examples
print value1 = "Hello", value2 = "World"
| extend combined = hash_many(value1, value2)
Output
value1 | value2 | combined |
---|---|---|
Hello | World | -1440138333540407281 |
12.117 - hash_md5()
Returns an MD5 hash value of the input.
Syntax
hash_md5(
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | scalar | ✔️ | The value to be hashed. |
Returns
The MD5 hash value of the given scalar, encoded as a hex string (a string of characters, each two of which represent a single Hex number between 0 and 255).
Examples
print
h1=hash_md5("World"),
h2=hash_md5(datetime(2020-01-01))
Output
h1 | h2 |
---|---|
f5a7924e621e84c9280a9a27e1bcb7f6 | 786c530672d1f8db31fee25ea8a9390b |
The following example uses the hash_md5()
function to aggregate StormEvents based on State’s MD5 hash value.
StormEvents
| summarize StormCount = count() by State, StateHash=hash_md5(State)
| top 5 by StormCount
Output
State | StateHash | StormCount |
---|---|---|
TEXAS | 3b00dbe6e07e7485a1c12d36c8e9910a | 4701 |
KANSAS | e1338d0ac8be43846cf9ae967bd02e7f | 3166 |
IOWA | 6d4a7c02942f093576149db764d4e2d2 | 2337 |
ILLINOIS | 8c00d9e0b3fcd55aed5657e42cc40cf1 | 2022 |
MISSOURI | 2d82f0c963c0763012b2539d469e5008 | 2016 |
12.118 - hash_sha1()
Returns a sha1 hash value of the source input.
Syntax
hash_sha1(
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | scalar | ✔️ | The value to be hashed. |
Returns
The sha1 hash value of the given scalar, encoded as a hex string (a string of characters, each two of which represent a single Hex number between 0 and 255).
Examples
print
h1=hash_sha1("World"),
h2=hash_sha1(datetime(2020-01-01))
Output
h1 | h2 |
---|---|
70c07ec18ef89c5309bbb0937f3a6342411e1fdd | e903e533f4d636b4fc0dcf3cf81e7b7f330de776 |
The following example uses the hash_sha1()
function to aggregate StormEvents based on State’s SHA1 hash value.
StormEvents
| summarize StormCount = count() by State, StateHash=hash_sha1(State)
| top 5 by StormCount desc
Output
State | StateHash | StormCount |
---|---|---|
TEXAS | 3128d805194d4e6141766cc846778eeacb12e3ea | 4701 |
KANSAS | ea926e17098148921e472b1a760cd5a8117e84d6 | 3166 |
IOWA | cacf86ec119cfd5b574bde5b59604774de3273db | 2337 |
ILLINOIS | 03740763b16dae9d799097f51623fe635d8c4852 | 2022 |
MISSOURI | 26d938907240121b54d9e039473dacc96e712f61 | 2016 |
12.119 - hash_sha256()
Returns a sha256 hash value of the source input.
Syntax
hash_sha256(
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | scalar | ✔️ | The value to be hashed. |
Returns
The sha256 hash value of the given scalar, encoded as a hex string (a string of characters, each two of which represent a single Hex number between 0 and 255).
Examples
print
h1=hash_sha256("World"),
h2=hash_sha256(datetime(2020-01-01))
Output
h1 | h2 |
---|---|
78ae647dc5544d227130a0682a51e30bc7777fbb6d8a8f17007463a3ecd1d524 | ba666752dc1a20eb750b0eb64e780cc4c968bc9fb8813461c1d7e750f302d71d |
The following example uses the hash_sha256()
function to aggregate StormEvents based on State’s SHA256 hash value.
StormEvents
| summarize StormCount = count() by State, StateHash=hash_sha256(State)
| top 5 by StormCount desc
Output
State | StateHash | StormCount |
---|---|---|
TEXAS | 9087f20f23f91b5a77e8406846117049029e6798ebbd0d38aea68da73a00ca37 | 4701 |
KANSAS | c80e328393541a3181b258cdb4da4d00587c5045e8cf3bb6c8fdb7016b69cc2e | 3166 |
IOWA | f85893dca466f779410f65cd904fdc4622de49e119ad4e7c7e4a291ceed1820b | 2337 |
ILLINOIS | ae3eeabfd7eba3d9a4ccbfed6a9b8cff269dc43255906476282e0184cf81b7fd | 2022 |
MISSOURI | d15dfc28abc3ee73b7d1f664a35980167ca96f6f90e034db2a6525c0b8ba61b1 | 2016 |
12.120 - hash_xxhash64()
Returns an xxhash64 value for the input value.
Syntax
hash_xxhash64(
source [,
mod])
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | scalar | ✔️ | The value to be hashed. |
mod | int | A modulo value to be applied to the hash result, so that the output value is between 0 and mod - 1 . This parameter is useful for limiting the range of possible output values or for compressing the output of the hash function into a smaller range. |
Returns
The hash value of source. If mod is specified, the function returns the hash value modulo the value of mod, meaning that the output of the function will be the remainder of the hash value divided by mod. The output will be a value between 0 and mod - 1, inclusive.
Examples
String input
print result=hash_xxhash64("World")
result |
---|
1846988464401551951 |
String input with mod
print result=hash_xxhash64("World", 100)
result |
---|
51 |
Datetime input
print result=hash_xxhash64(datetime("2015-01-01"))
result |
---|
1380966698541616202 |
12.121 - hash()
Returns a hash value for the input value.
Syntax
hash(
source [,
mod])
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | scalar | ✔️ | The value to be hashed. |
mod | int | A modulo value to be applied to the hash result, so that the output value is between 0 and mod - 1 . This parameter is useful for limiting the range of possible output values or for compressing the output of the hash function into a smaller range. |
Returns
The hash value of source. If mod is specified, the function returns the hash value modulo the value of mod, meaning that the output of the function will be the remainder of the hash value divided by mod. The output will be a value between 0 and mod - 1, inclusive.
Examples
String input
print result=hash("World")
result |
---|
1846988464401551951 |
String input with mod
print result=hash("World", 100)
result |
---|
51 |
Datetime input
print result=hash(datetime("2015-01-01"))
result |
---|
1380966698541616202 |
Use hash to check data distribution
Use the hash() function for sampling data if the values in one of the columns are uniformly distributed. In the following example, StartTime values are uniformly distributed, and the function is used to run a query on 10% of the data.
StormEvents
| where hash(StartTime, 10) == 0
| summarize StormCount = count(), TypeOfStorms = dcount(EventType) by State
| top 5 by StormCount desc
12.122 - hll_merge()
Merges HLL results. This is the scalar version of the hll_merge() aggregation function.
Read about the underlying algorithm (HyperLogLog) and estimation accuracy.
Syntax
hll_merge(
hll,
hll2,
[ hll3,
… ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
hll, hll2, … | string | ✔️ | The column names containing HLL values to merge. The function expects between 2 and 64 arguments. |
Returns
Returns one HLL value. The value is the result of merging the columns hll, hll2, … hllN.
Examples
This example shows the value of the merged columns.
range x from 1 to 10 step 1
| extend y = x + 10
| summarize hll_x = hll(x), hll_y = hll(y)
| project merged = hll_merge(hll_x, hll_y)
| project dcount_hll(merged)
Output
dcount_hll_merged |
---|
20 |
12.123 - hourofday()
Returns the integer number representing the hour number of the given date.
Syntax
hourofday(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to return the hour number. |
Returns
An integer between 0-23 representing the hour number of the day for date.
Example
print hour=hourofday(datetime(2015-12-14 18:54))
hour |
---|
18 |
12.124 - iff()
Returns the then value when the if condition evaluates to true, otherwise it returns the else value.
Syntax
iff(
if,
then,
else)
Parameters
Name | Type | Required | Description |
---|---|---|---|
if | string | ✔️ | An expression that evaluates to a boolean value. |
then | scalar | ✔️ | An expression that returns its value when the if condition evaluates to true. |
else | scalar | ✔️ | An expression that returns its value when the if condition evaluates to false. |
Returns
This function returns the then value when the if condition evaluates to true, otherwise it returns the else value.
Examples
Classify data using iff()
The following query uses the iff()
function to categorize storm events as either “Rain event” or “Not rain event” based on their event type, and then projects the state, event ID, event type, and the new rain category.
StormEvents
| extend Rain = iff((EventType in ("Heavy Rain", "Flash Flood", "Flood")), "Rain event", "Not rain event")
| project State, EventId, EventType, Rain
Output
The following table shows only the first five rows.
State | EventId | EventType | Rain |
---|---|---|---|
ATLANTIC SOUTH | 61032 | Waterspout | Not rain event |
FLORIDA | 60904 | Heavy Rain | Rain event |
FLORIDA | 60913 | Tornado | Not rain event |
GEORGIA | 64588 | Thunderstorm Wind | Not rain event |
MISSISSIPPI | 68796 | Thunderstorm Wind | Not rain event |
… | … | … | … |
Combine iff() with other functions
The following query calculates the total damage from crops and property, categorizes the severity of storm events based on total damage, direct injuries, and direct deaths, and then counts the number of events for each severity level.
StormEvents
| extend TotalDamage = DamageCrops + DamageProperty
| extend Severity = iff(TotalDamage > 1000000 or InjuriesDirect > 10 or DeathsDirect > 0, "High", iff(TotalDamage < 50000 and InjuriesDirect == 0 and DeathsDirect == 0, "Low", "Moderate"))
| summarize TotalEvents = count() by Severity
Output
Severity | TotalEvents |
---|---|
Low | 54805 |
High | 977 |
Moderate | 3284 |
12.125 - indexof_regex()
Returns the zero-based index of the first occurrence of a specified lookup regular expression within the input string.
See indexof().
Syntax
indexof_regex(
string,
match[,
start[,
length[,
occurrence]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The source string to search. |
match | string | ✔️ | The regular expression lookup string. |
start | int | The search start position. A negative value will offset the starting search position from the end of the string by this many steps: abs( start) . | |
length | int | The number of character positions to examine. A value of -1 means unlimited length. | |
occurrence | int | The number of the occurrence. The default is 1. |
Returns
The zero-based index position of match.
- Returns -1 if match isn’t found in string.
- Returns null if:
  - start is less than 0.
  - occurrence is less than 0.
  - length is less than -1.
Examples
print
idx1 = indexof_regex("abcabc", @"a.c"), // lookup found in input string
idx2 = indexof_regex("abcabcdefg", @"a.c", 0, 9, 2), // lookup found in input string
idx3 = indexof_regex("abcabc", @"a.c", 1, -1, 2), // there's no second occurrence in the search range
idx4 = indexof_regex("ababaa", @"a.a", 0, -1, 2), // Matches don't overlap so full lookup can't be found
idx5 = indexof_regex("abcabc", @"a|ab", -1) // invalid start argument
Output
idx1 | idx2 | idx3 | idx4 | idx5 |
---|---|---|---|---|
0 | 3 | -1 | -1 | |
12.126 - indexof()
Reports the zero-based index of the first occurrence of a specified string within the input string.
For more information, see indexof_regex()
.
Syntax
indexof(
string,
match[,
start[,
length[,
occurrence]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The source string to search. |
match | string | ✔️ | The string for which to search. |
start | int | The search start position. A negative value will offset the starting search position from the end of the string by this many steps: abs( start) . | |
length | int | The number of character positions to examine. A value of -1 means unlimited length. | |
occurrence | int | The number of the occurrence. The default is 1. |
Returns
The zero-based index position of match.
- Returns -1 if match isn’t found in string.
- Returns null if:
  - start is less than 0.
  - occurrence is less than 0.
  - length is less than -1.
Examples
print
idx1 = indexof("abcdefg","cde") // lookup found in input string
, idx2 = indexof("abcdefg","cde",1,4) // lookup found in researched range
, idx3 = indexof("abcdefg","cde",1,2) // search starts from index 1, but stops after 2 chars, so full lookup can't be found
, idx4 = indexof("abcdefg","cde",3,4) // search starts after occurrence of lookup
, idx5 = indexof("abcdefg","cde",-5) // negative start index
, idx6 = indexof(1234567,5,1,4) // the first two parameters are forcibly cast to strings "1234567" and "5"
, idx7 = indexof("abcdefg","cde",2,-1) // lookup found in input string
, idx8 = indexof("abcdefgabcdefg", "cde", 1, 10, 2) // lookup found in input range
, idx9 = indexof("abcdefgabcdefg", "cde", 1, -1, 3) // the third occurrence of lookup is not in researched range
Output
idx1 | idx2 | idx3 | idx4 | idx5 | idx6 | idx7 | idx8 | idx9 |
---|---|---|---|---|---|---|---|---|
2 | 2 | -1 | -1 | 2 | 4 | 2 | 9 | -1 |
12.127 - ingestion_time()
Returns the approximate datetime in UTC format indicating when the current record was ingested.
This function must be used in the context of a table or a materialized view. Otherwise, this function produces null values.
If the IngestionTime policy was not enabled when the data was ingested, the function returns null values.
Retrieves the datetime when the record was ingested and ready for query.
Syntax
ingestion_time()
Returns
A datetime
value specifying the approximate time of ingestion into a table.
Example
T
| extend ingestionTime = ingestion_time() | top 10 by ingestionTime
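For instance, assuming the IngestionTime policy is enabled on the StormEvents sample table, the same pattern returns the ten most recently ingested records:
StormEvents
| extend ingestionTime = ingestion_time()
| top 10 by ingestionTime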
12.128 - ipv4_compare()
Compares two IPv4 strings. The two IPv4 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional PrefixMask
argument.
Syntax
ipv4_compare(
Expr1,
Expr2[ ,
PrefixMask])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr1, Expr2 | string | ✔️ | A string expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation. |
PrefixMask | int | An integer from 0 to 32 representing the number of most-significant bits that are taken into account. |
Returns
- 0: If the long representation of the first IPv4 string argument is equal to the second IPv4 string argument.
- 1: If the long representation of the first IPv4 string argument is greater than the second IPv4 string argument.
- -1: If the long representation of the first IPv4 string argument is less than the second IPv4 string argument.
- null: If conversion for one of the two IPv4 strings wasn’t successful.
Examples: IPv4 comparison equality cases
Compare IPs using the IP-prefix notation specified inside the IPv4 strings
datatable(ip1_string:string, ip2_string:string)
[
'192.168.1.0', '192.168.1.0', // Equal IPs
'192.168.1.1/24', '192.168.1.255', // 24 bit IP-prefix is used for comparison
'192.168.1.1', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
'192.168.1.1/30', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_compare(ip1_string, ip2_string)
Output
ip1_string | ip2_string | result |
---|---|---|
192.168.1.0 | 192.168.1.0 | 0 |
192.168.1.1/24 | 192.168.1.255 | 0 |
192.168.1.1 | 192.168.1.255/24 | 0 |
192.168.1.1/30 | 192.168.1.255/24 | 0 |
Compare IPs using IP-prefix notation specified inside the IPv4 strings and as additional argument of the ipv4_compare()
function
datatable(ip1_string:string, ip2_string:string, prefix:long)
[
'192.168.1.1', '192.168.1.0', 31, // 31 bit IP-prefix is used for comparison
'192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP-prefix is used for comparison
'192.168.1.1', '192.168.1.255', 24, // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_compare(ip1_string, ip2_string, prefix)
Output
ip1_string | ip2_string | prefix | result |
---|---|---|---|
192.168.1.1 | 192.168.1.0 | 31 | 0 |
192.168.1.1/24 | 192.168.1.255 | 31 | 0 |
192.168.1.1 | 192.168.1.255 | 24 | 0 |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.129 - ipv4_is_in_any_range()
Checks whether IPv4 string address is in any of the specified IPv4 address ranges.
Syntax
ipv4_is_in_any_range(
Ipv4Address ,
Ipv4Range [ ,
Ipv4Range …] )
ipv4_is_in_any_range(
Ipv4Address ,
Ipv4Ranges )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Ipv4Address | string | ✔️ | An expression representing an IPv4 address. |
Ipv4Range | string | ✔️ | An IPv4 range or list of IPv4 ranges written with IP-prefix notation. |
Ipv4Ranges | dynamic | ✔️ | A dynamic array containing IPv4 ranges written with IP-prefix notation. |
Returns
- true: If the IPv4 address is in the range of any of the specified IPv4 networks.
- false: Otherwise.
- null: If conversion for one of the two IPv4 strings wasn’t successful.
Examples
Syntax using list of strings
print Result=ipv4_is_in_any_range('192.168.1.6', '192.168.1.1/24', '10.0.0.1/8', '127.1.0.1/16')
Output
Result |
---|
true |
Syntax using dynamic array
print Result=ipv4_is_in_any_range("127.0.0.1", dynamic(["127.0.0.1", "192.168.1.1"]))
Output
Result |
---|
true |
Extend table with IPv4 range check
let LocalNetworks=dynamic([
"192.168.1.1/16",
"127.0.0.1/8",
"10.0.0.1/8"
]);
let IPs=datatable(IP:string) [
"10.1.2.3",
"192.168.1.5",
"123.1.11.21",
"1.1.1.1"
];
IPs
| extend IsLocal=ipv4_is_in_any_range(IP, LocalNetworks)
Output
IP | IsLocal |
---|---|
10.1.2.3 | true |
192.168.1.5 | true |
123.1.11.21 | false |
1.1.1.1 | false |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.130 - ipv4_is_in_range()
Checks if IPv4 string address is in IPv4-prefix notation range.
Syntax
ipv4_is_in_range(
Ipv4Address,
Ipv4Range)
Parameters
Name | Type | Required | Description |
---|---|---|---|
Ipv4Address | string | ✔️ | An expression representing an IPv4 address. |
Ipv4Range | string | ✔️ | An IPv4 range or list of IPv4 ranges written with IP-prefix notation. |
Returns
- true: If the long representation of the first IPv4 string argument is in range of the second IPv4 string argument.
- false: Otherwise.
- null: If conversion for one of the two IPv4 strings wasn’t successful.
Example
datatable(ip_address:string, ip_range:string)
[
'192.168.1.1', '192.168.1.1', // Equal IPs
'192.168.1.1', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_is_in_range(ip_address, ip_range)
Output
ip_address | ip_range | result |
---|---|---|
192.168.1.1 | 192.168.1.1 | true |
192.168.1.1 | 192.168.1.255/24 | true |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.131 - ipv4_is_match()
Matches two IPv4 strings. The two IPv4 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional prefix
argument.
Syntax
ipv4_is_match(
ip1,
ip2[ ,
prefix])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip1, ip2 | string | ✔️ | An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation. |
prefix | int | An integer from 0 to 32 representing the number of most-significant bits that are taken into account. |
Returns
- true: If the long representation of the first IPv4 string argument is equal to the second IPv4 string argument.
- false: Otherwise.
- null: If conversion for one of the two IPv4 strings wasn’t successful.
Examples
Simple example
print ipv4_is_match('192.168.1.1/24', '192.168.1.255')
Output
print_0 |
---|
true |
IPv4 comparison equality - IP-prefix notation specified inside the IPv4 strings
datatable(ip1_string:string, ip2_string:string)
[
'192.168.1.0', '192.168.1.0', // Equal IPs
'192.168.1.1/24', '192.168.1.255', // 24 bit IP-prefix is used for comparison
'192.168.1.1', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
'192.168.1.1/30', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_is_match(ip1_string, ip2_string)
Output
ip1_string | ip2_string | result |
---|---|---|
192.168.1.0 | 192.168.1.0 | true |
192.168.1.1/24 | 192.168.1.255 | true |
192.168.1.1 | 192.168.1.255/24 | true |
192.168.1.1/30 | 192.168.1.255/24 | true |
IPv4 comparison equality - IP-prefix notation specified inside the IPv4 strings and an additional argument of the ipv4_is_match()
function
datatable(ip1_string:string, ip2_string:string, prefix:long)
[
'192.168.1.1', '192.168.1.0', 31, // 31 bit IP-prefix is used for comparison
'192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP-prefix is used for comparison
'192.168.1.1', '192.168.1.255', 24, // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_is_match(ip1_string, ip2_string, prefix)
Output
ip1_string | ip2_string | prefix | result |
---|---|---|---|
192.168.1.1 | 192.168.1.0 | 31 | true |
192.168.1.1/24 | 192.168.1.255 | 31 | true |
192.168.1.1 | 192.168.1.255 | 24 | true |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.132 - ipv4_is_private()
Checks if the IPv4 string address belongs to a set of private network IPs.
Private network addresses were originally defined to help delay IPv4 address exhaustion. IP packets originating from or addressed to a private IP address can’t be routed through the public internet.
Private IPv4 addresses
The Internet Engineering Task Force (IETF) has directed the Internet Assigned Numbers Authority (IANA) to reserve the following IPv4 address ranges for private networks:
IP address range | Number of addresses | Largest CIDR block (subnet mask) |
---|---|---|
10.0.0.0 – 10.255.255.255 | 16777216 | 10.0.0.0/8 (255.0.0.0) |
172.16.0.0 – 172.31.255.255 | 1048576 | 172.16.0.0/12 (255.240.0.0) |
192.168.0.0 – 192.168.255.255 | 65536 | 192.168.0.0/16 (255.255.0.0) |
ipv4_is_private('192.168.1.1/24') == true
ipv4_is_private('10.1.2.3/24') == true
ipv4_is_private('202.1.2.3') == false
ipv4_is_private("127.0.0.1") == false
Syntax
ipv4_is_private(
ip)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation. |
Returns
- true: If the IPv4 address belongs to any of the private network ranges.
- false: Otherwise.
- null: If parsing of the input as an IPv4 address string wasn’t successful.
Example: Check if IPv4 belongs to a private network
datatable(ip_string:string)
[
'10.1.2.3',
'192.168.1.1/24',
'127.0.0.1',
]
| extend result = ipv4_is_private(ip_string)
Output
ip_string | result |
---|---|
10.1.2.3 | true |
192.168.1.1/24 | true |
127.0.0.1 | false |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.133 - ipv4_netmask_suffix()
Returns the value of the IPv4 netmask suffix from an IPv4 string address.
Syntax
ipv4_netmask_suffix(
ip)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation. |
Returns
- The value of the netmask suffix of the IPv4 address. If the suffix isn’t present in the input, a value of 32 (full netmask suffix) is returned.
- null: If parsing the input as an IPv4 address string wasn’t successful.
Example: Resolve IPv4 mask suffix
datatable(ip_string:string)
[
'10.1.2.3',
'192.168.1.1/24',
'127.0.0.1/16',
]
| extend cidr_suffix = ipv4_netmask_suffix(ip_string)
Output
ip_string | cidr_suffix |
---|---|
10.1.2.3 | 32 |
192.168.1.1/24 | 24 |
127.0.0.1/16 | 16 |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.134 - ipv4_range_to_cidr_list()
Converts an IPv4 address range denoted by starting and ending IPv4 addresses to a list of IPv4 ranges in CIDR notation.
Syntax
ipv4_range_to_cidr_list(
StartAddress ,
EndAddress )
Parameters
Name | Type | Required | Description |
---|---|---|---|
StartAddress | string | ✔️ | An expression representing a starting IPv4 address of the range. |
EndAddress | string | ✔️ | An expression representing an ending IPv4 address of the range. |
Returns
A dynamic array object containing the list of ranges in CIDR notation.
Examples
print start_IP="1.1.128.0", end_IP="1.1.140.255"
| project ipv4_range_list = ipv4_range_to_cidr_list(start_IP, end_IP)
Output
ipv4_range_list |
---|
["1.1.128.0/21", "1.1.136.0/22","1.1.140.0/24"] |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
12.135 - ipv6_compare()
Compares two IPv6 or IPv4 network address strings. The two IPv6 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional prefix
argument.
Syntax
ipv6_compare(
ip1,
ip2[ ,
prefix])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip1, ip2 | string | ✔️ | An expression representing an IPv6 or IPv4 address. IPv6 and IPv4 strings can be masked using IP-prefix notation. |
prefix | int | An integer from 0 to 128 representing the number of most significant bits that are taken into account. |
Returns
- 0: If the long representation of the first IPv6 string argument is equal to the second IPv6 string argument.
- 1: If the long representation of the first IPv6 string argument is greater than the second IPv6 string argument.
- -1: If the long representation of the first IPv6 string argument is less than the second IPv6 string argument.
- null: If conversion for one of the two IPv6 strings wasn’t successful.
Examples: IPv6/IPv4 comparison equality cases
Compare IPs using the IP-prefix notation specified inside the IPv6/IPv4 strings
datatable(ip1_string:string, ip2_string:string)
[
// IPv4 are compared as IPv6 addresses
'192.168.1.1', '192.168.1.1', // Equal IPs
'192.168.1.1/24', '192.168.1.255', // 24 bit IP4-prefix is used for comparison
'192.168.1.1', '192.168.1.255/24', // 24 bit IP4-prefix is used for comparison
'192.168.1.1/30', '192.168.1.255/24', // 24 bit IP4-prefix is used for comparison
// IPv6 cases
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7994', // Equal IPs
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998', // 120 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7998/120', // 120 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998/120', // 120 bit IP6-prefix is used for comparison
// Mixed case of IPv4 and IPv6
'192.168.1.1', '::ffff:c0a8:0101', // Equal IPs
'192.168.1.1/24', '::ffff:c0a8:01ff', // 24 bit IP-prefix is used for comparison
'::ffff:c0a8:0101', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
'::192.168.1.1/30', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv6_compare(ip1_string, ip2_string)
Output
ip1_string | ip2_string | result |
---|---|---|
192.168.1.1 | 192.168.1.1 | 0 |
192.168.1.1/24 | 192.168.1.255 | 0 |
192.168.1.1 | 192.168.1.255/24 | 0 |
192.168.1.1/30 | 192.168.1.255/24 | 0 |
fe80::85d:e82c:9446:7994 | fe80::85d:e82c:9446:7994 | 0 |
fe80::85d:e82c:9446:7994/120 | fe80::85d:e82c:9446:7998 | 0 |
fe80::85d:e82c:9446:7994 | fe80::85d:e82c:9446:7998/120 | 0 |
fe80::85d:e82c:9446:7994/120 | fe80::85d:e82c:9446:7998/120 | 0 |
192.168.1.1 | ::ffff:c0a8:0101 | 0 |
192.168.1.1/24 | ::ffff:c0a8:01ff | 0 |
::ffff:c0a8:0101 | 192.168.1.255/24 | 0 |
::192.168.1.1/30 | 192.168.1.255/24 | 0 |
Compare IPs using IP-prefix notation specified inside the IPv6/IPv4 strings and as additional argument of the ipv6_compare()
function
datatable(ip1_string:string, ip2_string:string, prefix:long)
[
// IPv4 are compared as IPv6 addresses
'192.168.1.1', '192.168.1.0', 31, // 31 bit IP4-prefix is used for comparison
'192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP4-prefix is used for comparison
'192.168.1.1', '192.168.1.255', 24, // 24 bit IP4-prefix is used for comparison
// IPv6 cases
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995', 127, // 127 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7998', 120, // 120 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998', 127, // 120 bit IP6-prefix is used for comparison
// Mixed case of IPv4 and IPv6
'192.168.1.1/24', '::ffff:c0a8:01ff', 127, // 127 bit IP6-prefix is used for comparison
'::ffff:c0a8:0101', '192.168.1.255', 120, // 120 bit IP6-prefix is used for comparison
'::192.168.1.1/30', '192.168.1.255/24', 127, // 120 bit IP6-prefix is used for comparison
]
| extend result = ipv6_compare(ip1_string, ip2_string, prefix)
Output
ip1_string | ip2_string | prefix | result |
---|---|---|---|
192.168.1.1 | 192.168.1.0 | 31 | 0 |
192.168.1.1/24 | 192.168.1.255 | 31 | 0 |
192.168.1.1 | 192.168.1.255 | 24 | 0 |
fe80::85d:e82c:9446:7994 | fe80::85d:e82c:9446:7995 | 127 | 0 |
fe80::85d:e82c:9446:7994/127 | fe80::85d:e82c:9446:7998 | 120 | 0 |
fe80::85d:e82c:9446:7994/120 | fe80::85d:e82c:9446:7998 | 127 | 0 |
192.168.1.1/24 | ::ffff:c0a8:01ff | 127 | 0 |
::ffff:c0a8:0101 | 192.168.1.255 | 120 | 0 |
::192.168.1.1/30 | 192.168.1.255/24 | 127 | 0 |
Related content
- Overview of IPv4/IPv6 functions
12.136 - ipv6_is_in_any_range()
Checks whether an IPv6 string address is in any of the specified IPv6 address ranges.
Syntax
ipv6_is_in_any_range(
Ipv6Address ,
Ipv6Range [ ,
Ipv6Range …] )
ipv6_is_in_any_range(
Ipv6Address ,
Ipv6Ranges )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Ipv6Address | string | ✔️ | An expression representing an IPv6 address. |
Ipv6Range | string | ✔️ | An expression representing an IPv6 range using IP-prefix notation. |
Ipv6Ranges | dynamic | ✔️ | An array containing IPv6 ranges using IP-prefix notation. |
Returns
- true: If the IPv6 address is in the range of any of the specified IPv6 networks.
- false: Otherwise.
- null: If conversion for one of the two IPv6 strings wasn’t successful.
Example
let LocalNetworks=dynamic([
"a5e:f127:8a9d:146d:e102:b5d3:c755:f6cd/112",
"0:0:0:0:0:ffff:c0a8:ac/60"
]);
let IPs=datatable(IP:string) [
"a5e:f127:8a9d:146d:e102:b5d3:c755:abcd",
"a5e:f127:8a9d:146d:e102:b5d3:c755:abce",
"a5e:f127:8a9d:146d:e102:b5d3:c755:abcf",
"a5e:f127:8a9d:146d:e102:b5d3:c756:abd1",
];
IPs
| extend IsLocal=ipv6_is_in_any_range(IP, LocalNetworks)
Output
IP | IsLocal |
---|---|
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd | True |
a5e:f127:8a9d:146d:e102:b5d3:c755:abce | True |
a5e:f127:8a9d:146d:e102:b5d3:c755:abcf | True |
a5e:f127:8a9d:146d:e102:b5d3:c756:abd1 | False |
Related content
- Overview of IPv4/IPv6 functions
12.137 - ipv6_is_in_range()
Checks if an IPv6 string address is in the IPv6-prefix notation range.
Syntax
ipv6_is_in_range(
Ipv6Address,
Ipv6Range)
Parameters
Name | Type | Required | Description |
---|---|---|---|
Ipv6Address | string | ✔️ | An expression representing an IPv6 address. |
Ipv6Range | string | ✔️ | An expression representing an IPv6 range using IP-prefix notation. |
Returns
- true: If the long representation of the first IPv6 string argument is in range of the second IPv6 string argument.
- false: Otherwise.
- null: If conversion for one of the two IPv6 strings wasn’t successful.
Example
datatable(ip_address:string, ip_range:string)
[
'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd', 'a5e:f127:8a9d:146d:e102:b5d3:c755:0000/112',
'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd', 'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd',
'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd', '0:0:0:0:0:ffff:c0a8:ac/60',
]
| extend result = ipv6_is_in_range(ip_address, ip_range)
Output
ip_address | ip_range | result |
---|---|---|
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd | a5e:f127:8a9d:146d:e102:b5d3:c755:0000/112 | True |
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd | a5e:f127:8a9d:146d:e102:b5d3:c755:abcd | True |
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd | 0:0:0:0:0:ffff:c0a8:ac/60 | False |
Related content
- Overview of IPv4/IPv6 functions
12.138 - ipv6_is_match()
Matches two IPv6 or IPv4 network address strings. The two IPv6/IPv4 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional prefix
argument.
Syntax
ipv6_is_match(
ip1,
ip2[ ,
prefix])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip1, ip2 | string | ✔️ | An expression representing an IPv6 or IPv4 address. IPv6 and IPv4 strings can be masked using IP-prefix notation. |
prefix | int | An integer from 0 to 128 representing the number of most-significant bits that are taken into account. |
Returns
- true: If the long representation of the first IPv6/IPv4 string argument is equal to the second IPv6/IPv4 string argument.
- false: Otherwise.
- null: If conversion for one of the two IPv6/IPv4 strings wasn’t successful.
Examples
IPv6/IPv4 comparison equality case - IP-prefix notation specified inside the IPv6/IPv4 strings
datatable(ip1_string:string, ip2_string:string)
[
// IPv4 are compared as IPv6 addresses
'192.168.1.1', '192.168.1.1', // Equal IPs
'192.168.1.1/24', '192.168.1.255', // 24 bit IP4-prefix is used for comparison
'192.168.1.1', '192.168.1.255/24', // 24 bit IP4-prefix is used for comparison
'192.168.1.1/30', '192.168.1.255/24', // 24 bit IP4-prefix is used for comparison
// IPv6 cases
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7994', // Equal IPs
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998', // 120 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7998/120', // 120 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998/120', // 120 bit IP6-prefix is used for comparison
// Mixed case of IPv4 and IPv6
'192.168.1.1', '::ffff:c0a8:0101', // Equal IPs
'192.168.1.1/24', '::ffff:c0a8:01ff', // 24 bit IP-prefix is used for comparison
'::ffff:c0a8:0101', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
'::192.168.1.1/30', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv6_is_match(ip1_string, ip2_string)
Output
ip1_string | ip2_string | result |
---|---|---|
192.168.1.1 | 192.168.1.1 | 1 |
192.168.1.1/24 | 192.168.1.255 | 1 |
192.168.1.1 | 192.168.1.255/24 | 1 |
192.168.1.1/30 | 192.168.1.255/24 | 1 |
fe80::85d:e82c:9446:7994 | fe80::85d:e82c:9446:7994 | 1 |
fe80::85d:e82c:9446:7994/120 | fe80::85d:e82c:9446:7998 | 1 |
fe80::85d:e82c:9446:7994 | fe80::85d:e82c:9446:7998/120 | 1 |
fe80::85d:e82c:9446:7994/120 | fe80::85d:e82c:9446:7998/120 | 1 |
192.168.1.1 | ::ffff:c0a8:0101 | 1 |
192.168.1.1/24 | ::ffff:c0a8:01ff | 1 |
::ffff:c0a8:0101 | 192.168.1.255/24 | 1 |
::192.168.1.1/30 | 192.168.1.255/24 | 1 |
IPv6/IPv4 comparison equality case- IP-prefix notation specified inside the IPv6/IPv4 strings and as additional argument of the ipv6_is_match()
function
datatable(ip1_string:string, ip2_string:string, prefix:long)
[
// IPv4 are compared as IPv6 addresses
'192.168.1.1', '192.168.1.0', 31, // 31 bit IP4-prefix is used for comparison
'192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP4-prefix is used for comparison
'192.168.1.1', '192.168.1.255', 24, // 24 bit IP4-prefix is used for comparison
// IPv6 cases
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995', 127, // 127 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7998', 120, // 120 bit IP6-prefix is used for comparison
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998', 127, // 120 bit IP6-prefix is used for comparison
// Mixed case of IPv4 and IPv6
'192.168.1.1/24', '::ffff:c0a8:01ff', 127, // 127 bit IP6-prefix is used for comparison
'::ffff:c0a8:0101', '192.168.1.255', 120, // 120 bit IP6-prefix is used for comparison
'::192.168.1.1/30', '192.168.1.255/24', 127, // 120 bit IP6-prefix is used for comparison
]
| extend result = ipv6_is_match(ip1_string, ip2_string, prefix)
Output
ip1_string | ip2_string | prefix | result |
---|---|---|---|
192.168.1.1 | 192.168.1.0 | 31 | 1 |
192.168.1.1/24 | 192.168.1.255 | 31 | 1 |
192.168.1.1 | 192.168.1.255 | 24 | 1 |
fe80::85d:e82c:9446:7994 | fe80::85d:e82c:9446:7995 | 127 | 1 |
fe80::85d:e82c:9446:7994/127 | fe80::85d:e82c:9446:7998 | 120 | 1 |
fe80::85d:e82c:9446:7994/120 | fe80::85d:e82c:9446:7998 | 127 | 1 |
192.168.1.1/24 | ::ffff:c0a8:01ff | 127 | 1 |
::ffff:c0a8:0101 | 192.168.1.255 | 120 | 1 |
::192.168.1.1/30 | 192.168.1.255/24 | 127 | 1 |
Related content
- Overview of IPv4/IPv6 functions
12.139 - isascii()
Returns true
if the argument is a valid ASCII string.
Syntax
isascii(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to check if a valid ASCII string. |
Returns
A boolean value indicating whether value is a valid ASCII string.
Example
print result=isascii("some string")
Output
result |
---|
true |
12.140 - isempty()
Returns true
if the argument is an empty string or is null.
Syntax
isempty(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to check if empty or null. |
Returns
A boolean value indicating whether value is an empty string or is null.
Example
x | isempty(x) |
---|---|
"" | true |
"x" | false |
parse_json("") | true |
parse_json("[]") | false |
parse_json("{}") | false |
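The values above can be checked directly with a minimal sketch such as:
print isempty(""), isempty("x"), isempty(parse_json(""))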
12.141 - isfinite()
Returns whether the input is a finite value, meaning it’s not infinite or NaN.
Syntax
isfinite(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The value to check if finite. |
Returns
true if number is finite and false otherwise.
Example
range x from -1 to 1 step 1
| extend y = 0.0
| extend div = 1.0*x/y
| extend isfinite=isfinite(div)
Output
x | y | div | isfinite |
---|---|---|---|
-1 | 0 | -∞ | 0 |
0 | 0 | NaN | 0 |
1 | 0 | ∞ | 0 |
12.142 - isinf()
Returns whether the input is an infinite (positive or negative) value.
Syntax
isinf(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The value to check if infinite. |
Returns
true if number is a positive or negative infinity and false otherwise.
Example
range x from -1 to 1 step 1
| extend y = 0.0
| extend div = 1.0*x/y
| extend isinf=isinf(div)
Output
x | y | div | isinf |
---|---|---|---|
-1 | 0 | -∞ | true |
0 | 0 | NaN | false |
1 | 0 | ∞ | true |
Related content
- To check if a value is null, see isnull().
- To check if a value is finite, see isfinite().
- To check if a value is NaN (Not-a-Number), see isnan().
12.143 - isnan()
Returns whether the input is a Not-a-Number (NaN) value.
Syntax
isnan(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | scalar | ✔️ | The value to check if NaN. |
Returns
true if number is NaN and false otherwise.
Example
range x from -1 to 1 step 1
| extend y = (-1*x)
| extend div = 1.0*x/y
| extend isnan=isnan(div)
Output
x | y | div | isnan |
---|---|---|---|
-1 | 1 | -1 | false |
0 | 0 | NaN | true |
1 | -1 | -1 | false |
Related content
- To check if a value is null, see isnull().
- To check if a value is finite, see isfinite().
- To check if a value is infinite, see isinf().
12.144 - isnotempty()
Returns true
if the argument isn’t an empty string, and it isn’t null.
Syntax
isnotempty(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to check if not empty or null. |
Returns
true if value isn’t an empty string and isn’t null; false otherwise.
Example
Find the storm events for which there’s a begin location.
StormEvents
| where isnotempty(BeginLat) and isnotempty(BeginLon)
12.145 - isnotnull()
Returns true
if the argument isn’t null.
Syntax
isnotnull(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to check if not null. |
Returns
true if value isn’t null and false otherwise.
Example
Find the storm events for which there’s a begin location.
StormEvents
| where isnotnull(BeginLat) and isnotnull(BeginLon)
12.146 - isnull()
Evaluates an expression and returns a Boolean result indicating whether the value is null.
Syntax
isnull(
Expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr | scalar | ✔️ | The expression to evaluate whether the value is null. The expression can be any scalar value; strings, arrays, and objects always return false. For more information, see The dynamic data type. |
Returns
Returns true if the value is null and false otherwise. Empty strings, arrays, property bags, and objects always return false.
The following table lists return values for different expressions (x):
x | isnull(x) |
---|---|
"" | false |
"x" | false |
parse_json("") | true |
parse_json("[]") | false |
parse_json("{}") | false |
Example
Find the storm events for which there’s no begin location.
StormEvents
| where isnull(BeginLat) and isnull(BeginLon)
| project StartTime, EndTime, EpisodeId, EventId, State, EventType, BeginLat, BeginLon
Output
StartTime | EndTime | EpisodeId | EventId | State | EventType | BeginLat | BeginLon |
---|---|---|---|---|---|---|---|
2007-01-01T00:00:00Z | 2007-01-01T05:00:00Z | 4171 | 23358 | WISCONSIN | Winter Storm | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7067 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7068 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7069 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7065 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7070 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7071 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7072 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 2380 | 11735 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7073 | MINNESOTA | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 2240 | 10857 | TEXAS | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 2240 | 10858 | TEXAS | Drought | ||
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7066 | MINNESOTA | Drought | ||
… | … | … | … | … | … | … | … |
12.147 - isutf8()
Returns true
if the argument is a valid UTF8 string.
Syntax
isutf8(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to check if a valid UTF8 string. |
Returns
A boolean value indicating whether value is a valid UTF8 string.
Example
print result=isutf8("some string")
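Output
result |
---|
true |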
12.148 - jaccard_index()
Calculates the Jaccard index of two input sets.
Syntax
jaccard_index
(set1, set2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
set1 | dynamic | ✔️ | The array representing the first set for the calculation. |
set2 | dynamic | ✔️ | The array representing the second set for the calculation. |
Returns
The Jaccard index of the two input sets. The Jaccard index formula is |set1 ∩ set2| / |set1 ∪ set2|.
Examples
print set1=dynamic([1,2,3]), set2=dynamic([1,2,3,4])
| extend jaccard=jaccard_index(set1, set2)
Output
set1 | set2 | jaccard |
---|---|---|
[1,2,3] | [1,2,3,4] | 0.75 |
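In this example, the intersection {1,2,3} has 3 elements and the union {1,2,3,4} has 4 elements, so the index is 3/4 = 0.75.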
12.149 - log()
The natural logarithm is the base-e logarithm: the inverse of the natural exponential function (exp).
Syntax
log(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The number for which to calculate the logarithm. |
Returns
- log() returns the natural logarithm of the input.
- null if the argument is negative or null or can’t be converted to a real value.
Example
print result=log(5)
Output
result |
---|
1.6094379124341003 |
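As noted above, a negative argument yields null; for example, the following sketch returns a null result:
print result=log(-1)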
12.150 - log10()
log10() returns the common (base-10) logarithm of the input.
Syntax
log10(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The number for which to calculate the base-10 logarithm. |
Returns
- The common logarithm is the base-10 logarithm: the inverse of the exponential function (exp) with base 10.
- null if the argument is negative or null or can’t be converted to a real value.
Example
print result=log10(5)
Output
result |
---|
0.69897000433601886 |
12.151 - log2()
The logarithm is the base-2 logarithm: the inverse of the exponential function (exp) with base 2.
Syntax
log2(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The number for which to calculate the base-2 logarithm. |
Returns
- The logarithm is the base-2 logarithm: the inverse of the exponential function (exp) with base 2.
- null if the argument is negative or null or can’t be converted to a real value.
Example
print result=log2(5)
Output
result |
---|
2.3219280948873622 |
12.152 - loggamma()
Computes the log of the absolute value of the gamma function.
Syntax
loggamma(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The number for which to calculate the gamma. |
Example
print result=loggamma(5)
Output
result |
---|
3.1780538303479458 |
Returns
- Returns the natural logarithm of the absolute value of the gamma function of x.
- For computing the gamma function, see gamma().
12.153 - make_datetime()
Creates a datetime scalar value between the specified date and time.
Syntax
make_datetime(
year, month, day)
make_datetime(
year, month, day, hour, minute)
make_datetime(
year, month, day, hour, minute, second)
Parameters
Name | Type | Required | Description |
---|---|---|---|
year | int | ✔️ | The year value between 0 and 9999. |
month | int | ✔️ | The month value between 1 and 12. |
day | int | ✔️ | The day value between 1 and 28-31, depending on the month. |
hour | int | The hour value between 0 and 23. | |
minute | int | The minute value between 0 and 59. | |
second | real | The second value between 0 and 59.9999999. | |
Returns
If successful, the result will be a datetime value, otherwise, the result will be null.
Example
print year_month_day = make_datetime(2017,10,01)
Output
year_month_day |
---|
2017-10-01 00:00:00.0000000 |
print year_month_day_hour_minute = make_datetime(2017,10,01,12,10)
Output
year_month_day_hour_minute |
---|
2017-10-01 12:10:00.0000000 |
print year_month_day_hour_minute_second = make_datetime(2017,10,01,12,11,0.1234567)
Output
year_month_day_hour_minute_second |
---|
2017-10-01 12:11:00.1234567 |
12.154 - make_timespan()
Creates a timespan scalar value from the specified time period.
Syntax
make_timespan(
hour, minute)
make_timespan(
hour, minute, second)
make_timespan(
day, hour, minute, second)
Parameters
Name | Type | Required | Description |
---|---|---|---|
day | int | ✔️ | The day. |
hour | int | ✔️ | The hour. A value from 0-23. |
minute | int | | The minute. A value from 0-59. |
second | real | | The second. A value from 0 to 59.9999999. |
Returns
If the creation is successful, the result will be a timespan value. Otherwise, the result will be null.
Example
print ['timespan'] = make_timespan(1,12,30,55.123)
Output
timespan |
---|
1.12:30:55.1230000 |
12.155 - max_of()
Returns the maximum value of all argument expressions.
Syntax
max_of(
arg,
arg_2,
[ arg_3,
… ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
arg_i | scalar | ✔️ | The values to compare. |
- All arguments must be of the same type.
- Maximum of 64 arguments is supported.
- Non-null values take precedence over null values.
Returns
The maximum value of all argument expressions.
Examples
Find the largest number
This query returns the maximum value among the listed numbers.
print result = max_of(10, 1, -3, 17)
Output
result |
---|
17 |
Find the maximum value in a data-table
This query returns the highest value from columns A and B. Notice that non-null values take precedence over null values.
datatable (A: int, B: int)
[
1, 6,
8, 1,
int(null), 2,
1, int(null),
int(null), int(null)
]
| project max_of(A, B)
Output
result |
---|
6 |
8 |
2 |
1 |
(null) |
Find the maximum datetime
This query returns the later of the two datetime values from columns A and B.
datatable (A: datetime, B: datetime)
[
datetime(2024-12-15 07:15:22), datetime(2024-12-15 07:15:24),
datetime(2024-12-15 08:00:00), datetime(2024-12-15 09:30:00),
datetime(2024-12-15 10:45:00), datetime(2024-12-14 10:45:00)
]
| project maxDate = max_of(A, B)
Output
maxDate |
---|
2024-12-15 07:15:24 |
2024-12-15 09:30:00 |
2024-12-15 10:45:00 |
12.156 - merge_tdigest()
Merges tdigest results (the scalar version of the aggregation function tdigest_merge()).
Read more about the underlying algorithm (T-Digest) and the estimated error here.
Syntax
merge_tdigest(
exprs)
Parameters
Name | Type | Required | Description |
---|---|---|---|
exprs | dynamic | ✔️ | One or more comma-separated column references that have the tdigest values to be merged. |
Returns
The result of merging the columns Expr1, Expr2, … ExprN into one tdigest.
Example
range x from 1 to 10 step 1
| extend y = x + 10
| summarize tdigestX = tdigest(x), tdigestY = tdigest(y)
| project merged = merge_tdigest(tdigestX, tdigestY)
| project percentile_tdigest(merged, 100, typeof(long))
Output
percentile_tdigest_merged |
---|
20 |
12.157 - min_of()
Returns the minimum value of several evaluated scalar expressions.
Syntax
min_of
(
arg,
arg_2,
[ arg_3, … ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
arg, arg_2, … | scalar | ✔️ | A comma separated list of 2-64 scalar expressions to compare. The function returns the minimum value among these expressions. |
- All arguments must be of the same type.
- Maximum of 64 arguments is supported.
- Non-null values take precedence over null values.
Returns
The minimum value of all argument expressions.
Examples
Find the smallest number:
print result=min_of(10, 1, -3, 17)
Output
result |
---|
-3 |
Find the minimum value in a data-table. Non-null values take precedence over null values:
datatable (A: int, B: int)
[
5, 2,
10, 1,
int(null), 3,
1, int(null),
int(null), int(null)
]
| project min_of(A, B)
Output
result |
---|
2 |
1 |
3 |
1 |
(null) |
12.158 - monthofyear()
Returns the integer number from 1-12 representing the month number of the given year.
Syntax
monthofyear(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to find the month number. |
Returns
An integer from 1-12 representing the month number of the given year.
Example
print result=monthofyear(datetime("2015-12-14"))
Output
result |
---|
12 |
12.159 - new_guid()
Returns a random GUID (Globally Unique Identifier).
Syntax
new_guid()
Returns
A new value of type guid
.
Example
print guid=new_guid()
Output
guid |
---|
2157828f-e871-479a-9d1c-17ffde915095 |
12.160 - not()
Reverses the value of its bool
argument.
Syntax
not(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | scalar | ✔️ | An expression that evaluates to a boolean value. The result of this expression is reversed. |
Returns
Returns the reversed logical value of its bool
argument.
Examples
The following query returns the number of events that are not a tornado, per state.
StormEvents
| where not(EventType == "Tornado")
| summarize count() by State
Output
State | Count |
---|---|
TEXAS | 4485 |
KANSAS | 3005 |
IOWA | 2286 |
ILLINOIS | 1999 |
MISSOURI | 1971 |
GEORGIA | 1927 |
MINNESOTA | 1863 |
WISCONSIN | 1829 |
NEBRASKA | 1715 |
NEW YORK | 1746 |
… | … |
The following query excludes records where either the EventType is hail, or the state is Alaska.
StormEvents
| where not(EventType == "Hail" or State == "Alaska")
The next query excludes records where both the EventType is hail and the state is Alaska simultaneously.
StormEvents
| where not(EventType == "Hail" and State == "Alaska")
Combine with other conditions
You can also combine the not() function with other conditions. The following query returns all records where the EventType is not a flood and the property damage is greater than $1,000,000.
StormEvents
| where not(EventType == "Flood") and DamageProperty > 1000000
12.161 - now()
Returns the current UTC time, optionally offset by a given timespan.
The current UTC time will stay the same across all uses of now()
in a single query statement, even if there’s technically a small time difference between when each now()
runs.
Syntax
now(
[ offset ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
offset | timespan | | A timespan to add to the current UTC clock time. The default value is 0. |
Returns
The current UTC clock time, plus the offset time if provided, as a datetime
.
Examples
Show the current time
print now()
Show the time 2 days ago
print now(-2d)
Find time elapsed from a given event
The following example shows the time elapsed since the start of the storm events.
StormEvents
| extend Elapsed=now() - StartTime
| take 10
Get the date relative to a specific time interval
let T = datatable(label: string, timespanValue: timespan) [
"minute", 60s,
"hour", 1h,
"day", 1d,
"year", 365d
];
T
| extend timeAgo = now() - timespanValue
Output
label | timespanValue | timeAgo |
---|---|---|
year | 365.00:00:00 | 2022-06-19T08:22:54.6623324Z |
day | 1.00:00:00 | 2023-06-18T08:22:54.6623324Z |
hour | 01:00:00 | 2023-06-19T07:22:54.6623324Z |
minute | 00:01:00 | 2023-06-19T08:21:54.6623324Z |
12.162 - pack_all()
Creates a dynamic property bag object from all the columns of the tabular expression.
Syntax
pack_all(
[ ignore_null_empty ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ignore_null_empty | bool | | Indicates whether to ignore null/empty columns and exclude them from the resulting property bag. The default value is false. |
Example
The following query uses pack_all() to create a property bag column from all the columns of the table below.
SourceNumber | TargetNumber | CharsCount |
---|---|---|
555-555-1234 | 555-555-1212 | 46 |
555-555-1234 | 555-555-1213 | 50 |
555-555-1313 | 42 | |
555-555-3456 | 74 |
datatable(SourceNumber:string,TargetNumber:string,CharsCount:long)
[
'555-555-1234','555-555-1212',46,
'555-555-1234','555-555-1213',50,
'555-555-1313','',42,
'','555-555-3456',74
]
| extend Packed=pack_all(), PackedIgnoreNullEmpty=pack_all(true)
Output
SourceNumber | TargetNumber | CharsCount | Packed | PackedIgnoreNullEmpty |
---|---|---|---|---|
555-555-1234 | 555-555-1212 | 46 | {“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1212”, “CharsCount”: 46} | {“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1212”, “CharsCount”: 46} |
555-555-1234 | 555-555-1213 | 50 | {“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1213”, “CharsCount”: 50} | {“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1213”, “CharsCount”: 50} |
555-555-1313 | 42 | {“SourceNumber”:“555-555-1313”, “TargetNumber”:"", “CharsCount”: 42} | {“SourceNumber”:“555-555-1313”, “CharsCount”: 42} | |
555-555-3456 | 74 | {“SourceNumber”:"", “TargetNumber”:“555-555-3456”, “CharsCount”: 74} | {“TargetNumber”:“555-555-3456”, “CharsCount”: 74} |
12.163 - pack_array()
Packs all input values into a dynamic array.
Syntax
pack_array(
value1,
[ value2, … ])
pack_array(*)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value1…valueN | string | ✔️ | Input expressions to be packed into a dynamic array. |
The wildcard * | string | | Providing the wildcard * packs all input columns into a dynamic array. |
Returns
A dynamic array that includes the values of value1, value2, … valueN.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project pack_array(x, y, z)
Output
Column1 |
---|
[1,2,4] |
[2,4,8] |
[3,6,12] |
range x from 1 to 3 step 1
| extend y = tostring(x * 2)
| extend z = (x * 2) * 1s
| project pack_array(x, y, z)
Output
Column1 |
---|
[1,“2”,“00:00:02”] |
[2,“4”,“00:00:04”] |
[3,“6”,“00:00:06”] |
12.164 - parse_command_line()
Parses a Unicode command-line string and returns a dynamic array of the command-line arguments.
Syntax
parse_command_line(
command_line, parser_type)
Parameters
Name | Type | Required | Description |
---|---|---|---|
command_line | string | ✔️ | The command line value to parse. |
parser_type | string | ✔️ | The only value that is currently supported is "windows" , which parses the command line the same way as CommandLineToArgvW. |
Returns
A dynamic array of the command-line arguments.
Example
print parse_command_line("echo \"hello world!\"", "windows")
Output
Result |
---|
[“echo”,“hello world!”] |
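Individual arguments can then be read from the returned array by index. The following sketch is illustrative only (the command line and column names are made up):
print args = parse_command_line('app.exe -v --output out.txt', "windows")
| extend exe = tostring(args[0]), first_flag = tostring(args[1])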
12.165 - parse_csv()
Splits a given string representing a single record of comma-separated values and returns a string array with these values.
Syntax
parse_csv(
csv_text)
Parameters
Name | Type | Required | Description |
---|---|---|---|
csv_text | string | ✔️ | A single record of comma-separated values. |
Returns
A string array that contains the split values.
Examples
Filter by count of values in record
Count the conference sessions with more than three participants.
ConferenceSessions
| where array_length(parse_csv(participants)) > 3
| distinct *
Output
sessionid | … | participants |
---|---|---|
CON-PRT157 | … | Guy Reginiano, Guy Yehudy, Pankaj Suri, Saeed Copty |
BRK3099 | … | Yoni Leibowitz, Eric Fleischman, Robert Pack, Avner Aharoni |
Use escaping quotes
print result=parse_csv('aa,"b,b,b",cc,"Escaping quotes: ""Title""","line1\nline2"')
Output
result |
---|
[ “aa”, “b,b,b”, “cc”, “Escaping quotes: "Title"”, “line1\nline2” ] |
CSV with multiple records
Only the first record is taken since this function doesn’t support multiple records.
print result_multi_record=parse_csv('record1,a,b,c\nrecord2,x,y,z')
Output
result_multi_record |
---|
[ “record1”, “a”, “b”, “c” ] |
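Individual fields can be read from the returned array by position. The following sketch is illustrative only (the record and column names are made up):
print fields = parse_csv('2024-01-15,INFO,Service started')
| extend event_date = tostring(fields[0]), level = tostring(fields[1]), message = tostring(fields[2])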
12.166 - parse_ipv4_mask()
Converts an IPv4 address string and netmask to a signed, 64-bit wide long number representation in big-endian order.
Syntax
parse_ipv4_mask(
ip ,
prefix)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | The IPv4 address to convert to a long number. |
prefix | int | ✔️ | An integer from 0 to 32 representing the number of most-significant bits that are taken into account. |
Returns
If conversion is successful, the result is a long number.
If conversion isn’t successful, the result is null
.
Example
print parse_ipv4_mask("127.0.0.1", 24)
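The effect of the prefix is easier to see when several prefixes are applied to the same address. The following sketch is illustrative only (the addresses and column name are made up):
datatable(ip: string, prefix: int)
[
    '192.168.1.1', 32,
    '192.168.1.1', 24,
    '192.168.1.1', 16
]
| extend masked = parse_ipv4_mask(ip, prefix)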
12.167 - parse_ipv4()
Converts IPv4 string to a signed 64-bit wide long number representation in big-endian order.
Syntax
parse_ipv4(
ip)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | The IPv4 that is converted to long. The value may include net-mask using IP-prefix notation. |
Returns
If conversion is successful, the result is a long number.
If conversion isn’t successful, the result is null
.
Example
datatable(ip_string: string)
[
'192.168.1.1', '192.168.1.1/24', '255.255.255.255/31'
]
| extend ip_long = parse_ipv4(ip_string)
Output
ip_string | ip_long |
---|---|
192.168.1.1 | 3232235777 |
192.168.1.1/24 | 3232235776 |
255.255.255.255/31 | 4294967294 |
12.168 - parse_ipv6_mask()
Converts IPv6/IPv4 string and netmask to a canonical IPv6 string representation.
Syntax
parse_ipv6_mask(
ip,
prefix)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | The IPv6/IPv4 network address to convert to canonical IPv6 representation. The value may include net-mask using IP-prefix notation. |
prefix | int | ✔️ | An integer from 0 to 128 representing the number of most-significant bits that are taken into account. |
Returns
If conversion is successful, the result is a string representing a canonical IPv6 network address. If conversion isn’t successful, the result is an empty string.
Example
datatable(ip_string: string, netmask: long)
[
// IPv4 addresses
'192.168.255.255', 120, // 120-bit netmask is used
'192.168.255.255/24', 124, // 120-bit netmask is used, as IPv4 address doesn't use upper 8 bits
'255.255.255.255', 128, // 128-bit netmask is used
// IPv6 addresses
'fe80::85d:e82c:9446:7994', 128, // 128-bit netmask is used
'fe80::85d:e82c:9446:7994/120', 124, // 120-bit netmask is used
// IPv6 with IPv4 notation
'::192.168.255.255', 128, // 128-bit netmask is used
'::192.168.255.255/24', 128, // 120-bit netmask is used, as IPv4 address doesn't use upper 8 bits
]
| extend ip6_canonical = parse_ipv6_mask(ip_string, netmask)
Output
ip_string | netmask | ip6_canonical |
---|---|---|
192.168.255.255 | 120 | 0000:0000:0000:0000:0000:ffff:c0a8:ff00 |
192.168.255.255/24 | 124 | 0000:0000:0000:0000:0000:ffff:c0a8:ff00 |
255.255.255.255 | 128 | 0000:0000:0000:0000:0000:ffff:ffff:ffff |
fe80::85d:e82c:9446:7994 | 128 | fe80:0000:0000:0000:085d:e82c:9446:7994 |
fe80::85d:e82c:9446:7994/120 | 124 | fe80:0000:0000:0000:085d:e82c:9446:7900 |
::192.168.255.255 | 128 | 0000:0000:0000:0000:0000:ffff:c0a8:ffff |
::192.168.255.255/24 | 128 | 0000:0000:0000:0000:0000:ffff:c0a8:ff00 |
12.169 - parse_ipv6()
Converts IPv6 or IPv4 string to a canonical IPv6 string representation.
Syntax
parse_ipv6(
ip)
Parameters
Name | Type | Required | Description |
---|---|---|---|
ip | string | ✔️ | The IPv6/IPv4 network address that is converted to canonical IPv6 representation. The value may include net-mask using IP-prefix notation. |
Returns
If conversion is successful, the result is a string representing a canonical IPv6 network address. If conversion isn’t successful, the result is an empty string.
Example
datatable(ipv4: string)
[
'192.168.255.255', '192.168.255.255/24', '255.255.255.255'
]
| extend ipv6 = parse_ipv6(ipv4)
Output
ipv4 | ipv6 |
---|---|
192.168.255.255 | 0000:0000:0000:0000:0000:ffff:c0a8:ffff |
192.168.255.255/24 | 0000:0000:0000:0000:0000:ffff:c0a8:ff00 |
255.255.255.255 | 0000:0000:0000:0000:0000:ffff:ffff:ffff |
12.170 - parse_json() function
Interprets a string as a JSON value and returns the value as dynamic. If possible, the value is converted into relevant data types. For strict parsing with no data type conversion, use extract() or extract_json() functions.
It’s better to use the parse_json() function over the extract_json() function when you need to extract more than one element of a JSON compound object. Use dynamic() when possible.
Syntax
parse_json(
json)
Parameters
Name | Type | Required | Description |
---|---|---|---|
json | string | ✔️ | The string in the form of a JSON-formatted value or a dynamic property bag to parse as JSON. |
Returns
An object of type dynamic
that is determined by the value of json:
- If json is of type
dynamic
, its value is used as-is. - If json is of type
string
, and is a properly formatted JSON string, then the string is parsed, and the value produced is returned. - If json is of type
string
, but it isn’t a properly formatted JSON string, then the returned value is an object of typedynamic
that holds the originalstring
value.
Example
In the following example, when context_custom_metrics
is a string
that looks like this:
{"duration":{"value":118.0,"count":5.0,"min":100.0,"max":150.0,"stdDev":0.0,"sampledValue":118.0,"sum":118.0}}
then the following query retrieves the value of the duration
slot in the object, and from that it retrieves two slots, duration.value
and duration.min
(118.0
and 100.0
, respectively).
datatable(context_custom_metrics:string)
[
'{"duration":{"value":118.0,"count":5.0,"min":100.0,"max":150.0,"stdDev":0.0,"sampledValue":118.0,"sum":118.0}}'
]
| extend d = parse_json(context_custom_metrics)
| extend duration_value = d.duration.value, duration_min = d.duration.min
Notes
It’s common to have a JSON string describing a property bag in which one of the “slots” is another JSON string.
For example:
let d='{"a":123, "b":"{\\"c\\":456}"}';
print d
In such cases, it isn’t only necessary to invoke parse_json
twice, but also to make sure that in the second call, tostring
is used. Otherwise, the second call to parse_json
will just pass on the input to the output as-is, because its declared type is dynamic
.
let d='{"a":123, "b":"{\\"c\\":456}"}';
print d_b_c=parse_json(tostring(parse_json(d).b)).c
12.171 - parse_path()
Parses a file path string
and returns a dynamic
object that contains the following parts of the path:
- Scheme
- RootPath
- DirectoryPath
- DirectoryName
- Filename
- Extension
- AlternateDataStreamName
In addition to the simple paths with both types of slashes, the function supports paths with:
- Schemas. For example, “file://…”
- Shared paths. For example, “\\shareddrive\users…”
- Long paths. For example, “\\?\C:…”
Syntax
parse_path(
path)
Parameters
Name | Type | Required | Description |
---|---|---|---|
path | string | ✔️ | The file path. |
Returns
An object of type dynamic
that includes the path components listed above.
Example
datatable(p:string)
[
@"C:\temp\file.txt",
@"temp\file.txt",
"file://C:/temp/file.txt:some.exe",
@"\\shared\users\temp\file.txt.gz",
"/usr/lib/temp/file.txt"
]
| extend path_parts = parse_path(p)
Output
p | path_parts |
---|---|
C:\temp\file.txt | {“Scheme”:"",“RootPath”:“C:”,“DirectoryPath”:“C:\temp”,“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:""} |
temp\file.txt | {“Scheme”:"",“RootPath”:"",“DirectoryPath”:“temp”,“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:""} |
file://C:/temp/file.txt:some.exe | {“Scheme”:“file”,“RootPath”:“C:”,“DirectoryPath”:“C:/temp”,“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:“some.exe”} |
\shared\users\temp\file.txt.gz | {“Scheme”:"",“RootPath”:"",“DirectoryPath”:"\\shared\users\temp",“DirectoryName”:“temp”,“Filename”:“file.txt.gz”,“Extension”:“gz”,“AlternateDataStreamName”:""} |
/usr/lib/temp/file.txt | {“Scheme”:"",“RootPath”:"",“DirectoryPath”:"/usr/lib/temp",“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:""} |
12.172 - parse_url()
Parses an absolute URL string
and returns a dynamic
object that contains the URL parts.
Syntax
parse_url(
url)
Parameters
Name | Type | Required | Description |
---|---|---|---|
url | string | ✔️ | An absolute URL, including its scheme, or the query part of the URL. For example, use the absolute https://bing.com instead of bing.com . |
Returns
An object of type dynamic that includes the URL components: Scheme, Host, Port, Path, Username, Password, Query Parameters, Fragment.
Example
print Result=parse_url("scheme://username:password@host:1234/this/is/a/path?k1=v1&k2=v2#fragment")
Output
Result |
---|
{“Scheme”:“scheme”, “Host”:“host”, “Port”:“1234”, “Path”:“this/is/a/path”, “Username”:“username”, “Password”:“password”, “Query Parameters”:"{“k1”:“v1”, “k2”:“v2”}", “Fragment”:“fragment”} |
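Individual components can be read from the returned dynamic object with property accessors. The following sketch is illustrative only (the URL and column names are made up):
print url = "https://user:pwd@www.example.com:8080/docs/page?lang=en#section2"
| extend parts = parse_url(url)
| extend host = tostring(parts.Host), path = tostring(parts.Path), lang = tostring(parts["Query Parameters"].lang)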
12.173 - parse_urlquery()
Returns a dynamic
object that contains the query parameters.
Syntax
parse_urlquery(
query)
Parameters
Name | Type | Required | Description |
---|---|---|---|
query | string | ✔️ | The query part of the URL. The format must follow URL query standards (key=value& …). |
Returns
An object of type dynamic that includes the query parameters.
Examples
print Result=parse_urlquery("k1=v1&k2=v2&k3=v3")
Output
Result |
---|
{ “Query Parameters”:"{“k1”:“v1”, “k2”:“v2”, “k3”:“v3”}" } |
The following example uses a function to extract specific query parameters.
let getQueryParamValue = (querystring: string, param: string) {
let params = parse_urlquery(querystring);
tostring(params["Query Parameters"].[param])
};
print UrlQuery = 'view=vs-2019&preserve-view=true'
| extend view = getQueryParamValue(UrlQuery, 'view')
| extend preserve = getQueryParamValue(UrlQuery, 'preserve-view')
Output
UrlQuery | view | preserve |
---|---|---|
view=vs-2019&preserve-view=true | vs-2019 | true |
12.174 - parse_user_agent()
Interprets a user-agent string, which identifies the user’s browser and provides certain system details to servers hosting the websites the user visits. The result is returned as dynamic
.
Syntax
parse_user_agent(
user-agent-string, look-for)
Parameters
Name | Type | Required | Description |
---|---|---|---|
user-agent-string | string | ✔️ | The user-agent string to parse. |
look-for | string or dynamic | ✔️ | The value to search for in user-agent-string. The possible options are “browser”, “os”, or “device”. If only a single parsing target is required, it can be passed as a string parameter. If two or three targets are required, they can be passed as a dynamic array. |
Returns
An object of type dynamic
that contains the information about the requested parsing targets.
Browser: Family, MajorVersion, MinorVersion, Patch
OperatingSystem: Family, MajorVersion, MinorVersion, Patch, PatchMinor
Device: Family, Brand, Model
When the function is used in a query, make sure it runs in a distributed manner on multiple machines. If queries with this function are frequently used, you may want to pre-create the results via update policy, but you need to take into account that using this function inside the update policy will increase the ingestion latency.
Examples
Look-for parameter as string
print useragent = "Mozilla/5.0 (Windows; U; en-US) AppleWebKit/531.9 (KHTML, like Gecko) AdobeAIR/2.5.1"
| extend x = parse_user_agent(useragent, "browser")
Expected result is a dynamic object:
{
"Browser": {
"Family": "AdobeAIR",
"MajorVersion": "2",
"MinorVersion": "5",
"Patch": "1"
}
}
Look-for parameter as dynamic array
print useragent = "Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaN81-3/10.0.032 Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/4"
| extend x = parse_user_agent(useragent, dynamic(["browser","os","device"]))
Expected result is a dynamic object:
{
"Browser": {
"Family": "Nokia OSS Browser",
"MajorVersion": "3",
"MinorVersion": "1",
"Patch": ""
},
"OperatingSystem": {
"Family": "Symbian OS",
"MajorVersion": "9",
"MinorVersion": "2",
"Patch": "",
"PatchMinor": ""
},
"Device": {
"Family": "Nokia N81",
"Brand": "Nokia",
"Model": "N81-3"
}
}
12.175 - parse_version()
Converts the input string representation of a version number into a decimal number that can be compared.
Syntax
parse_version
(
version)
Parameters
Name | Type | Required | Description |
---|---|---|---|
version | string | ✔️ | The version to be parsed. |
Returns
If conversion is successful, the result is a decimal; otherwise, the result is null
.
Examples
Parse version strings
The following query shows version strings with their parsed version numbers.
let dt = datatable(v: string)
[
"0.0.0.5", "0.0.7.0", "0.0.3", "0.2", "0.1.2.0", "1.2.3.4", "1"
];
dt
| extend parsedVersion = parse_version(v)
Output
v | parsedVersion |
---|---|
0.0.0.5 | 5 |
0.0.7.0 | 700,000,000 |
0.0.3 | 300,000,000 |
0.2 | 20,000,000,000,000,000 |
0.1.2.0 | 10,000,000,200,000,000 |
1.2.3.4 | 1,000,000,020,000,000,300,000,004 |
1 | 1,000,000,000,000,000,000,000,000 |
Compare parsed version strings
The following query identifies which labs have equipment needing updates by comparing their parsed version strings to the minimum version number “1.0.0.0”.
let dt = datatable(lab: string, v: string)
[
"Lab A", "0.0.0.5",
"Lab B", "0.0.7.0",
"Lab D","0.0.3",
"Lab C", "0.2",
"Lab G", "0.1.2.0",
"Lab F", "1.2.3.4",
"Lab E", "1",
];
dt
| extend parsed_version = parse_version(v)
| extend needs_update = iff(parsed_version < parse_version("1.0.0.0"), "Yes", "No")
| project lab, v, needs_update
| sort by lab asc , v, needs_update
Output
lab | v | needs_update |
---|---|---|
Lab A | 0.0.0.5 | Yes |
Lab B | 0.0.7.0 | Yes |
Lab C | 0.2 | Yes |
Lab D | 0.0.3 | Yes |
Lab E | 1 | No |
Lab F | 1.2.3.4 | No |
Lab G | 0.1.2.0 | Yes |
12.176 - parse_xml()
Interprets a string
as an XML value, converts the value to a JSON, and returns the value as dynamic
.
Syntax
parse_xml(
xml)
Parameters
Name | Type | Required | Description |
---|---|---|---|
xml | string | ✔️ | The XML-formatted string value to parse. |
Returns
An object of type dynamic that is determined by the value of xml, or null, if the XML format is invalid.
The conversion is done as follows:
XML | JSON | Access |
---|---|---|
<e/> | { “e”: null } | o.e |
<e>text</e> | { “e”: “text” } | o.e |
<e name="value" /> | { “e”:{"@name": “value”} } | o.e["@name"] |
<e name="value">text</e> | { “e”: { “@name”: “value”, “#text”: “text” } } | o.e["@name"] o.e["#text"] |
<e> <a>text</a> <b>text</b> </e> | { “e”: { “a”: “text”, “b”: “text” } } | o.e.a o.e.b |
<e> <a>text</a> <a>text</a> </e> | { “e”: { “a”: [“text”, “text”] } } | o.e.a[0] o.e.a[1] |
<e> text <a>text</a> </e> | { “e”: { “#text”: “text”, “a”: “text” } } | o.e["#text"] o.e.a |
Example
In the following example, when context_custom_metrics
is a string
that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<duration>
<value>118.0</value>
<count>5.0</count>
<min>100.0</min>
<max>150.0</max>
<stdDev>0.0</stdDev>
<sampledValue>118.0</sampledValue>
<sum>118.0</sum>
</duration>
then the following CSL Fragment translates the XML to the following JSON:
{
"duration": {
"value": 118.0,
"count": 5.0,
"min": 100.0,
"max": 150.0,
"stdDev": 0.0,
"sampledValue": 118.0,
"sum": 118.0
}
}
and retrieves the value of the duration
slot
in the object, and from that it retrieves two slots, duration.value
and
duration.min
(118.0
and 100.0
, respectively).
T
| extend d=parse_xml(context_custom_metrics)
| extend duration_value=d.duration.value, duration_min=d["duration"]["min"]
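Because T isn’t defined in this snippet, the following self-contained sketch shows the same access pattern with a datatable and an abbreviated, illustrative XML literal:
datatable(context_custom_metrics: string)
[
    '<?xml version="1.0" encoding="UTF-8"?><duration><value>118.0</value><min>100.0</min></duration>'
]
| extend d = parse_xml(context_custom_metrics)
| extend duration_value = d.duration.value, duration_min = d["duration"]["min"]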
12.177 - percentile_array_tdigest()
Calculates percentile results from the tdigest results (which were generated by tdigest() or tdigest_merge()).
Syntax
percentiles_array_tdigest(
tdigest,
percentile1 [,
percentile2,
…])
percentiles_array_tdigest(
tdigest,
Dynamic array [,
typeLiteral ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
tdigest | string | ✔️ | The tdigest or tdigest_merge() results used to calculate the percentiles. |
percentile | real | ✔️ | A value or comma-separated list of values that specifies the percentiles. |
Dynamic array | dynamic | ✔️ | A dynamic array of real numbers that specify the percentiles. |
typeLiteral | string | | A type literal. For example, typeof(long). If provided, the result set is of this type. |
Returns
The percentile values computed from the tdigest, returned as a dynamic array of results (similar to percentiles()).
Examples
StormEvents
| summarize tdigestRes = tdigest(DamageProperty) by State
| project percentiles_array_tdigest(tdigestRes, range(0, 100, 50), typeof(int))
Output
percentile_tdigest_tdigestRes |
---|
[0,0,0] |
[0,0,62000000] |
[0,0,110000000] |
[0,0,1200000] |
[0,0,250000] |
12.178 - percentile_tdigest()
Calculates the percentile result from the tdigest results (which were generated by tdigest() or tdigest_merge()).
Syntax
percentile_tdigest(
expr,
percentile1 ,
typeLiteral)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | An expression that was generated by tdigest or tdigest_merge(). |
percentile | long | ✔️ | The value that specifies the percentile. |
typeLiteral | string | | A type literal. If provided, the result set will be of this type. For example, typeof(long) will cast all results to type long. |
Returns
The percentile value of each value in expr.
Examples
StormEvents
| summarize tdigestRes = tdigest(DamageProperty) by State
| project percentile_tdigest(tdigestRes, 100)
Output
percentile_tdigest_tdigestRes |
---|
0 |
62000000 |
110000000 |
1200000 |
250000 |
StormEvents
| summarize tdigestRes = tdigest(DamageProperty) by State
| union (StormEvents | summarize tdigestRes = tdigest(EndTime) by State)
| project percentile_tdigest(tdigestRes, 100)
Output
percentile_tdigest_tdigestRes |
---|
[0] |
[62000000] |
[“2007-12-20T11:30:00.0000000Z”] |
[“2007-12-31T23:59:00.0000000Z”] |
12.179 - percentrank_tdigest()
Calculates the approximate rank of the value in a set, where rank is expressed as a percentage of the set’s size. This function can be viewed as the inverse of the percentile.
Syntax
percentrank_tdigest(
digest,
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
digest | string | ✔️ | An expression that was generated by tdigest() or tdigest_merge(). |
value | scalar | ✔️ | An expression representing a value to be used for percentage ranking calculation. |
Returns
The percentage rank of value in a dataset.
Examples
The percentage rank of a DamageProperty value of $4,490 is approximately 85%:
StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project percentrank_tdigest(tdigestRes, 4490)
Output
Column1 |
---|
85.0015237192293 |
Using percentile 85 over DamageProperty should give $4,490:
StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project percentile_tdigest(tdigestRes, 85, typeof(long))
Output
percentile_tdigest_tdigestRes |
---|
4490 |
12.180 - pi()
Returns the constant value of Pi.
Syntax
pi()
Returns
The double value of Pi (3.1415926…)
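Example
A minimal usage sketch (the values and column names are illustrative):
print radius = 5.0
| extend circumference = 2 * pi() * radius, area = pi() * pow(radius, 2)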
12.181 - pow()
Returns the result of raising a base to the power of an exponent.
Syntax
pow(
base,
exponent )
Parameters
Name | Type | Required | Description |
---|---|---|---|
base | int, real, or long | ✔️ | The base value. |
exponent | int, real, or long | ✔️ | The exponent value. |
Returns
Returns base raised to the power exponent: base ^ exponent.
Example
print result=pow(2, 3)
Output
result |
---|
8 |
12.182 - punycode_domain_from_string
Decodes input string from encoded Internationalized Domain Name in Applications (IDNA) punycode form.
Syntax
punycode_domain_from_string(
encoded_string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
encoded_string | string | ✔️ | An IDNA string to be decoded from punycode form. The function accepts one string argument. |
Returns
- Returns a
string
that represents the original Internationalized Domain Name. - Returns an empty result if decoding failed.
Example
datatable(encoded:string)
[
"xn--Ge-mia.Bulg.edu",
"xn--Lin-8na.Celtchair.org",
"xn--Ry-lja8c.xn--Jng-uta63a.xn--Bng-9ka.com",
]
| extend domain=punycode_domain_from_string(encoded)
encoded | domain |
---|---|
xn--Ge-mia.Bulg.edu | Gáe.Bulg.edu |
xn--Lin-8na.Celtchair.org | Lúin.Celtchair.org |
xn--Ry-lja8c.xn--Jng-uta63a.xn--Bng-9ka.com | Rúyì.Jīngū.Bàng.com |
Related content
- To encode a domain name to punycode form, see punycode_domain_to_string().
12.183 - punycode_domain_to_string
Encodes Internationalized Domain Name in Applications (IDNA) string to Punycode form.
Syntax
punycode_domain_to_string(
domain)
Parameters
Name | Type | Required | Description |
---|---|---|---|
domain | string | ✔️ | A string to be encoded to punycode form. The function accepts one string argument. |
Returns
- Returns a
string
that represents punycode-encoded original string. - Returns an empty result if encoding failed.
Examples
datatable(domain:string )['Lê Lợi。Thuận Thiên。com', 'Riðill。Skáldskaparmál。org', "Kaledvoulc'h.Artorījos.edu"]
| extend str=punycode_domain_to_string(domain)
domain | str |
---|---|
Lê Lợi。Thuận Thiên。com | xn--L Li-gpa4517b.xn--Thun Thin-s4a7194f.com |
Riðill。Skáldskaparmál。org | xn--Riill-jta.xn--Skldskaparml-dbbj.org |
Kaledvoulc’h.Artorījos.edu | Kaledvoulc’h.xn--Artorjos-ejb.edu |
Related content
- To retrieve the original decoded string, see punycode_domain_from_string().
12.184 - punycode_from_string
Encodes input string to Punycode form. The result string contains only ASCII characters. The result string doesn’t start with “xn--”.
Syntax
punycode_from_string('input_string')
Parameters
Name | Type | Required | Description |
---|---|---|---|
input_string | string | ✔️ | A string to be encoded to punycode form. The function accepts one string argument. |
Returns
- Returns a
string
that represents punycode-encoded original string. - Returns an empty result if encoding failed.
Examples
print encoded = punycode_from_string('académie-française')
encoded |
---|
acadmie-franaise-npb1a |
print domain='艺术.com'
| extend domain_vec = split(domain, '.')
| extend encoded_host = punycode_from_string(tostring(domain_vec[0]))
| extend encoded_domain = strcat('xn--', encoded_host, '.', domain_vec[1])
domain | domain_vec | encoded_host | encoded_domain |
---|---|---|---|
艺术.com | [“艺术”,“com”] | cqv902d | xn--cqv902d.com |
Related content
- Use punycode_to_string() to retrieve the original decoded string.
12.185 - punycode_to_string
Decodes input string from punycode form. The string shouldn’t contain the initial xn--, and must contain only ASCII characters.
Syntax
punycode_to_string('input_string')
Parameters
Name | Type | Required | Description |
---|---|---|---|
input_string | string | ✔️ | A string to be decoded from punycode form. The function accepts one string argument. |
Returns
- Returns a
string
that represents the original, decoded string. - Returns an empty result if decoding failed.
Example
print decoded = punycode_to_string('acadmie-franaise-npb1a')
decoded |
---|
académie-française |
Related content
- Use punycode_from_string() to encode a string to punycode form.
12.186 - radians()
Converts angle value in degrees into value in radians, using formula radians = (PI / 180 ) * angle_in_degrees
Syntax
radians(
degrees)
Parameters
Name | Type | Required | Description |
---|---|---|---|
degrees | real | ✔️ | The angle in degrees. |
Returns
The corresponding angle in radians for an angle specified in degrees.
Example
print radians0 = radians(90), radians1 = radians(180), radians2 = radians(360)
Output
radians0 | radians1 | radians2 |
---|---|---|
1.5707963267949 | 3.14159265358979 | 6.28318530717959 |
12.187 - rand()
Returns a random number.
Syntax
rand()
rand(
N)
Returns
- rand() returns a value of type real with a uniform distribution in the range [0.0, 1.0).
- rand(N), for example rand(1000), returns a value of type real chosen with a uniform distribution from the set {0.0, 1.0, …, N - 1}.
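Example
A minimal sketch (column names are illustrative) that draws one value from each form:
print real_sample = rand(), int_sample = rand(1000)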
12.188 - range()
Generates a dynamic array holding a series of equally spaced values.
Syntax
range(
start,
stop [,
step])
Parameters
Name | Type | Required | Description |
---|---|---|---|
start | scalar | ✔️ | The value of the first element in the resulting array. |
stop | scalar | ✔️ | The maximum value of the last element in the resulting array, such that the last value in the series is less than or equal to the stop value. |
step | scalar | | The difference between two consecutive elements of the array. The default value for step is 1 for numeric and 1h for timespan or datetime. |
Returns
A dynamic array whose values are: start, start + step, … up to and including stop. The array is truncated if the maximum number of results allowed is reached.
Examples
The following example returns an array of numbers from one to eight, with an increment of three.
print r = range(1, 8, 3)
Output
r |
---|
[1,4,7] |
The following example returns an array with all dates from the year 2007.
print r = range(datetime(2007-01-01), datetime(2007-12-31), 1d)
Output
r |
---|
[“2007-01-01T00:00:00.0000000Z”,“2007-01-02T00:00:00.0000000Z”,“2007-01-03T00:00:00.0000000Z”,…..,“2007-12-31T00:00:00.0000000Z”] |
The following example returns an array with numbers between one and three.
print range(1, 3)
Output
print_0 |
---|
[1,2,3] |
The following example returns a range of hours between one hour and five hours.
print range(1h, 5h)
Output
print_0 |
---|
["01:00:00","02:00:00","03:00:00","04:00:00","05:00:00"] |
The following example returns a truncated array as the range exceeds the maximum results limit. The example demonstrates that the limit is exceeded by using the mv-expand operator to expand the array into multiple records and then counting the number of records.
print r = range(1,1000000000)
| mv-expand r
| count
Output
Count |
---|
1,048,576 |
12.189 - rank_tdigest()
Calculates the approximate rank of the value in a set.
Rank of value v
in a set S
is defined as count of members of S
that are smaller or equal to v
, S
is represented by its tdigest
.
Syntax
rank_tdigest(
digest,
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
digest | string | ✔️ | An expression that was generated by tdigest() or tdigest_merge(). |
value | scalar | ✔️ | An expression representing a value to be used for ranking calculation. |
Returns
The rank of value in the dataset.
Examples
In a sorted list (1-1000), the rank of 685 is its index:
range x from 1 to 1000 step 1
| summarize t_x=tdigest(x)
| project rank_of_685=rank_tdigest(t_x, 685)
Output
rank_of_685 |
---|
685 |
This query calculates the rank of the value $4,490 over all damage property costs:
StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project rank_of_4490=rank_tdigest(tdigestRes, 4490)
Output
rank_of_4490 |
---|
50207 |
Getting the estimated percentage of the rank (by dividing by the set size):
StormEvents
| summarize tdigestRes = tdigest(DamageProperty), count()
| project rank_tdigest(tdigestRes, 4490) * 100.0 / count_
Output
Column1 |
---|
85.0015237192293 |
Percentile 85 of the damage property costs is $4,490:
StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project percentile_tdigest(tdigestRes, 85, typeof(long))
Output
percentile_tdigest_tdigestRes |
---|
4490 |
12.190 - regex_quote()
Returns a string that escapes all regular expression characters.
Syntax
regex_quote(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The string to escape. |
Returns
Returns string where all regex expression characters are escaped.
Example
print result = regex_quote('(so$me.Te^xt)')
Output
result |
---|
\(so\$me\.Te\^xt\) |
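The escaped text is typically embedded in a larger pattern string. The following sketch is illustrative only (the literal and column name are made up):
let literal = 'so$me.Te^xt';
print pattern = strcat(@'\d+\s', regex_quote(literal))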
12.191 - repeat()
Generates a dynamic array containing a series of repeated values.
Syntax
repeat(
value,
count)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | bool , int , long , real , datetime , string or timespan | ✔️ | The value of the element in the resulting array. |
count | int | ✔️ | The count of the elements in the resulting array. |
Returns
If count is equal to zero, an empty array is returned. If count is less than zero, a null value is returned.
Examples
The following example returns [1, 1, 1]:
print r = repeat(1, 3)
12.192 - replace_regex()
Replaces all regular expression matches with a specified pattern.
Syntax
replace_regex(
source,
lookup_regex,
rewrite_pattern)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The text to search and replace. |
lookup_regex | string | ✔️ | The regular expression to search for in text. The expression can contain capture groups in parentheses. To match over multiple lines, use the m or s flags. For more information on flags, see Grouping and flags. |
rewrite_pattern | string | ✔️ | The replacement regex for any match made by lookup_regex. Use \0 to refer to the whole match, \1 for the first capture group, \2 and so on for subsequent capture groups. |
Returns
Returns the source after replacing all matches of lookup_regex with evaluations of rewrite_pattern. Matches do not overlap.
Example
range x from 1 to 5 step 1
| extend str=strcat('Number is ', tostring(x))
| extend replaced=replace_regex(str, @'is (\d+)', @'was: \1')
Output
x | str | replaced |
---|---|---|
1 | Number is 1.000000 | Number was: 1.000000 |
2 | Number is 2.000000 | Number was: 2.000000 |
3 | Number is 3.000000 | Number was: 3.000000 |
4 | Number is 4.000000 | Number was: 4.000000 |
5 | Number is 5.000000 | Number was: 5.000000 |
Related content
- To replace a single string, see replace_string().
- To replace multiple strings, see replace_strings().
- To replace a set of characters, see translate().
12.193 - replace_string()
Replaces all string matches with a specified string.
To replace multiple strings, see replace_strings().
Syntax
replace_string(
text,
lookup,
rewrite)
Parameters
Name | Type | Required | Description |
---|---|---|---|
text | string | ✔️ | The source string. |
lookup | string | ✔️ | The string to be replaced. |
rewrite | string | ✔️ | The replacement string. |
Returns
Returns the text after replacing all matches of lookup with evaluations of rewrite. Matches don’t overlap.
Examples
Replace words in a string
The following example uses replace_string()
to replace the word “cat” with the word “hamster” in the Message
string.
print Message="A magic trick can turn a cat into a dog"
| extend Outcome = replace_string(
Message, "cat", "hamster") // Lookup strings
Output
Message | Outcome |
---|---|
A magic trick can turn a cat into a dog | A magic trick can turn a hamster into a dog |
Generate and modify a sequence of numbers
The following example creates a table with column x
containing numbers from one to five, incremented by one. It adds the column str
that concatenates “Number is " with the string representation of the x
column values using the strcat()
function. It then adds the replaced
column where “was” replaces the word “is” in the strings from the str
column.
range x from 1 to 5 step 1
| extend str=strcat('Number is ', tostring(x))
| extend replaced=replace_string(str, 'is', 'was')
Output
x | str | replaced |
---|---|---|
1 | Number is 1.000000 | Number was 1.000000 |
2 | Number is 2.000000 | Number was 2.000000 |
3 | Number is 3.000000 | Number was 3.000000 |
4 | Number is 4.000000 | Number was 4.000000 |
5 | Number is 5.000000 | Number was 5.000000 |
Related content
- To replace multiple strings, see replace_strings().
- To replace strings based on regular expression, see replace_regex().
- To replace a set of characters, see translate().
12.194 - replace_strings()
Replaces all string matches with the specified strings.
To replace an individual string, see replace_string().
Syntax
replace_strings(
text,
lookups,
rewrites)
Parameters
Name | Type | Required | Description |
---|---|---|---|
text | string | ✔️ | The source string. |
lookups | dynamic | ✔️ | The array that includes the lookup strings. Array elements that aren’t strings are ignored. |
rewrites | dynamic | ✔️ | The array that includes the rewrites. Array elements that aren’t strings are ignored (no replacement is made). |
Returns
Returns text after replacing all matches of lookups with evaluations of rewrites. Matches don’t overlap.
Examples
Simple replacement
print Message="A magic trick can turn a cat into a dog"
| extend Outcome = replace_strings(
Message,
dynamic(['cat', 'dog']), // Lookup strings
dynamic(['dog', 'pigeon']) // Replacements
)
Message | Outcome |
---|---|
A magic trick can turn a cat into a dog | A magic trick can turn a dog into a pigeon |
Replacement with an empty string
Replacement with an empty string removes the matching string.
print Message="A magic trick can turn a cat into a dog"
| extend Outcome = replace_strings(
Message,
dynamic(['turn', ' into a dog']), // Lookup strings
dynamic(['disappear', '']) // Replacements
)
Message | Outcome |
---|---|
A magic trick can turn a cat into a dog | A magic trick can disappear a cat |
Replacement order
The order of match elements matters: the earlier match takes precedence.
Note the difference between Outcome1 and Outcome2: This
vs Thwas
.
print Message="This is an example of using replace_strings()"
| extend Outcome1 = replace_strings(
Message,
dynamic(['This', 'is']), // Lookup strings
dynamic(['This', 'was']) // Replacements
),
Outcome2 = replace_strings(
Message,
dynamic(['is', 'This']), // Lookup strings
dynamic(['was', 'This']) // Replacements
)
Message | Outcome1 | Outcome2 |
---|---|---|
This is an example of using replace_strings() | This was an example of using replace_strings() | Thwas was an example of using replace_strings() |
Nonstring replacement
Replacement elements that aren’t strings aren’t applied, and the original string is kept. The match is still considered valid, and other possible replacements aren’t performed on the matched string. In the following example, ‘This’ isn’t replaced with the numeric 12345, and it remains in the output, unaffected by a possible match with ‘is’.
print Message="This is an example of using replace_strings()"
| extend Outcome = replace_strings(
Message,
dynamic(['This', 'is']), // Lookup strings
dynamic([12345, 'was']) // Replacements
)
Message | Outcome |
---|---|
This is an example of using replace_strings() | This was an example of using replace_strings() |
Related content
- For a replacement of a single string, see replace_string().
- For a replacement based on regular expression, see replace_regex().
- For replacing a set of characters, see translate().
12.195 - reverse()
Reverses the order of the characters in the input string.
If the input value isn’t of type string
, then the function forcibly casts the value to type string
.
Syntax
reverse(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | input value. |
Returns
The reverse order of a string value.
Examples
print str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
| extend rstr = reverse(str)
Output
str | rstr |
---|---|
ABCDEFGHIJKLMNOPQRSTUVWXYZ | ZYXWVUTSRQPONMLKJIHGFEDCBA |
print ['int'] = 12345, ['double'] = 123.45,
['datetime'] = datetime(2017-10-15 12:00), ['timespan'] = 3h
| project rint = reverse(['int']), rdouble = reverse(['double']),
rdatetime = reverse(['datetime']), rtimespan = reverse(['timespan'])
Output
rint | rdouble | rdatetime | rtimespan |
---|---|---|---|
54321 | 54.321 | Z0000000.00:00:21T51-01-7102 | 00:00:30 |
12.196 - round()
Returns the rounded number to the specified precision.
Syntax
round(
number [,
precision])
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | long or real | ✔️ | The number to calculate the round on. |
precision | int | | The number of digits to round to. The default is 0. |
Returns
The rounded number to the specified precision.
Round is different from the bin()
function in
that the round()
function rounds a number to a specific number of digits while the bin()
function rounds the value to an integer multiple of a given bin size. For example, round(2.15, 1)
returns 2.2 while bin(2.15, 1)
returns 2.
Examples
round(2.98765, 3) // 2.988
round(2.15, 1) // 2.2
round(2.15) // 2 // equivalent to round(2.15, 0)
round(-50.55, -2) // -100
round(21.5, -1) // 20
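The contrast with bin() described in the note above can be run directly. A minimal sketch (column names are illustrative):
print rounded = round(2.15, 1), binned = bin(2.15, 1)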
12.197 - Scalar Functions
This article lists all available scalar functions grouped by type. For aggregation functions, see Aggregation function types.
Binary functions
Function Name | Description |
---|---|
binary_and() | Returns a result of the bitwise and operation between two values. |
binary_not() | Returns a bitwise negation of the input value. |
binary_or() | Returns a result of the bitwise or operation of the two values. |
binary_shift_left() | Returns binary shift left operation on a pair of numbers: a « n. |
binary_shift_right() | Returns binary shift right operation on a pair of numbers: a » n. |
binary_xor() | Returns a result of the bitwise xor operation of the two values. |
bitset_count_ones() | Returns the number of set bits in the binary representation of a number. |
Conversion functions
Function Name | Description |
---|---|
tobool() | Convert inputs to boolean (signed 8-bit) representation. |
todatetime() | Converts input to datetime scalar. |
todouble() | Converts the input to a value of type real. |
tostring() | Converts input to a string representation. |
totimespan() | Converts input to timespan scalar. |
DateTime/timespan functions
Function Name | Description |
---|---|
ago() | Subtracts the given timespan from the current UTC clock time. |
datetime_add() | Calculates a new datetime from a specified datepart multiplied by a specified amount, added to a specified datetime. |
datetime_diff() | Calculates the calendarian difference between two datetime values. |
datetime_local_to_utc() | Converts local datetime to UTC datetime using a time-zone specification. |
datetime_part() | Extracts the requested date part as an integer value. |
datetime_utc_to_local() | Converts UTC datetime to local datetime using a time-zone specification. |
dayofmonth() | Returns the integer number representing the day number of the given month. |
dayofweek() | Returns the integer number of days since the preceding Sunday, as a timespan. |
dayofyear() | Returns the integer number that represents the day number of the given year. |
endofday() | Returns the end of the day containing the date, shifted by an offset, if provided. |
endofmonth() | Returns the end of the month containing the date, shifted by an offset, if provided. |
endofweek() | Returns the end of the week containing the date, shifted by an offset, if provided. |
endofyear() | Returns the end of the year containing the date, shifted by an offset, if provided. |
format_datetime() | Formats a datetime parameter based on the format pattern parameter. |
format_timespan() | Formats a format-timespan parameter based on the format pattern parameter. |
getyear() | Returns the year part of the datetime argument. |
hourofday() | Returns the integer number representing the hour number of the given date. |
make_datetime() | Creates a datetime scalar value from the specified date and time. |
make_timespan() | Creates a timespan scalar value from the specified time period. |
monthofyear() | Returns the integer number that represents the month number of the given year. |
now() | Returns the current UTC clock time, optionally offset by a given timespan. |
startofday() | Returns the start of the day containing the date, shifted by an offset, if provided. |
startofmonth() | Returns the start of the month containing the date, shifted by an offset, if provided. |
startofweek() | Returns the start of the week containing the date, shifted by an offset, if provided. |
startofyear() | Returns the start of the year containing the date, shifted by an offset, if provided. |
todatetime() | Converts input to datetime scalar. |
totimespan() | Converts input to timespan scalar. |
unixtime_microseconds_todatetime() | Converts unix-epoch microseconds to UTC datetime. |
unixtime_milliseconds_todatetime() | Converts unix-epoch milliseconds to UTC datetime. |
unixtime_nanoseconds_todatetime() | Converts unix-epoch nanoseconds to UTC datetime. |
unixtime_seconds_todatetime() | Converts unix-epoch seconds to UTC datetime. |
weekofyear() | Returns an integer representing the week number. |
Dynamic/array functions
Function Name | Description |
---|---|
array_concat() | Concatenates a number of dynamic arrays to a single array. |
array_iff() | Applies element-wise iif function on arrays. |
array_index_of() | Searches the array for the specified item, and returns its position. |
array_length() | Calculates the number of elements in a dynamic array. |
array_reverse() | Reverses the order of the elements in a dynamic array. |
array_rotate_left() | Rotates values inside a dynamic array to the left. |
array_rotate_right() | Rotates values inside a dynamic array to the right. |
array_shift_left() | Shifts values inside a dynamic array to the left. |
array_shift_right() | Shifts values inside a dynamic array to the right. |
array_slice() | Extracts a slice of a dynamic array. |
array_sort_asc() | Sorts a collection of arrays in ascending order. |
array_sort_desc() | Sorts a collection of arrays in descending order. |
array_split() | Builds an array of arrays split from the input array. |
array_sum() | Calculates the sum of a dynamic array. |
bag_has_key() | Checks whether a dynamic bag column contains a given key. |
bag_keys() | Enumerates all the root keys in a dynamic property-bag object. |
bag_merge() | Merges dynamic property-bags into a dynamic property-bag with all properties merged. |
bag_pack() | Creates a dynamic object (property bag) from a list of names and values. |
bag_pack_columns() | Creates a dynamic object (property bag) from a list of columns. |
bag_remove_keys() | Removes keys and associated values from a dynamic property-bag. |
bag_set_key() | Sets a given key to a given value in a dynamic property-bag. |
jaccard_index() | Computes the Jaccard index of two sets. |
pack_all() | Creates a dynamic object (property bag) from all the columns of the tabular expression. |
pack_array() | Packs all input values into a dynamic array. |
repeat() | Generates a dynamic array holding a series of equal values. |
set_difference() | Returns an array of the set of all distinct values that are in the first array but aren’t in other arrays. |
set_has_element() | Determines whether the specified array contains the specified element. |
set_intersect() | Returns an array of the set of all distinct values that are in all arrays. |
set_union() | Returns an array of the set of all distinct values that are in any of provided arrays. |
treepath() | Enumerates all the path expressions that identify leaves in a dynamic object. |
zip() | The zip function accepts any number of dynamic arrays. Returns an array whose elements are each an array with the elements of the input arrays of the same index. |
Window scalar functions
Function Name | Description |
---|---|
next() | For the serialized row set, returns a value of a specified column from the later row according to the offset. |
prev() | For the serialized row set, returns a value of a specified column from the earlier row according to the offset. |
row_cumsum() | Calculates the cumulative sum of a column. |
row_number() | Returns a row’s number in the serialized row set - consecutive numbers starting from a given index or from 1 by default. |
row_rank_dense() | Returns a row’s dense rank in the serialized row set. |
row_rank_min() | Returns a row’s minimal rank in the serialized row set. |
Flow control functions
Function Name | Description |
---|---|
toscalar() | Returns a scalar constant value of the evaluated expression. |
Mathematical functions
Function Name | Description |
---|---|
abs() | Calculates the absolute value of the input. |
acos() | Returns the angle whose cosine is the specified number (the inverse operation of cos()). |
asin() | Returns the angle whose sine is the specified number (the inverse operation of sin()). |
atan() | Returns the angle whose tangent is the specified number (the inverse operation of tan()). |
atan2() | Calculates the angle, in radians, between the positive x-axis and the ray from the origin to the point (y, x). |
beta_cdf() | Returns the standard cumulative beta distribution function. |
beta_inv() | Returns the inverse of the beta cumulative probability beta density function. |
beta_pdf() | Returns the probability density beta function. |
cos() | Returns the cosine function. |
cot() | Calculates the trigonometric cotangent of the specified angle, in radians. |
degrees() | Converts angle value in radians into value in degrees, using formula degrees = (180 / PI) * angle-in-radians. |
erf() | Returns the error function. |
erfc() | Returns the complementary error function. |
exp() | The base-e exponential function of x, which is e raised to the power x: e^x. |
exp10() | The base-10 exponential function of x, which is 10 raised to the power x: 10^x. |
exp2() | The base-2 exponential function of x, which is 2 raised to the power x: 2^x. |
gamma() | Computes gamma function. |
isfinite() | Returns whether input is a finite value (isn’t infinite or NaN). |
isinf() | Returns whether input is an infinite (positive or negative) value. |
isnan() | Returns whether input is Not-a-Number (NaN) value. |
log() | Returns the natural logarithm function. |
log10() | Returns the common (base-10) logarithm function. |
log2() | Returns the base-2 logarithm function. |
loggamma() | Computes log of absolute value of the gamma function. |
not() | Reverses the value of its bool argument. |
pi() | Returns the constant value of Pi (π). |
pow() | Returns a result of raising to power. |
radians() | Converts angle value in degrees into value in radians, using formula radians = (PI / 180) * angle-in-degrees. |
rand() | Returns a random number. |
range() | Generates a dynamic array holding a series of equally spaced values. |
round() | Returns the rounded source to the specified precision. |
sign() | Sign of a numeric expression. |
sin() | Returns the sine function. |
sqrt() | Returns the square root function. |
tan() | Returns the tangent function. |
welch_test() | Computes the p-value of the Welch-test function. |
Metadata functions
Function Name | Description |
---|---|
column_ifexists() | Takes a column name as a string and a default value. Returns a reference to the column if it exists, otherwise - returns the default value. |
current_cluster_endpoint() | Returns the current cluster running the query. |
current_database() | Returns the name of the database in scope. |
current_principal() | Returns the current principal running this query. |
current_principal_details() | Returns details of the principal running the query. |
current_principal_is_member_of() | Checks group membership or principal identity of the current principal running the query. |
cursor_after() | Used to access to the records that were ingested after the previous value of the cursor. |
estimate_data_size() | Returns an estimated data size of the selected columns of the tabular expression. |
extent_id() | Returns a unique identifier that identifies the data shard (“extent”) that the current record resides in. |
extent_tags() | Returns a dynamic array with the tags of the data shard (“extent”) that the current record resides in. |
ingestion_time() | Retrieves the record’s $IngestionTime hidden datetime column, or null. |
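For example, a minimal query such as the following returns context about where and by whom it runs; the exact values depend on the cluster, database, and principal:
print Database = current_database(),
      Cluster = current_cluster_endpoint(),
      Principal = current_principal()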
Rounding functions
Function Name | Description |
---|---|
bin() | Rounds values down to an integer multiple of a given bin size. |
bin_at() | Rounds values down to a fixed-size “bin”, with control over the bin’s starting point. (See also bin function.) |
ceiling() | Calculates the smallest integer greater than, or equal to, the specified numeric expression. |
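As a quick illustration (the column name is arbitrary), the three rounding functions behave differently on the same value:
print value = 4.78
| extend RoundedDown = bin(value, 1.0),         // 4
         AlignedBin = bin_at(value, 2.0, 0.5),  // bins of size 2 anchored at 0.5 -> 4.5
         RoundedUp = ceiling(value)             // 5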
Conditional functions
Function Name | Description |
---|---|
case() | Evaluates a list of predicates and returns the first result expression whose predicate is satisfied. |
coalesce() | Evaluates a list of expressions and returns the first non-null (or non-empty for string) expression. |
iff() | Evaluate the first argument (the predicate), and returns the value of either the second or third arguments, depending on whether the predicate evaluated to true (second) or false (third). |
max_of() | Returns the maximum value of several evaluated numeric expressions. |
min_of() | Returns the minimum value of several evaluated numeric expressions. |
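For example, a small sketch (the table and column names are arbitrary) combining several of these functions:
datatable(Score: long) [95, 82, 47]
| extend Grade = case(Score >= 90, "A", Score >= 80, "B", Score >= 60, "C", "F"),
         Passed = iff(Score >= 60, "pass", "fail"),
         Capped = min_of(Score, 90)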
Series element-wise functions
Function Name | Description |
---|---|
series_abs() | Calculates the element-wise absolute value of the numeric series input. |
series_acos() | Calculates the element-wise arccosine function of the numeric series input. |
series_add() | Calculates the element-wise addition of two numeric series inputs. |
series_asin() | Calculates the element-wise arcsine function of the numeric series input. |
series_atan() | Calculates the element-wise arctangent function of the numeric series input. |
series_ceiling() | Calculates the element-wise ceiling function of the numeric series input. |
series_cos() | Calculates the element-wise cosine function of the numeric series input. |
series_divide() | Calculates the element-wise division of two numeric series inputs. |
series_equals() | Calculates the element-wise equals (== ) logic operation of two numeric series inputs. |
series_exp() | Calculates the element-wise base-e exponential function (e^x) of the numeric series input. |
series_floor() | Calculates the element-wise floor function of the numeric series input. |
series_greater() | Calculates the element-wise greater (> ) logic operation of two numeric series inputs. |
series_greater_equals() | Calculates the element-wise greater or equals (>= ) logic operation of two numeric series inputs. |
series_less() | Calculates the element-wise less (< ) logic operation of two numeric series inputs. |
series_less_equals() | Calculates the element-wise less or equal (<= ) logic operation of two numeric series inputs. |
series_log() | Calculates the element-wise natural logarithm function (base-e) of the numeric series input. |
series_multiply() | Calculates the element-wise multiplication of two numeric series inputs. |
series_not_equals() | Calculates the element-wise not equals (!= ) logic operation of two numeric series inputs. |
series_pow() | Calculates the element-wise power of two numeric series inputs. |
series_sign() | Calculates the element-wise sign of the numeric series input. |
series_sin() | Calculates the element-wise sine function of the numeric series input. |
series_subtract() | Calculates the element-wise subtraction of two numeric series inputs. |
series_tan() | Calculates the element-wise tangent function of the numeric series input. |
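For example, a short sketch (the column names are arbitrary) applying a few element-wise functions to two literal series:
print s1 = dynamic([1, 2, 4]), s2 = dynamic([10, 20, 40])
| extend Added = series_add(s1, s2),       // [11, 22, 44]
         Ratio = series_divide(s2, s1),    // [10, 10, 10]
         Greater = series_greater(s2, s1)  // [true, true, true]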
Series processing functions
Function Name | Description |
---|---|
series_cosine_similarity() | Calculates the cosine similarity of two numeric series. |
series_decompose() | Does a decomposition of the series into components. |
series_decompose_anomalies() | Finds anomalies in a series based on series decomposition. |
series_decompose_forecast() | Forecast based on series decomposition. |
series_dot_product() | Calculates the dot product of two numeric series. |
series_fill_backward() | Performs backward fill interpolation of missing values in a series. |
series_fill_const() | Replaces missing values in a series with a specified constant value. |
series_fill_forward() | Performs forward fill interpolation of missing values in a series. |
series_fill_linear() | Performs linear interpolation of missing values in a series. |
series_fft() | Applies the Fast Fourier Transform (FFT) on a series. |
series_fir() | Applies a Finite Impulse Response filter on a series. |
series_fit_2lines() | Applies two segments linear regression on a series, returning multiple columns. |
series_fit_2lines_dynamic() | Applies two segments linear regression on a series, returning dynamic object. |
series_fit_line() | Applies linear regression on a series, returning multiple columns. |
series_fit_line_dynamic() | Applies linear regression on a series, returning dynamic object. |
series_fit_poly() | Applies polynomial regression on a series, returning multiple columns. |
series_ifft() | Applies the Inverse Fast Fourier Transform (IFFT) on a series. |
series_iir() | Applies an Infinite Impulse Response filter on a series. |
series_magnitude() | Calculates the magnitude of the numeric series. |
series_outliers() | Scores anomaly points in a series. |
series_pearson_correlation() | Calculates the Pearson correlation coefficient of two series. |
series_periods_detect() | Finds the most significant periods that exist in a time series. |
series_periods_validate() | Checks whether a time series contains periodic patterns of given lengths. |
series_seasonal() | Finds the seasonal component of the series. |
series_stats() | Returns statistics for a series in multiple columns. |
series_stats_dynamic() | Returns statistics for a series in dynamic object. |
series_sum() | Calculates the sum of numeric series elements. |
String functions
Function Name | Description |
---|---|
base64_encode_tostring() | Encodes a string as base64 string. |
base64_encode_fromguid() | Encodes a GUID as base64 string. |
base64_decode_tostring() | Decodes a base64 string to a UTF-8 string. |
base64_decode_toarray() | Decodes a base64 string to an array of long values. |
base64_decode_toguid() | Decodes a base64 string to a GUID. |
countof() | Counts occurrences of a substring in a string. Plain string matches may overlap; regex matches don’t. |
extract() | Get a match for a regular expression from a text string. |
extract_all() | Get all matches for a regular expression from a text string. |
extract_json() | Get a specified element out of a JSON text using a path expression. |
has_any_index() | Searches the string for items specified in the array and returns the position of the first item found in the string. |
indexof() | Function reports the zero-based index of the first occurrence of a specified string within input string. |
isempty() | Returns true if the argument is an empty string or is null. |
isnotempty() | Returns true if the argument isn’t an empty string or a null. |
isnotnull() | Returns true if the argument is not null. |
isnull() | Evaluates its sole argument and returns a bool value indicating if the argument evaluates to a null value. |
parse_command_line() | Parses a Unicode command line string and returns an array of the command line arguments. |
parse_csv() | Splits a given string representing comma-separated values and returns a string array with these values. |
parse_ipv4() | Converts input to long (signed 64-bit) number representation. |
parse_ipv4_mask() | Converts input string and IP-prefix mask to long (signed 64-bit) number representation. |
parse_ipv6() | Converts IPv6 or IPv4 string to a canonical IPv6 string representation. |
parse_ipv6_mask() | Converts IPv6 or IPv4 string and netmask to a canonical IPv6 string representation. |
parse_json() | Interprets a string as a JSON value and returns the value as dynamic. |
parse_url() | Parses an absolute URL string and returns a dynamic object contains all parts of the URL. |
parse_urlquery() | Parses a url query string and returns a dynamic object contains the Query parameters. |
parse_version() | Converts input string representation of version to a comparable decimal number. |
replace_regex() | Replace all regex matches with another string. |
replace_string() | Replace all single string matches with a specified string. |
replace_strings() | Replace all multiple strings matches with specified strings. |
punycode_from_string() | Encodes domain name to Punycode form. |
punycode_to_string() | Decodes domain name from Punycode form. |
reverse() | Function makes reverse of input string. |
split() | Splits a given string according to a given delimiter and returns a string array with the contained substrings. |
strcat() | Concatenates between 1 and 64 arguments. |
strcat_delim() | Concatenates between 2 and 64 arguments, with delimiter, provided as first argument. |
strcmp() | Compares two strings. |
strlen() | Returns the length, in characters, of the input string. |
strrep() | Repeats given string provided number of times (default - 1). |
substring() | Extracts a substring from a source string starting from some index to the end of the string. |
toupper() | Converts a string to upper case. |
translate() | Replaces a set of characters (‘searchList’) with another set of characters (‘replacementList’) in a given a string. |
trim() | Removes all leading and trailing matches of the specified regular expression. |
trim_end() | Removes trailing match of the specified regular expression. |
trim_start() | Removes leading match of the specified regular expression. |
url_decode() | The function converts encoded URL into a regular URL representation. |
url_encode() | The function converts characters of the input URL into a format that can be transmitted over the Internet. |
IPv4/IPv6 functions
Function Name | Description |
---|---|
ipv4_compare() | Compares two IPv4 strings. |
ipv4_is_in_range() | Checks if IPv4 string address is in IPv4-prefix notation range. |
ipv4_is_in_any_range() | Checks if IPv4 string address is any of the IPv4-prefix notation ranges. |
ipv4_is_match() | Matches two IPv4 strings. |
ipv4_is_private() | Checks if IPv4 string address belongs to a set of private network IPs. |
ipv4_netmask_suffix() | Returns the value of the IPv4 netmask suffix from IPv4 string address. |
parse_ipv4() | Converts input string to long (signed 64-bit) number representation. |
parse_ipv4_mask() | Converts input string and IP-prefix mask to long (signed 64-bit) number representation. |
ipv4_range_to_cidr_list() | Converts IPv4 address range to a list of CIDR ranges. |
ipv6_compare() | Compares two IPv4 or IPv6 strings. |
ipv6_is_match() | Matches two IPv4 or IPv6 strings. |
parse_ipv6() | Converts IPv6 or IPv4 string to a canonical IPv6 string representation. |
parse_ipv6_mask() | Converts IPv6 or IPv4 string and netmask to a canonical IPv6 string representation. |
format_ipv4() | Parses input with a netmask and returns string representing IPv4 address. |
format_ipv4_mask() | Parses input with a netmask and returns string representing IPv4 address as CIDR notation. |
ipv6_is_in_range() | Checks if an IPv6 string address is in IPv6-prefix notation range. |
ipv6_is_in_any_range() | Checks if an IPv6 string address is in any of the IPv6-prefix notation ranges. |
geo_info_from_ip_address() | Retrieves geolocation information about IPv4 or IPv6 addresses. |
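For example, a minimal sketch (the addresses are arbitrary samples):
print InRange = ipv4_is_in_range('192.168.1.5', '192.168.1.0/24'),  // true
      IsPrivate = ipv4_is_private('10.1.2.3'),                      // true
      Suffix = ipv4_netmask_suffix('192.168.1.1/24')                // 24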
IPv4 text match functions
Function Name | Description |
---|---|
has_ipv4() | Searches for an IPv4 address in a text. |
has_ipv4_prefix() | Searches for an IPv4 address or prefix in a text. |
has_any_ipv4() | Searches for any of the specified IPv4 addresses in a text. |
has_any_ipv4_prefix() | Searches for any of the specified IPv4 addresses or prefixes in a text. |
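For example, a brief sketch (the text and addresses are arbitrary samples):
print FoundAddress = has_ipv4('Source address 192.168.1.1 was accepted', '192.168.1.1'),
      FoundPrefix = has_ipv4_prefix('Source address 192.168.1.1 was accepted', '192.168.')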
Type functions
Function Name | Description |
---|---|
gettype() | Returns the runtime type of its single argument. |
Scalar aggregation functions
Function Name | Description |
---|---|
dcount_hll() | Calculates the dcount from hll results (which was generated by hll or hll-merge). |
hll_merge() | Merges hll results (scalar version of the aggregate version hll-merge()). |
percentile_tdigest() | Calculates the percentile result from tdigest results (which was generated by tdigest or merge_tdigest). |
percentile_array_tdigest() | Calculates the percentile array result from tdigest results (which was generated by tdigest or merge_tdigest). |
percentrank_tdigest() | Calculates the percentage ranking of a value in a dataset. |
rank_tdigest() | Calculates relative rank of a value in a set. |
merge_tdigest() | Merge tdigest results (scalar version of the aggregate version tdigest-merge()). |
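For example, a sketch along these lines (the table and column names are arbitrary) computes per-day hll states, merges them, and extracts an approximate distinct count:
datatable(Day: int, User: string) [
    1, 'alice',
    1, 'bob',
    2, 'alice',
    2, 'carol'
]
| summarize DailySketch = hll(User) by Day
| summarize Merged = hll_merge(DailySketch)
| project ApproxDistinctUsers = dcount_hll(Merged)
The intermediate hll states can be stored and merged later, which is what makes this pattern useful for incremental distinct counts.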
Geospatial functions
Function Name | Description |
---|---|
geo_angle() | Calculates clockwise angle in radians between two lines on Earth. |
geo_azimuth() | Calculates clockwise angle in radians between the line from point1 to true north and a line from point1 to point2 on Earth. |
geo_distance_2points() | Calculates the shortest distance between two geospatial coordinates on Earth. |
geo_distance_point_to_line() | Calculates the shortest distance between a coordinate and a line or multiline on Earth. |
geo_distance_point_to_polygon() | Calculates the shortest distance between a coordinate and a polygon or multipolygon on Earth. |
geo_intersects_2lines() | Calculates whether the two lines or multilines intersects. |
geo_intersects_2polygons() | Calculates whether the two polygons or multipolygons intersects. |
geo_intersects_line_with_polygon() | Calculates whether the line or multiline intersects with polygon or multipolygon. |
geo_intersection_2lines() | Calculates the intersection of two lines or multilines. |
geo_intersection_2polygons() | Calculates the intersection of two polygons or multipolygons. |
geo_intersection_line_with_polygon() | Calculates the intersection of line or multiline with polygon or multipolygon. |
geo_point_buffer() | Calculates polygon that contains all points within the given radius of the point on Earth. |
geo_point_in_circle() | Calculates whether the geospatial coordinates are inside a circle on Earth. |
geo_point_in_polygon() | Calculates whether the geospatial coordinates are inside a polygon or a multipolygon on Earth. |
geo_point_to_geohash() | Calculates the Geohash string value for a geographic location. |
geo_point_to_s2cell() | Calculates the S2 Cell token string value for a geographic location. |
geo_point_to_h3cell() | Calculates the H3 Cell token string value for a geographic location. |
geo_line_buffer() | Calculates polygon or multipolygon that contains all points within the given radius of the input line or multiline on Earth. |
geo_line_centroid() | Calculates the centroid of line or a multiline on Earth. |
geo_line_densify() | Converts planar line edges to geodesics by adding intermediate points. |
geo_line_length() | Calculates the total length of line or a multiline on Earth. |
geo_line_simplify() | Simplifies line or a multiline by replacing nearly straight chains of short edges with a single long edge on Earth. |
geo_line_to_s2cells() | Calculates S2 cell tokens that cover a line or multiline on Earth. Useful geospatial join tool. |
geo_polygon_area() | Calculates the area of polygon or a multipolygon on Earth. |
geo_polygon_buffer() | Calculates polygon or multipolygon that contains all points within the given radius of the input polygon or multipolygon on Earth. |
geo_polygon_centroid() | Calculates the centroid of polygon or a multipolygon on Earth. |
geo_polygon_densify() | Converts polygon or multipolygon planar edges to geodesics by adding intermediate points. |
geo_polygon_perimeter() | Calculates the length of the boundary of polygon or a multipolygon on Earth. |
geo_polygon_simplify() | Simplifies polygon or a multipolygon by replacing nearly straight chains of short edges with a single long edge on Earth. |
geo_polygon_to_s2cells() | Calculates S2 Cell tokens that cover a polygon or multipolygon on Earth. Useful geospatial join tool. |
geo_polygon_to_h3cells() | Converts polygon to H3 cells. Useful geospatial join and visualization tool. |
geo_geohash_to_central_point() | Calculates the geospatial coordinates that represent the center of a Geohash rectangular area. |
geo_geohash_neighbors() | Calculates the geohash neighbors. |
geo_geohash_to_polygon() | Calculates the polygon that represents the geohash rectangular area. |
geo_s2cell_to_central_point() | Calculates the geospatial coordinates that represent the center of an S2 Cell. |
geo_s2cell_neighbors() | Calculates the S2 cell neighbors. |
geo_s2cell_to_polygon() | Calculates the polygon that represents the S2 Cell rectangular area. |
geo_h3cell_to_central_point() | Calculates the geospatial coordinates that represent the center of an H3 Cell. |
geo_h3cell_neighbors() | Calculates the H3 cell neighbors. |
geo_h3cell_to_polygon() | Calculates the polygon that represents the H3 Cell rectangular area. |
geo_h3cell_parent() | Calculates the H3 cell parent. |
geo_h3cell_children() | Calculates the H3 cell children. |
geo_h3cell_level() | Calculates the H3 cell resolution. |
geo_h3cell_rings() | Calculates the H3 cell Rings. |
geo_simplify_polygons_array() | Simplifies polygons by replacing nearly straight chains of short edges with a single long edge, while ensuring mutual boundaries consistency related to each other, on Earth. |
geo_union_lines_array() | Calculates the union of lines or multilines on Earth. |
geo_union_polygons_array() | Calculates the union of polygons or multipolygons on Earth. |
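For example, a minimal sketch (the coordinates are arbitrary sample points, given in longitude/latitude order):
print DistanceInMeters = geo_distance_2points(-122.407628, 47.578557, -122.349432, 47.620336),
      InCircle = geo_point_in_circle(-122.143564, 47.535677, -122.100896, 47.527351, 10000)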
Hash functions
Function Name | Description |
---|---|
hash() | Returns a hash value for the input value. |
hash_combine() | Combines two or more hash values. |
hash_many() | Returns a combined hash value of multiple values. |
hash_md5() | Returns an MD5 hash value for the input value. |
hash_sha1() | Returns a SHA1 hash value for the input value. |
hash_sha256() | Returns a SHA256 hash value for the input value. |
hash_xxhash64() | Returns an XXHASH64 hash value for the input value. |
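For example, a short sketch (the input values are arbitrary):
print Hash = hash('World'),                                // deterministic 64-bit hash
      Bounded = hash('World', 100),                        // same hash reduced modulo 100
      Combined = hash_combine(hash('Hello'), hash('World')),
      Md5 = hash_md5('Hello')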
Units conversion functions
Function Name | Description |
---|---|
convert_angle() | Returns the input value converted from one angle unit to another |
convert_energy() | Returns the input value converted from one energy unit to another |
convert_force() | Returns the input value converted from one force unit to another |
convert_length() | Returns the input value converted from one length unit to another |
convert_mass() | Returns the input value converted from one mass unit to another |
convert_speed() | Returns the input value converted from one speed unit to another |
convert_temperature() | Returns the input value converted from one temperature unit to another |
convert_volume() | Returns the input value converted from one volume unit to another |
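For example, a minimal sketch (the unit names 'Meter' and 'Foot' are assumed here; each convert_* function expects unit names specific to its quantity):
print MetersToFeet = convert_length(1.2, 'Meter', 'Foot')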
12.198 - set_difference()
Returns a dynamic (JSON) array of the set of all distinct values that are in the first array but aren’t in other arrays - (((arr1 \ arr2) \ arr3) \ …).
Syntax
set_difference(
set1,
set2 [,
set3, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
set1…setN | dynamic | ✔️ | Arrays used to create a difference set. A minimum of 2 arrays are required. See pack_array. |
Returns
Returns a dynamic array of the set of all distinct values that are in set1 but aren’t in other arrays.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend w = z * 2
| extend a1 = pack_array(x,y,x,z), a2 = pack_array(x, y), a3 = pack_array(x,y,w)
| project set_difference(a1, a2, a3)
Output
Column1 |
---|
[4] |
[8] |
[12] |
print arr = set_difference(dynamic([1,2,3]), dynamic([1,2,3]))
Output
arr |
---|
[] |
Related content
12.199 - set_has_element()
Determines whether the specified set contains the specified element.
Syntax
set_has_element(
set,
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
set | dynamic | ✔️ | The input array to search. |
value | | ✔️ | The value for which to search. The value should be of type long, int, double, datetime, timespan, decimal, string, guid, or bool. |
Returns
true or false, depending on whether the value exists in the array.
Example
print arr=dynamic(["this", "is", "an", "example"])
| project Result=set_has_element(arr, "example")
Output
Result |
---|
true |
Related content
Use array_index_of(arr, value) to find the position at which the value exists in the array. Both functions are equally performant.
12.200 - set_intersect()
Returns a dynamic array of the set of all distinct values that are in all arrays - (arr1 ∩ arr2 ∩ …).
Syntax
set_intersect(
set1,
set2 [,
set3, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
set1…setN | dynamic | ✔️ | Arrays used to create an intersect set. A minimum of 2 arrays are required. See pack_array. |
Returns
Returns a dynamic array of the set of all distinct values that are in all arrays.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend w = z * 2
| extend a1 = pack_array(x,y,x,z), a2 = pack_array(x, y), a3 = pack_array(w,x)
| project set_intersect(a1, a2, a3)
Output
Column1 |
---|
[1] |
[2] |
[3] |
print arr = set_intersect(dynamic([1, 2, 3]), dynamic([4,5]))
Output
arr |
---|
[] |
Related content
12.201 - set_union()
Returns a dynamic array of the set of all distinct values that are in any of the arrays - (arr1 ∪ arr2 ∪ …).
Syntax
set_union(
set1,
set2 [,
set3, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
set1…setN | dynamic | ✔️ | Arrays used to create a union set. A minimum of two arrays are required. See pack_array. |
Returns
Returns a dynamic array of the set of all distinct values that are in any of arrays.
Example
Set from multiple dynamic arrays
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend w = z * 2
| extend a1 = pack_array(x,y,x,z), a2 = pack_array(x, y), a3 = pack_array(w)
| project a1,a2,a3,Out=set_union(a1, a2, a3)
Output
a1 | a2 | a3 | Out |
---|---|---|---|
[1,2,1,4] | [1,2] | [8] | [1,2,4,8] |
[2,4,2,8] | [2,4] | [16] | [2,4,8,16] |
[3,6,3,12] | [3,6] | [24] | [3,6,12,24] |
Set from one dynamic array
datatable (Arr1: dynamic)
[
dynamic(['A4', 'A2', 'A7', 'A2']),
dynamic(['C4', 'C7', 'C1', 'C4'])
]
| extend Out=set_union(Arr1, Arr1)
Output
Arr1 | Out |
---|---|
[“A4”,“A2”,“A7”,“A2”] | [“A4”,“A2”,“A7”] |
[“C4”,“C7”,“C1”,“C4”] | [“C4”,“C7”,“C1”] |
Related content
12.202 - sign()
Returns the sign of the numeric expression.
Syntax
sign(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The number for which to return the sign. |
Returns
The positive (+1), zero (0), or negative (-1) sign of the specified expression.
Examples
print s1 = sign(-42), s2 = sign(0), s3 = sign(11.2)
Output
s1 | s2 | s3 |
---|---|---|
-1 | 0 | 1 |
12.203 - sin()
Returns the sine function value of the specified angle. The angle is specified in radians.
Syntax
sin(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | real | ✔️ | The value in radians for which to calculate the sine. |
Returns
The sine of number of radians.
Example
print sin(1)
Output
result |
---|
0.841470984807897 |
12.204 - split()
The split() function takes a string and splits it into substrings based on a specified delimiter, returning the substrings in an array. Optionally, you can retrieve a specific substring by specifying its index.
Syntax
split(
source,
delimiter [,
requestedIndex])
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The source string that is split according to the given delimiter. |
delimiter | string | ✔️ | The delimiter that will be used in order to split the source string. |
requestedIndex | int | | A zero-based index. If provided, the returned string array contains the requested substring at the index if it exists. |
Returns
An array of substrings obtained by separating the source string by the specified delimiter, or a single substring at the specified requestedIndex.
Examples
print
split("aa_bb", "_"), // ["aa","bb"]
split("aaa_bbb_ccc", "_", 1), // ["bbb"]
split("", "_"), // [""]
split("a__b", "_"), // ["a","","b"]
split("aabbcc", "bb") // ["aa","cc"]
print_0 | print_1 | print_2 | print_3 | print_4 |
---|---|---|---|---|
[“aa”,“bb”] | [“bbb”] | [""] | [“a”,"",“b”] | [“aa”,“cc”] |
12.205 - sqrt()
Returns the square root of the input.
Syntax
sqrt(
number)
Parameters
Name | Type | Required | Description |
---|---|---|---|
number | int, long, or real | ✔️ | The number for which to calculate the square root. |
Returns
- A positive number such that sqrt(x) * sqrt(x) == x.
- null if the argument is negative or can’t be converted to a real value.
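Example
For illustration (the input values are arbitrary):
print Positive = sqrt(256),  // 16
      Negative = sqrt(-1)    // null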
12.206 - startofday()
Returns the start of the day containing the date, shifted by an offset, if provided.
Syntax
startofday(
date [,
offset ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to find the start. |
offset | int | | The number of days to offset from the input date. The default is 0. |
Returns
A datetime representing the start of the day for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project dayStart = startofday(datetime(2017-01-01 10:10:17), offset)
Output
dayStart |
---|
2016-12-31 00:00:00.0000000 |
2017-01-01 00:00:00.0000000 |
2017-01-02 00:00:00.0000000 |
12.207 - startofmonth()
Returns the start of the month containing the date, shifted by an offset, if provided.
Syntax
startofmonth(
date [,
offset ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to find the start of month. |
offset | int | | The number of months to offset from the input date. The default is 0. |
Returns
A datetime representing the start of the month for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project monthStart = startofmonth(datetime(2017-01-01 10:10:17), offset)
Output
monthStart |
---|
2016-12-01 00:00:00.0000000 |
2017-01-01 00:00:00.0000000 |
2017-02-01 00:00:00.0000000 |
12.208 - startofweek()
Returns the start of the week containing the date, shifted by an offset, if provided.
Start of the week is considered to be a Sunday.
Syntax
startofweek(
date [,
offset ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to find the start of week. |
offset | int | | The number of weeks to offset from the input date. The default is 0. |
Returns
A datetime representing the start of the week for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project weekStart = startofweek(datetime(2017-01-01 10:10:17), offset)
Output
weekStart |
---|
2016-12-25 00:00:00.0000000 |
2017-01-01 00:00:00.0000000 |
2017-01-08 00:00:00.0000000 |
12.209 - startofyear()
Returns the start of the year containing the date, shifted by an offset, if provided.
Syntax
startofyear(
date [,
offset ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to find the start of the year. |
offset | int | | The number of years to offset from the input date. The default is 0. |
Returns
A datetime representing the start of the year for the given date value, with the offset, if specified.
Example
range offset from -1 to 1 step 1
| project yearStart = startofyear(datetime(2017-01-01 10:10:17), offset)
Output
yearStart |
---|
2016-01-01 00:00:00.0000000 |
2017-01-01 00:00:00.0000000 |
2018-01-01 00:00:00.0000000 |
12.210 - strcat_array()
Creates a concatenated string of array values using a specified delimiter.
Syntax
strcat_array(
array, delimiter)
Parameters
Name | Type | Required | Description |
---|---|---|---|
array | dynamic | ✔️ | An array of values to be concatenated. |
delimiter | string | ✔️ | The value used to concatenate the values in array. |
Returns
The input array values concatenated to a single string with the specified delimiter.
Examples
Custom delimiter
print str = strcat_array(dynamic([1, 2, 3]), "->")
Output
str |
---|
1->2->3 |
Using quotes as the delimiter
To use quotes as the delimiter, enclose the quotes in single quotes.
print str = strcat_array(dynamic([1, 2, 3]), '"')
Output
str |
---|
1"2"3 |
12.211 - strcat_delim()
Concatenates between 2 and 64 arguments, using a specified delimiter as the first argument.
Syntax
strcat_delim(
delimiter, argument1, argument2[ , argumentN])
Parameters
Name | Type | Required | Description |
---|---|---|---|
delimiter | string | ✔️ | The string to be used as separator in the concatenation. |
argument1 … argumentN | scalar | ✔️ | The expressions to concatenate. |
Returns
The arguments concatenated to a single string with delimiter.
Example
print st = strcat_delim('-', 1, '2', 'A', 1s)
Output
st |
---|
1-2-A-00:00:01 |
12.212 - strcat()
Concatenates between 1 and 64 arguments.
Syntax
strcat(
argument1,
argument2 [,
argument3 … ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
argument1 … argumentN | scalar | ✔️ | The expressions to concatenate. |
Returns
The arguments concatenated to a single string.
Examples
Concatenated string
The following example uses the strcat() function to concatenate the strings provided to form the string “hello world.” The results are assigned to the variable str.
print str = strcat("hello", " ", "world")
Output
str |
---|
hello world |
Concatenated multi-line string
The following example uses the strcat() function to create a concatenated multi-line string, which is saved to the variable MultiLineString. It uses the newline character to break the string into new lines.
print MultiLineString = strcat("Line 1\n", "Line 2\n", "Line 3")
Output
The results show the expanded row view with the multiline string.
MultiLineString |
---|
Line 1 Line 2 Line 3 |
12.213 - strcmp()
Compares two strings.
The function starts comparing the first character of each string. If they’re equal to each other, it continues with the following pairs until the characters differ or until the end of shorter string is reached.
Syntax
strcmp(
string1,
string2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string1 | string | ✔️ | The first input string for comparison. |
string2 | string | ✔️ | The second input string for comparison. |
Returns
Returns an integer value indicating the relationship between the strings:
- <0 - the first character that doesn’t match has a lower value in string1 than in string2
- 0 - the contents of both strings are equal
- >0 - the first character that doesn’t match has a greater value in string1 than in string2
Example
datatable(string1:string, string2:string) [
"ABC","ABC",
"abc","ABC",
"ABC","abc",
"abcde","abc"
]
| extend result = strcmp(string1,string2)
Output
string1 | string2 | result |
---|---|---|
ABC | ABC | 0 |
abc | ABC | 1 |
ABC | abc | -1 |
abcde | abc | 1 |
12.214 - string_size()
Returns the size, in bytes, of the input string.
Syntax
string_size(
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The string for which to return the byte size. |
Returns
Returns the length, in bytes, of the input string.
Examples
String of letters
print size = string_size("hello")
Output
size |
---|
5 |
String of letters and symbols
print size = string_size("⒦⒰⒮⒯⒪")
Output
size |
---|
15 |
12.215 - strlen()
Returns the length, in characters, of the input string.
Syntax
strlen(
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The string for which to return the length. |
Returns
Returns the length, in characters, of the input string.
Examples
String of letters
print length = strlen("hello")
Output
length |
---|
5 |
String of letters and symbols
print length = strlen("⒦⒰⒮⒯⒪")
Output
length |
---|
5 |
String with grapheme
print strlen('Çedilla') // the first character is a grapheme cluster
// that requires 2 code points to represent
Output
length |
---|
8 |
12.216 - strrep()
Replicates a string the number of times specified.
Syntax
strrep(
value,
multiplier,
[ delimiter ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The string to replicate. |
multiplier | int | ✔️ | The amount of times to replicate the string. Must be a value from 1 to 67108864. |
delimiter | string | | The delimiter used to separate the string replications. The default delimiter is an empty string. |
Returns
The value string repeated the number of times as specified by multiplier, concatenated with delimiter.
If multiplier is more than the maximal allowed value of 67108864, the input string will be repeated 67108864 times.
Example
print from_str = strrep('ABC', 2), from_int = strrep(123,3,'.'), from_time = strrep(3s,2,' ')
Output
from_str | from_int | from_time |
---|---|---|
ABCABC | 123.123.123 | 00:00:03 00:00:03 |
12.217 - substring()
Extracts a substring from the source string starting from some index to the end of the string.
Optionally, the length of the requested substring can be specified.
Syntax
substring(
source,
startingIndex [,
length])
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The string from which to take the substring. |
startingIndex | int | ✔️ | The zero-based starting character position of the requested substring. If a negative number, the substring will be retrieved from the end of the source string. |
length | int | | The requested number of characters in the substring. The default behavior is to take from startingIndex to the end of the source string. |
Returns
A substring from the given string. The substring starts at startingIndex (zero-based) character position and continues to the end of the string or length characters if specified.
Examples
substring("123456", 1) // 23456
substring("123456", 2, 2) // 34
substring("ABCD", 0, 2) // AB
substring("123456", -2, 2) // 56
12.218 - tan()
Returns the tangent value of the specified number.
Syntax
tan(
x)
Parameters
Name | Type | Required | Description |
---|---|---|---|
x | real | ✔️ | The number for which to calculate the tangent. |
Returns
The result of tan(x).
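Example
For illustration (the input value is arbitrary):
print result = tan(0.5)  // approximately 0.5463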
12.219 - The has_any_index operator
Searches the string for items specified in the array and returns the position in the array of the first item found in the string.
Syntax
has_any_index
(
source,
values)
Parameters
Name | Type | Required | Description |
---|---|---|---|
source | string | ✔️ | The value to search. |
values | dynamic | ✔️ | An array of scalar or literal expressions to look up. |
Returns
Zero-based index position of the first item in values that is found in source. Returns -1 if none of the array items were found in the string or if values is empty.
Example
print
idx1 = has_any_index("this is an example", dynamic(['this', 'example'])) // first lookup found in input string
, idx2 = has_any_index("this is an example", dynamic(['not', 'example'])) // last lookup found in input string
, idx3 = has_any_index("this is an example", dynamic(['not', 'found'])) // no lookup found in input string
, idx4 = has_any_index("Example number 2", range(1, 3, 1)) // Lookup array of integers
, idx5 = has_any_index("this is an example", dynamic([])) // Empty lookup array
Output
idx1 | idx2 | idx3 | idx4 | idx5 |
---|---|---|---|---|
0 | 1 | -1 | 1 | -1 |
12.220 - tobool()
Converts the input to a boolean (signed 8-bit) representation.
Syntax
tobool(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to convert to boolean. |
Returns
If conversion is successful, result will be a boolean.
If conversion isn’t successful, result will be null.
Example
tobool("true") == true
tobool("false") == false
tobool(1) == true
tobool(123) == true
12.221 - todatetime()
Converts the input to a datetime scalar value.
Syntax
todatetime(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to datetime. |
Returns
If the conversion is successful, the result will be a datetime value.
Otherwise, the result will be null.
Example
The following example converts a date and time string into a datetime
value.
print todatetime("2015-12-31 23:59:59.9")
The following example compares a converted date string to a datetime
value.
print todatetime('12-02-2022') == datetime('12-02-2022')
Output
print_0 |
---|
true |
12.222 - todecimal()
Converts the input to a decimal number representation.
Syntax
todecimal(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to a decimal. |
Returns
If conversion is successful, result will be a decimal number.
If conversion isn’t successful, result will be null.
Example
print todecimal("123.45678") == decimal(123.45678)
Output
print_0 |
---|
true |
12.223 - toguid()
Converts a string to a guid scalar.
Syntax
toguid(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to guid. |
Returns
The conversion process takes the first 32 characters of the input, ignoring properly located hyphens, validates that the characters are between 0-9 or a-f, and then converts the string into a guid scalar. The rest of the string is ignored.
If the conversion is successful, the result will be a guid scalar. Otherwise, the result will be null.
Example
datatable(str: string)
[
"0123456789abcdef0123456789abcdef",
"0123456789ab-cdef-0123-456789abcdef",
"a string that is not a guid"
]
| extend guid = toguid(str)
Output
str | guid |
---|---|
0123456789abcdef0123456789abcdef | 01234567-89ab-cdef-0123-456789abcdef |
0123456789ab-cdef-0123-456789abcdef | 01234567-89ab-cdef-0123-456789abcdef |
a string that is not a guid | |
12.224 - tohex()
Converts input to a hexadecimal string.
Syntax
tohex(
value,
[,
minLength ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | int or long | ✔️ | The value that will be converted to a hex string. |
minLength | int | | The value representing the number of leading characters to include in the output. Values between 1 and 16 are supported. Values greater than 16 will be truncated to 16. If the string is longer than minLength without leading characters, then minLength is effectively ignored. Negative numbers may only be represented at minimum by their underlying data size, so for an integer (32-bit) the minLength will be at minimum 8, for a long (64-bit) it will be at minimum 16. |
Returns
If conversion is successful, result will be a string value.
If conversion isn’t successful, result will be null.
Example
print
tohex(256) == '100',
tohex(-256) == 'ffffffffffffff00', // 64-bit 2's complement of -256
tohex(toint(-256), 8) == 'ffffff00', // 32-bit 2's complement of -256
tohex(256, 8) == '00000100',
tohex(256, 2) == '100' // Exceeds min length of 2, so min length is ignored.
Output
print_0 | print_1 | print_2 | print_3 | print_4 |
---|---|---|---|---|
true | true | true | true | true |
12.225 - toint()
Converts the input to an integer (signed 32-bit) number representation.
Syntax
toint(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to an integer. |
Returns
If the conversion is successful, the result is an integer. Otherwise, the result is null. If the input includes a decimal value, the result is truncated to the integer portion.
Example
Convert string to integer
The following example converts a string to an integer and checks if the converted value is equal to a specific integer.
print toint("123") == 123
| project Integer = print_0
Output
Integer |
---|
true |
Truncated integer
The following example inputs a decimal value and returns a truncated integer.
print toint(2.3)
| project Integer = print_0
Output
Integer |
---|
2 |
12.226 - tolong()
Converts the input value to a long (signed 64-bit) number representation.
Syntax
tolong(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to a long. |
Returns
If conversion is successful, the result is a long number.
If conversion isn’t successful, the result is null.
Example
tolong("123") == 123
12.227 - tolower()
Converts the input string to lower case.
Syntax
tolower(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to convert to a lowercase string. |
Returns
If conversion is successful, result is a lowercase string.
If conversion isn’t successful, result is null.
Example
tolower("Hello") == "hello"
12.228 - toreal()
Converts the input expression to a value of type real.
Syntax
toreal(
Expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to real. |
Returns
If conversion is successful, the result is a value of type real. Otherwise, the returned value will be real(null).
Example
toreal("123.4") == 123.4
12.229 - tostring()
Converts the input to a string representation.
Syntax
tostring(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | scalar | ✔️ | The value to convert to a string. |
Returns
If value is non-null, the result is a string representation of value. If value is null, the result is an empty string.
Example
print tostring(123)
12.230 - totimespan()
Converts the input to a timespan scalar value.
Syntax
totimespan(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to convert to a timespan. |
Returns
If conversion is successful, result will be a timespan value. Else, result will be null.
Example
totimespan("0.00:01:00") == time(1min)
Related content
12.231 - toupper()
Converts a string to upper case.
Syntax
toupper(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The value to convert to an uppercase string. |
Returns
If conversion is successful, result is an uppercase string.
If conversion isn’t successful, result is null.
Example
toupper("hello") == "HELLO"
12.232 - translate()
Replaces a set of characters (‘searchList’) with another set of characters (‘replacementList’) in a given a string. The function searches for characters in the ‘searchList’ and replaces them with the corresponding characters in ‘replacementList’
Syntax
translate(
searchList,
replacementList,
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
searchList | string | ✔️ | The list of characters that should be replaced. |
replacementList | string | ✔️ | The list of characters that should replace the characters in searchList. |
source | string | ✔️ | A string to search. |
Returns
source after replacing all occurrences of characters in ‘searchList’ with the corresponding characters in ‘replacementList’.
Examples
Input | Output |
---|---|
translate("abc", "x", "abc") | "xxx" |
translate("abc", "", "ab") | "" |
translate("krasp", "otsku", "spark") | "kusto" |
12.233 - treepath()
Enumerates all the path expressions that identify leaves in a dynamic object.
Syntax
treepath(
object)
Parameters
Name | Type | Required | Description |
---|---|---|---|
object | dynamic | ✔️ | A dynamic property bag object for which to enumerate the path expressions. |
Returns
An array of path expressions.
Examples
Expression | Evaluates to |
---|---|
treepath(parse_json('{"a":"b", "c":123}')) | ["['a']","['c']"] |
treepath(parse_json('{"prop1":[1,2,3,4], "prop2":"value2"}')) | ["['prop1']","['prop1'][0]","['prop2']"] |
treepath(parse_json('{"listProperty":[100,200,300,"abcde",{"x":"y"}]}')) | ["['listProperty']","['listProperty'][0]","['listProperty'][0]['x']"] |
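For example, the second row of the table above as a runnable query:
print paths = treepath(parse_json('{"prop1":[1,2,3,4], "prop2":"value2"}'))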
12.234 - trim_end()
Removes trailing match of the specified regular expression.
Syntax
trim_end(
regex,
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
regex | string | ✔️ | The string or regular expression to be trimmed from the end of source. |
source | string | ✔️ | The source string from which to trim regex. |
Returns
source after trimming matches of regex found in the end of source.
Examples
The following statement trims substring from the end of string_to_trim.
let string_to_trim = @"bing.com";
let substring = ".com";
print string_to_trim = string_to_trim,trimmed_string = trim_end(substring,string_to_trim)
Output
string_to_trim | trimmed_string |
---|---|
bing.com | bing |
Trim non-alphanumeric characters
The following example trims all non-word characters from the end of the string.
range x from 1 to 5 step 1
| project str = strcat("- ","Te st",x,@"// $")
| extend trimmed_str = trim_end(@"[^\w]+",str)
Output
str | trimmed_str |
---|---|
- Te st1// $ | - Te st1 |
- Te st2// $ | - Te st2 |
- Te st3// $ | - Te st3 |
- Te st4// $ | - Te st4 |
- Te st5// $ | - Te st5 |
Trim whitespace
The following example trims all spaces from the end of the string.
let string_to_trim = @" Hello, world! ";
let substring = @"\s+";
print
string_to_trim = string_to_trim,
trimmed_end = trim_end(substring, string_to_trim)
Output
string_to_trim | trimmed_end |
---|---|
Hello, world! | Hello, world! |
12.235 - trim_start()
Removes leading match of the specified regular expression.
Syntax
trim_start(
regex,
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
regex | string | ✔️ | The string or regular expression to be trimmed from the beginning of source. |
source | string | ✔️ | The source string from which to trim regex. |
Returns
source after trimming match of regex found in the beginning of source.
Examples
Trim specific substring
The following example trims substring from the start of string_to_trim.
let string_to_trim = @"https://bing.com";
let substring = "https://";
print string_to_trim = string_to_trim,trimmed_string = trim_start(substring,string_to_trim)
Output
string_to_trim | trimmed_string |
---|---|
https://bing.com | bing.com |
Trim non-alphanumeric characters
The following example trims all non-word characters from the beginning of the string.
range x from 1 to 5 step 1
| project str = strcat("- ","Te st",x,@"// $")
| extend trimmed_str = trim_start(@"[^\w]+",str)
Output
str | trimmed_str |
---|---|
- Te st1// $ | Te st1// $ |
- Te st2// $ | Te st2// $ |
- Te st3// $ | Te st3// $ |
- Te st4// $ | Te st4// $ |
- Te st5// $ | Te st5// $ |
Trim whitespace
The following example trims all spaces from the start of the string.
let string_to_trim = @" Hello, world! ";
let substring = @"\s+";
print
string_to_trim = string_to_trim,
trimmed_start = trim_start(substring, string_to_trim)
Output
string_to_trim | trimmed_start |
---|---|
Hello, world! | Hello, world! |
12.236 - trim()
Removes all leading and trailing matches of the specified regular expression.
Syntax
trim(
regex,
source)
Parameters
Name | Type | Required | Description |
---|---|---|---|
regex | string | ✔️ | The string or regular expression to be trimmed from source. |
source | string | ✔️ | The source string from which to trim regex. |
Returns
source after trimming matches of regex found in the beginning and/or the end of source.
Examples
Trim specific substring
The following example trims substring from the start and the end of the string_to_trim.
let string_to_trim = @"--https://bing.com--";
let substring = "--";
print string_to_trim = string_to_trim, trimmed_string = trim(substring,string_to_trim)
Output
string_to_trim | trimmed_string |
---|---|
--https://bing.com-- | https://bing.com |
Trim non-alphanumeric characters
The following example trims all non-word characters from start and end of the string.
range x from 1 to 5 step 1
| project str = strcat("- ","Te st",x,@"// $")
| extend trimmed_str = trim(@"[^\w]+",str)
Output
str | trimmed_str |
---|---|
- Te st1// $ | Te st1 |
- Te st2// $ | Te st2 |
- Te st3// $ | Te st3 |
- Te st4// $ | Te st4 |
- Te st5// $ | Te st5 |
Trim whitespaces
The next statement trims all spaces from start and end of the string.
let string_to_trim = @" Hello, world! ";
let substring = @"\s+";
print
string_to_trim = string_to_trim,
trimmed_string = trim(substring, string_to_trim)
Output
string_to_trim | trimmed_string |
---|---|
Hello, world! | Hello, world! |
12.237 - unicode_codepoints_from_string()
Returns a dynamic array of the Unicode codepoints of the input string. This function is the inverse operation of the unicode_codepoints_to_string() function.
Syntax
unicode_codepoints_from_string(
value)
Parameters
Name | Type | Required | Description |
---|---|---|---|
value | string | ✔️ | The source string to convert. |
Returns
Returns a dynamic array of the Unicode codepoints of the characters that make up the string provided to this function.
See unicode_codepoints_to_string().
Examples
print arr = unicode_codepoints_from_string("⒦⒰⒮⒯⒪")
Output
arr |
---|
[9382, 9392, 9390, 9391, 9386] |
print arr = unicode_codepoints_from_string("קוסטו - Kusto")
Output
arr |
---|
[1511, 1493, 1505, 1496, 1493, 32, 45, 32, 75, 117, 115, 116, 111] |
print str = unicode_codepoints_to_string(unicode_codepoints_from_string("Kusto"))
Output
str |
---|
Kusto |
12.238 - unicode_codepoints_to_string()
Returns the string represented by the Unicode codepoints. This function is the inverse operation of the unicode_codepoints_from_string() function.
Syntax
unicode_codepoints_to_string (
values)
Parameters
Name | Type | Required | Description |
---|---|---|---|
values | int, long, or dynamic | ✔️ | One or more comma-separated values to convert. The values may also be a dynamic array. |
Returns
Returns the string made of the UTF characters whose Unicode codepoint value is provided by the arguments to this function. The input must consist of valid Unicode codepoints.
If any argument isn’t a valid Unicode codepoint, the function returns null.
Examples
print str = unicode_codepoints_to_string(75, 117, 115, 116, 111)
Output
str |
---|
Kusto |
print str = unicode_codepoints_to_string(dynamic([75, 117, 115, 116, 111]))
Output
str |
---|
Kusto |
print str = unicode_codepoints_to_string(dynamic([75, 117, 115]), 116, 111)
Output
str |
---|
Kusto |
print str = unicode_codepoints_to_string(75, 10, 117, 10, 115, 10, 116, 10, 111)
Output
str |
---|
K u s t o |
print str = unicode_codepoints_to_string(range(48,57), range(65,90), range(97,122))
Output
str |
---|
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz |
12.239 - unixtime_microseconds_todatetime()
Converts unix-epoch microseconds to UTC datetime.
Syntax
unixtime_microseconds_todatetime(
microseconds)
Parameters
Name | Type | Required | Description |
---|---|---|---|
microseconds | real | ✔️ | The epoch timestamp in microseconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value. |
Returns
If the conversion is successful, the result is a datetime value. Otherwise, the result is null.
Example
print date_time = unixtime_microseconds_todatetime(1546300800000000)
Output
date_time |
---|
2019-01-01 00:00:00.0000000 |
Related content
- Convert unix-epoch seconds to UTC datetime using unixtime_seconds_todatetime().
- Convert unix-epoch milliseconds to UTC datetime using unixtime_milliseconds_todatetime().
- Convert unix-epoch nanoseconds to UTC datetime using unixtime_nanoseconds_todatetime().
12.240 - unixtime_milliseconds_todatetime()
Converts unix-epoch milliseconds to UTC datetime.
Syntax
unixtime_milliseconds_todatetime(
milliseconds)
Parameters
Name | Type | Required | Description |
---|---|---|---|
milliseconds | real | ✔️ | The epoch timestamp in milliseconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value. |
Returns
If the conversion is successful, the result is a datetime value. Otherwise, the result is null.
Example
print date_time = unixtime_milliseconds_todatetime(1546300800000)
Output
date_time |
---|
2019-01-01 00:00:00.0000000 |
Related content
- Convert unix-epoch seconds to UTC datetime using unixtime_seconds_todatetime().
- Convert unix-epoch microseconds to UTC datetime using unixtime_microseconds_todatetime().
- Convert unix-epoch nanoseconds to UTC datetime using unixtime_nanoseconds_todatetime().
12.241 - unixtime_nanoseconds_todatetime()
Converts unix-epoch nanoseconds to UTC datetime.
Syntax
unixtime_nanoseconds_todatetime(
nanoseconds)
Parameters
Name | Type | Required | Description |
---|---|---|---|
nanoseconds | real | ✔️ | The epoch timestamp in nanoseconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value. |
Returns
If the conversion is successful, the result is a datetime value. Otherwise, the result is null.
Example
print date_time = unixtime_nanoseconds_todatetime(1546300800000000000)
Output
date_time |
---|
2019-01-01 00:00:00.0000000 |
Related content
- Convert unix-epoch seconds to UTC datetime using unixtime_seconds_todatetime().
- Convert unix-epoch milliseconds to UTC datetime using unixtime_milliseconds_todatetime().
- Convert unix-epoch microseconds to UTC datetime using unixtime_microseconds_todatetime().
12.242 - unixtime_seconds_todatetime()
Converts unix-epoch seconds to UTC datetime.
Syntax
unixtime_seconds_todatetime(
seconds)
Parameters
Name | Type | Required | Description |
---|---|---|---|
seconds | real | ✔️ | The epoch timestamp in seconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value. |
Returns
If the conversion is successful, the result is a datetime value. Otherwise, the result is null.
Example
print date_time = unixtime_seconds_todatetime(1546300800)
Output
date_time |
---|
2019-01-01 00:00:00.0000000 |
Related content
- Convert unix-epoch milliseconds to UTC datetime using unixtime_milliseconds_todatetime().
- Convert unix-epoch microseconds to UTC datetime using unixtime_microseconds_todatetime().
- Convert unix-epoch nanoseconds to UTC datetime using unixtime_nanoseconds_todatetime().
12.243 - url_decode()
The function converts an encoded URL into a regular URL representation.
For more information about URL encoding and decoding, see Percent-encoding.
Syntax
url_decode(
encoded_url)
Parameters
Name | Type | Required | Description |
---|---|---|---|
encoded_url | string | ✔️ | The encoded URL to decode. |
Returns
URL (string) in a regular representation.
Example
let url = @'https%3a%2f%2fwww.bing.com%2f';
print original = url, decoded = url_decode(url)
Output
original | decoded |
---|---|
https%3a%2f%2fwww.bing.com%2f | https://www.bing.com/ |
12.244 - url_encode_component()
The function converts characters of the input URL into a format that can be transmitted over the internet. Differs from url_encode by encoding spaces as ‘%20’ and not as ‘+’.
For more information about URL encoding and decoding, see Percent-encoding.
Syntax
url_encode_component(
url)
Parameters
Name | Type | Required | Description |
---|---|---|---|
url | string | ✔️ | The URL to encode. |
Returns
URL (string) converted into a format that can be transmitted over the Internet.
Example
let url = @'https://www.bing.com/hello world/';
print original = url, encoded = url_encode_component(url)
Output
original | encoded |
---|---|
https://www.bing.com/hello world/ | https%3a%2f%2fwww.bing.com%2fhello%20world%2f |
12.245 - url_encode()
The function converts characters of the input URL into a format that can be transmitted over the internet. Differs from url_encode_component by encoding spaces as ‘+’ and not as ‘%20’ (see application/x-www-form-urlencoded here).
For more information about URL encoding and decoding, see Percent-encoding.
Syntax
url_encode(
url)
Parameters
Name | Type | Required | Description |
---|---|---|---|
url | string | ✔️ | The URL to encode. |
Returns
URL (string) converted into a format that can be transmitted over the Internet.
Examples
let url = @'https://www.bing.com/hello world';
print original = url, encoded = url_encode(url)
Output
original | encoded |
---|---|
https://www.bing.com/hello world | https%3a%2f%2fwww.bing.com%2fhello+world |
12.246 - week_of_year()
Returns an integer that represents the week number. The week number is calculated from the first week of a year, which is the one that includes the first Thursday, according to ISO 8601.
Deprecated aliases: weekofyear()
Syntax
week_of_year(
date)
Parameters
Name | Type | Required | Description |
---|---|---|---|
date | datetime | ✔️ | The date for which to return the week of the year. |
Returns
The week number that contains the given date.
Examples
Input | Output |
---|---|
week_of_year(datetime(2020-12-31)) | 53 |
week_of_year(datetime(2020-06-15)) | 25 |
week_of_year(datetime(1970-01-01)) | 1 |
week_of_year(datetime(2000-01-01)) | 52 |
The current version of this function, week_of_year(), is ISO 8601 compliant; the first week of a year is defined as the week with the year's first Thursday in it.
12.247 - welch_test()
Computes the p-value of the Welch test.
Syntax
welch_test(
mean1,
variance1,
count1,
mean2,
variance2,
count2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
mean1 | real or long | ✔️ | The mean (average) value of the first series. |
variance1 | real or long | ✔️ | The variance value of the first series. |
count1 | real or long | ✔️ | The count of values in the first series. |
mean2 | real or long | ✔️ | The mean (average) value of the second series. |
variance2 | real or long | ✔️ | The variance value of the second series. |
count2 | real or long | ✔️ | The count of values in the second series. |
Returns
The computed p-value of the Welch test.
From Wikipedia:
In statistics, Welch’s t-test is a two-sample location test that’s used to test the hypothesis that two populations have equal means. Welch’s t-test is an adaptation of Student’s t-test, and is more reliable when the two samples have unequal variances and unequal sample sizes. These tests are often referred to as “unpaired” or “independent samples” t-tests. The tests are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Welch’s t-test is less popular than Student’s t-test, and may be less familiar to readers. The test is also called “Welch’s unequal variances t-test”, or “unequal variances t-test”.
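For background, these are the standard textbook formulas for Welch's test (not taken from this function's documentation); the t statistic and approximate degrees of freedom are computed from the same six inputs:
$$ t = \frac{mean1 - mean2}{\sqrt{\frac{variance1}{count1} + \frac{variance2}{count2}}}, \qquad \nu \approx \frac{\left(\frac{variance1}{count1} + \frac{variance2}{count2}\right)^{2}}{\frac{(variance1/count1)^{2}}{count1 - 1} + \frac{(variance2/count2)^{2}}{count2 - 1}} $$
The p-value is then taken from the t distribution with ν degrees of freedom (the Welch–Satterthwaite approximation).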
Example
// s1, s2 values are from https://en.wikipedia.org/wiki/Welch%27s_t-test
print
s1 = dynamic([27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4]),
s2 = dynamic([27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4])
| mv-expand s1 to typeof(double), s2 to typeof(double)
| summarize m1=avg(s1), v1=variance(s1), c1=count(), m2=avg(s2), v2=variance(s2), c2=count()
| extend pValue=welch_test(m1,v1,c1,m2,v2,c2)
// pValue = 0.021
12.248 - zip()
The zip function accepts any number of dynamic arrays, and returns an array whose elements are each an array holding the elements of the input arrays of the same index.
Syntax
zip(
arrays)
Parameters
Name | Type | Required | Description |
---|---|---|---|
arrays | dynamic | ✔️ | The dynamic array values to zip. The function accepts between 2-16 arrays. |
Examples
print zip(dynamic([1,3,5]), dynamic([2,4,6]))
Output
print_0 |
---|
[[1,2],[3,4],[5,6]] |
print zip(dynamic(["A", 1, 1.5]), dynamic([{}, "B"]))
Output
print_0 |
---|
[["A",{}], [1,"B"], [1.5, null]] |
datatable(a:int, b:string) [1,"one",2,"two",3,"three"]
| summarize a = make_list(a), b = make_list(b)
| project zip(a, b)
Output
print_0 |
---|
[[1,"one"],[2,"two"],[3,"three"]] |
12.249 - zlib_compress_to_base64_string()
Performs zlib compression and encodes the result to base64.
Syntax
zlib_compress_to_base64_string(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The string to be compressed and base64 encoded. |
Returns
- Returns a string that represents the zlib-compressed and base64-encoded original string.
- Returns an empty result if compression or encoding failed.
Example
Using Kusto Query Language
print zcomp = zlib_compress_to_base64_string("1234567890qwertyuiop")
Output
zcomp |
---|
“eAEBFADr/zEyMzQ1Njc4OTBxd2VydHl1aW9wOAkGdw==” |
Using Python
Compression can be done using other tools, for example Python.
import base64, zlib
print(base64.b64encode(zlib.compress(b'<original_string>')))
Related content
- Use zlib_decompress_from_base64_string() to retrieve the original uncompressed string.
12.250 - zlib_decompress_from_base64_string()
Decodes the input string from base64 and performs zlib decompression.
Syntax
zlib_decompress_from_base64_string(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
string | string | ✔️ | The string to decode. The string should have been compressed with zlib and then base64-encoded. |
Returns
- Returns a string that represents the original string.
- Returns an empty result if decompression or decoding failed. For example, invalid zlib-compressed and base64-encoded strings return an empty output.
Examples
Valid input
print zcomp = zlib_decompress_from_base64_string("eJwLSS0uUSguKcrMS1cwNDIGACxqBQ4=")
Output
zcomp |
---|
Test string 123 |
Invalid input
print zcomp = zlib_decompress_from_base64_string("x0x0x0")
Output
zcomp |
---|
Related content
- Create a compressed input string with zlib_compress_to_base64_string().
13 - Scalar operators
13.1 - Bitwise (binary) operators
Kusto supports several bitwise (binary) operators between integers: binary_and(), binary_not(), binary_or(), binary_shift_left(), binary_shift_right(), and binary_xor().
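As a minimal sketch (using binary_and(), binary_or(), and binary_shift_left() from the list above), the following query combines two small integer bit patterns:
print bit_and = binary_and(5, 3), bit_or = binary_or(5, 3), shifted = binary_shift_left(1, 2)
Output
bit_and | bit_or | shifted |
---|---|---|
1 | 7 | 4 |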
13.2 - Datetime / timespan arithmetic
Kusto supports performing arithmetic operations on values of types datetime and timespan.
Supported operations
- One can subtract (but not add) two datetime values to get a timespan value expressing their difference. For example, datetime(1997-06-25) - datetime(1910-06-11) is how old Jacques-Yves Cousteau was when he died.
- One can add or subtract two timespan values to get a timespan value which is their sum or difference. For example, 1d + 2d is three days.
- One can add or subtract a timespan value from a datetime value. For example, datetime(1910-06-11) + 1d is the date Cousteau turned one day old.
- One can divide two timespan values to get their quotient. For example, 1d / 5h gives 4.8. This gives one the ability to express any timespan value as a multiple of another timespan value. For example, to express an hour in seconds, simply divide 1h by 1s: 1h / 1s (with the obvious result, 3600).
- Conversely, one can multiply a numeric value (such as double and long) by a timespan value to get a timespan value. For example, one can express an hour and a half as 1.5 * 1h, as shown in the sketch below.
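The following query is a minimal sketch of these operations; the column names are illustrative only:
print
    CousteauAge = datetime(1997-06-25) - datetime(1910-06-11),
    HourInSeconds = 1h / 1s,
    NinetyMinutes = 1.5 * 1h
CousteauAge and NinetyMinutes are timespan values, and HourInSeconds is 3600, as noted above.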
Examples
Unix time, which is also known as POSIX time or UNIX Epoch time, is a system for describing a point in time as the number of seconds that have elapsed since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.
If your data includes representation of Unix time as an integer, or you require converting to it, the following functions are available.
From Unix time
let fromUnixTime = (t: long) {
datetime(1970-01-01) + t * 1sec
};
print result = fromUnixTime(1546897531)
Output
result |
---|
2019-01-07 21:45:31.0000000 |
To Unix time
let toUnixTime = (dt: datetime) {
(dt - datetime(1970-01-01)) / 1s
};
print result = toUnixTime(datetime(2019-01-07 21:45:31.0000000))
Output
result |
---|
1546897531 |
Related content
For unix-epoch time conversions, see the following functions:
- unixtime_seconds_todatetime()
- unixtime_milliseconds_todatetime()
- unixtime_microseconds_todatetime()
- unixtime_nanoseconds_todatetime()
13.3 - Logical (binary) operators
The following logical operators can be used to perform comparisons and evaluations:
Operator name | Syntax | Meaning |
---|---|---|
Equality | == | Returns true if both operands are non-null and equal to each other. Otherwise, returns false . |
Inequality | != | Returns true if any of the operands are null or if the operands aren’t equal to each other. Otherwise, returns false . |
Logical and | and | Returns true only if both operands are true . The logical and has higher precedence than the logical or . |
Logical or | or | Returns true if either of the operands is true , regardless of the other operand. |
How logical operators work with null values
Null values adhere to the following rules:
Operation | Result |
---|---|
bool(null) == bool(null) | false |
bool(null) != bool(null) | false |
bool(null) and true | false |
bool(null) or true | true |
Examples
Equality
The following query returns a count of all storm events where the event type is “Tornado”.
StormEvents
| where EventType == "Tornado"
| count
Output
Count |
---|
1238 |
Inequality
The following query returns a count of all storm events where the event type isn’t “Tornado”.
StormEvents
| where EventType != "Tornado"
| count
Output
Count |
---|
57828 |
Logical and
The following query returns a count of all storm events where the event type is “Tornado” and the state is “KANSAS”.
StormEvents
| where EventType == "Tornado" and State == "KANSAS"
| count
Output
Count |
---|
161 |
Logical or
The following query returns a count of all storm events where the event type is “Tornado” or “Thunderstorm Wind”.
StormEvents
| where EventType == "Tornado" or EventType == "Thunderstorm Wind"
| count
Output
Count |
---|
14253 |
Null values
The following query shows that null values are treated as false.
print print=iff(bool(null) and true, true, false)
Output
print |
---|
false |
13.4 - Numerical operators
The types int, long, and real represent numerical types.
The following operators can be used between pairs of these types:
Operator | Description | Example |
---|---|---|
+ | Add | 3.14 + 3.14 , ago(5m) + 5m |
- | Subtract | 0.23 - 0.22 |
* | Multiply | 1s * 5 , 2 * 2 |
/ | Divide | 10m / 1s , 4 / 2 |
% | Modulo | 4 % 2 |
< | Less | 1 < 10 , 10sec < 1h , now() < datetime(2100-01-01) |
> | Greater | 0.23 > 0.22 , 10min > 1sec , now() > ago(1d) |
== | Equals | 1 == 1 |
!= | Not equals | 1 != 0 |
<= | Less or Equal | 4 <= 5 |
>= | Greater or Equal | 5 >= 4 |
in | Equals to one of the elements | see here |
!in | Not equals to any of the elements | see here |
Type rules for arithmetic operations
The data type of the result of an arithmetic operation is determined by the data types of the operands. If one of the operands is of type real, the result will be of type real. If both operands are of integer types (int or long), the result will be of type long.
Due to these rules, the result of division operations that only involve integers will be truncated to an integer, which might not always be what you want. To avoid truncation, convert at least one of the integer values to real using the todouble() function before performing the operation.
The following examples illustrate how the operand types affect the result type in division operations.
Operation | Result | Description |
---|---|---|
1.0 / 2 | 0.5 | One of the operands is of type real , so the result is real . |
1 / 2.0 | 0.5 | One of the operands is of type real , so the result is real . |
1 / 2 | 0 | Both of the operands are of type int , so the result is int . Integer division occurs and the decimal is truncated, resulting in 0 instead of 0.5 , as one might expect. |
real(1) / 2 | 0.5 | To avoid truncation due to integer division, one of the int operands was first converted to real using the real() function. |
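As a quick check of these rules, the following minimal query contrasts integer division with real division using todouble():
print int_result = 1 / 2, real_result = todouble(1) / 2
Output
int_result | real_result |
---|---|
0 | 0.5 |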
Comment about the modulo operator
In Kusto, the modulo of two numbers always returns a "small non-negative number". That is, the modulo of two numbers, N % D, is such that: 0 ≤ (N % D) < abs(D).
For example, the following query:
print plusPlus = 14 % 12, minusPlus = -14 % 12, plusMinus = 14 % -12, minusMinus = -14 % -12
Produces this result:
plusPlus | minusPlus | plusMinus | minusMinus |
---|---|---|---|
2 | 10 | 2 | 10 |
13.5 - Between operators
13.5.1 - The !between operator
Matches the input that is outside of the inclusive range.
!between
can operate on any numeric, datetime, or timespan expression.
Syntax
T |
where
expr !between
(
leftRange..
rightRange)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be matched. |
expr | scalar | ✔️ | The expression to filter. |
leftRange | int, long, real, or datetime | ✔️ | The expression of the left range. The range is inclusive. |
rightRange | int, long, real, datetime, or timespan | ✔️ | The expression of the right range. The range is inclusive. This value can only be of type timespan if expr and leftRange are both of type datetime . See example. |
Returns
Rows in T for which the predicate of (expr < leftRange or expr > rightRange) evaluates to true
.
Examples
Filter numeric values
range x from 1 to 10 step 1
| where x !between (5 .. 9)
Output
x |
---|
1 |
2 |
3 |
4 |
10 |
Filter datetime
StormEvents
| where StartTime !between (datetime(2007-07-27) .. datetime(2007-07-30))
| count
Output
Count |
---|
58590 |
Filter datetime using a timespan range
StormEvents
| where StartTime !between (datetime(2007-07-27) .. 3d)
| count
Output
Count |
---|
58590 |
13.5.2 - The between operator
Filters a record set for data matching the values in an inclusive range.
between
can operate on any numeric, datetime, or timespan expression.
Syntax
T |
where
expr between
(
leftRange..
rightRange)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be matched. For example, the table name. |
expr | scalar | ✔️ | The expression used to filter. |
leftRange | int, long, real, or datetime | ✔️ | The expression of the left range. The range is inclusive. |
rightRange | int, long, real, datetime, or timespan | ✔️ | The expression of the right range. The range is inclusive. This value can only be of type timespan if expr and leftRange are both of type datetime . See example. |
Returns
Rows in T for which the predicate of (expr >= leftRange and expr <= rightRange) evaluates to true
.
Examples
Filter numeric values
range x from 1 to 100 step 1
| where x between (50 .. 55)
Output
x |
---|
50 |
51 |
52 |
53 |
54 |
55 |
Filter by date
StormEvents
| where StartTime between (datetime(2007-07-27) .. datetime(2007-07-30))
| count
Output
Count |
---|
476 |
Filter by date and time
StormEvents
| where StartTime between (datetime(2007-12-01T01:30:00) .. datetime(2007-12-01T08:00:00))
| count
Output
Count |
---|
301 |
Filter using a timespan range
StormEvents
| where StartTime between (datetime(2007-07-27) .. 3d)
| count
Output
Count |
---|
476 |
13.6 - in operators
13.6.1 - The case-insensitive !in~ string operator
Filters a record set for data without a case-insensitive string.
Performance tips
When possible, use the case-sensitive !in.
Syntax
T |
where
col !in~
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Example
List of scalars
The following query shows how to use !in~
with a comma-separated list of scalar values.
StormEvents
| where State !in~ ("Florida", "Georgia", "New York")
| count
Output
Count |
---|
54,291 |
Dynamic array
The following query shows how to use !in~
with a dynamic array.
StormEvents
| where State !in~ (dynamic(["Florida", "Georgia", "New York"]))
| count
Output
Count |
---|
54291 |
The same query can also be written with a let statement.
let states = dynamic(["Florida", "Georgia", "New York"]);
StormEvents
| where State !in~ (states)
| count
Output
Count |
---|
54291 |
Tabular expression
The following query shows how to use !in~
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State !in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State
Output
State | count_ |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State !in~ (large_states)
| summarize count() by State
Output
State | count_ |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
13.6.2 - The case-insensitive in~ string operator
Filters a record set for data with a case-insensitive string.
Performance tips
When possible, use the case-sensitive in.
Syntax
T |
where
col in~
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Examples
List of scalars
The following query shows how to use in~
with a comma-separated list of scalar values.
StormEvents
| where State in~ ("FLORIDA", "georgia", "NEW YORK")
| count
Output
Count |
---|
4775 |
Dynamic array
The following query shows how to use in~
with a dynamic array.
StormEvents
| where State in~ (dynamic(["FLORIDA", "georgia", "NEW YORK"]))
| count
Output
Count |
---|
4775 |
The same query can also be written with a let statement.
let states = dynamic(["FLORIDA", "georgia", "NEW YORK"]);
StormEvents
| where State in~ (states)
| count
Output
Count |
---|
4775 |
Tabular expression
The following query shows how to use in~
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State
Output
State | count_ |
---|---|
TEXAS | 4701 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State in~ (large_states)
| summarize count() by State
Output
State | count_ |
---|---|
TEXAS | 4701 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
… | … |
13.6.3 - The case-sensitive !in string operator
Filters a record set for data without a case-sensitive string.
Syntax
T |
where
col !in
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Example
List of scalars
The following query shows how to use !in
with a comma-separated list of scalar values.
StormEvents
| where State !in ("FLORIDA", "GEORGIA", "NEW YORK")
| count
Output
Count |
---|
54291 |
Dynamic array
The following query shows how to use !in
with a dynamic array.
StormEvents
| where State !in (dynamic(["FLORIDA", "GEORGIA", "NEW YORK"]))
| count
Output
Count |
---|
54291 |
The same query can also be written with a let statement.
let states = dynamic(["FLORIDA", "GEORGIA", "NEW YORK"]);
StormEvents
| where State !in (states)
| count
Output
Count |
---|
54291 |
Tabular expression
The following query shows how to use !in
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State !in (PopulationData | where Population > 5000000 | project State)
| summarize count() by State
Output
State | Count |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State !in (large_states)
| summarize count() by State
Output
State | Count |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
13.6.4 - The case-sensitive in string operator
Filters a record set for data with a case-sensitive string.
Syntax
T |
where
col in
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search considers up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Examples
List of scalars
The following query shows how to use in
with a list of scalar values.
StormEvents
| where State in ("FLORIDA", "GEORGIA", "NEW YORK")
| count
Output
Count |
---|
4775 |
Dynamic array
The following query shows how to use in
with a dynamic array.
let states = dynamic(['FLORIDA', 'ATLANTIC SOUTH', 'GEORGIA']);
StormEvents
| where State in (states)
| count
Output
Count |
---|
3218 |
Tabular expression
The following query shows how to use in
with a tabular expression.
let Top_5_States =
StormEvents
| summarize count() by State
| top 5 by count_;
StormEvents
| where State in (Top_5_States)
| count
The same query can be written with an inline tabular expression statement.
StormEvents
| where State in (
StormEvents
| summarize count() by State
| top 5 by count_
)
| count
Output
Count |
---|
14242 |
Top with other example
The following example identifies the top five states with lightning events and uses the iff() function and in operator to classify lightning events by the top five states, labeled by state name, and all others labeled as "Other."
let Lightning_By_State = materialize(StormEvents
| summarize lightning_events = countif(EventType == 'Lightning') by State);
let Top_5_States = Lightning_By_State | top 5 by lightning_events | project State;
Lightning_By_State
| extend State = iff(State in (Top_5_States), State, "Other")
| summarize sum(lightning_events) by State
Output
State | sum_lightning_events |
---|---|
ALABAMA | 29 |
WISCONSIN | 31 |
TEXAS | 55 |
FLORIDA | 85 |
GEORGIA | 106 |
Other | 415 |
Use a static list returned by a function
The following example counts events from the StormEvents
table based on a predefined list of interesting states. The interesting states are defined by the InterestingStates()
function.
StormEvents
| where State in (InterestingStates())
| count
Output
Count |
---|
4775 |
The following query displays which states are considered interesting by the InterestingStates()
function.
.show function InterestingStates
Output
Name | Parameters | Body | Folder | DocString |
---|---|---|---|---|
InterestingStates | () | { dynamic(["WASHINGTON", "FLORIDA", "GEORGIA", "NEW YORK"]) } | | |
13.7 - String operators
13.7.1 - matches regex operator
Filters a record set based on a case-sensitive regular expression value.
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Syntax
T |
where
col matches
regex
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The regular expression used to filter. The maximum number of regex groups is 16. For more information about the regex syntax supported by Kusto, see regular expression. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State matches regex "K.*S"
| where event_count > 10
| project State, event_count
Output
State | event_count |
---|---|
KANSAS | 3166 |
ARKANSAS | 1028 |
LAKE SUPERIOR | 34 |
LAKE ST CLAIR | 32 |
13.7.2 - String operators
Kusto Query Language (KQL) offers various query operators for searching string data types. The following article describes how string terms are indexed, lists the string query operators, and gives tips for optimizing performance.
Understanding string terms
Kusto indexes all columns, including columns of type string
. Multiple indexes are built for such columns, depending on the actual data. These indexes aren’t directly exposed, but are used in queries with the string
operators that have has
as part of their name, such as has
, !has
, hasprefix
, !hasprefix
. The semantics of these operators are dictated by the way the column is encoded. Instead of doing a “plain” substring match, these operators match terms.
What is a term?
By default, each string
value is broken into maximal sequences of alphanumeric characters, and each of those sequences is made into a term.
For example, in the following string, the terms are Kusto, KustoExplorerQueryRun, and the following substrings: ad67d136, c1db, 4f9f, 88ef, d94f3b6b0b5a.
Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;KustoExplorerQueryRun
Kusto builds a term index consisting of all terms that are three characters or more, and this index is used by string operators such as has, !has, and so on. If the query looks for a term that is smaller than three characters, or uses a contains operator, then the query will revert to scanning the values in the column. Scanning is much slower than looking up the term in the term index.
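The following query is a minimal sketch of this distinction, using the sample string above; the column names are illustrative only:
print sample = "Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;KustoExplorerQueryRun"
| extend
    hasWholeTerm = sample has "c1db",         // true: "c1db" is a whole term
    hasPartialTerm = sample has "usto",       // false: "usto" isn't a whole term
    containsPartial = sample contains "usto"  // true: contains matches any substring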
Operators on strings
The following abbreviations are used in this article:
- RHS = right hand side of the expression
- LHS = left hand side of the expression
Operators with an _cs
suffix are case sensitive.
Operator | Description | Case-Sensitive | Example (yields true ) |
---|---|---|---|
== | Equals | Yes | "aBc" == "aBc" |
!= | Not equals | Yes | "abc" != "ABC" |
=~ | Equals | No | "abc" =~ "ABC" |
!~ | Not equals | No | "aBc" !~ "xyz" |
contains | RHS occurs as a subsequence of LHS | No | "FabriKam" contains "BRik" |
!contains | RHS doesn’t occur in LHS | No | "Fabrikam" !contains "xyz" |
contains_cs | RHS occurs as a subsequence of LHS | Yes | "FabriKam" contains_cs "Kam" |
!contains_cs | RHS doesn’t occur in LHS | Yes | "Fabrikam" !contains_cs "Kam" |
endswith | RHS is a closing subsequence of LHS | No | "Fabrikam" endswith "Kam" |
!endswith | RHS isn’t a closing subsequence of LHS | No | "Fabrikam" !endswith "brik" |
endswith_cs | RHS is a closing subsequence of LHS | Yes | "Fabrikam" endswith_cs "kam" |
!endswith_cs | RHS isn’t a closing subsequence of LHS | Yes | "Fabrikam" !endswith_cs "brik" |
has | Right-hand-side (RHS) is a whole term in left-hand-side (LHS) | No | "North America" has "america" |
!has | RHS isn’t a full term in LHS | No | "North America" !has "amer" |
has_all | Same as has but works on all of the elements | No | "North and South America" has_all("south", "north") |
has_any | Same as has but works on any of the elements | No | "North America" has_any("south", "north") |
has_cs | RHS is a whole term in LHS | Yes | "North America" has_cs "America" |
!has_cs | RHS isn’t a full term in LHS | Yes | "North America" !has_cs "amer" |
hasprefix | RHS is a term prefix in LHS | No | "North America" hasprefix "ame" |
!hasprefix | RHS isn’t a term prefix in LHS | No | "North America" !hasprefix "mer" |
hasprefix_cs | RHS is a term prefix in LHS | Yes | "North America" hasprefix_cs "Ame" |
!hasprefix_cs | RHS isn’t a term prefix in LHS | Yes | "North America" !hasprefix_cs "CA" |
hassuffix | RHS is a term suffix in LHS | No | "North America" hassuffix "ica" |
!hassuffix | RHS isn’t a term suffix in LHS | No | "North America" !hassuffix "americ" |
hassuffix_cs | RHS is a term suffix in LHS | Yes | "North America" hassuffix_cs "ica" |
!hassuffix_cs | RHS isn’t a term suffix in LHS | Yes | "North America" !hassuffix_cs "icA" |
in | Equals to any of the elements | Yes | "abc" in ("123", "345", "abc") |
!in | Not equals to any of the elements | Yes | "bca" !in ("123", "345", "abc") |
in~ | Equals to any of the elements | No | "Abc" in~ ("123", "345", "abc") |
!in~ | Not equals to any of the elements | No | "bCa" !in~ ("123", "345", "ABC") |
matches regex | LHS contains a match for RHS | Yes | "Fabrikam" matches regex "b.*k" |
startswith | RHS is an initial subsequence of LHS | No | "Fabrikam" startswith "fab" |
!startswith | RHS isn’t an initial subsequence of LHS | No | "Fabrikam" !startswith "kam" |
startswith_cs | RHS is an initial subsequence of LHS | Yes | "Fabrikam" startswith_cs "Fab" |
!startswith_cs | RHS isn’t an initial subsequence of LHS | Yes | "Fabrikam" !startswith_cs "fab" |
Performance tips
For better performance, when there are two operators that do the same task, use the case-sensitive one. For example:
- Use ==, not =~
- Use in, not in~
- Use hassuffix_cs, not hassuffix
For faster results, if you're testing for the presence of a symbol or alphanumeric word that is bound by non-alphanumeric characters, or the start or end of a field, use has or in. has works faster than contains, startswith, or endswith.
To search for IPv4 addresses or their prefixes, use one of special operators on IPv4 addresses, which are optimized for this purpose.
For more information, see Query best practices.
For example, the first of these queries will run faster:
StormEvents | where State has "North" | count;
StormEvents | where State contains "nor" | count
Operators on IPv4 addresses
The following group of operators provides index-accelerated search on IPv4 addresses or their prefixes.
Operator | Description | Example (yields true ) |
---|---|---|
has_ipv4 | LHS contains IPv4 address represented by RHS | has_ipv4("Source address is 10.1.2.3:1234", "10.1.2.3") |
has_ipv4_prefix | LHS contains an IPv4 address that matches a prefix represented by RHS | has_ipv4_prefix("Source address is 10.1.2.3:1234", "10.1.2.") |
has_any_ipv4 | LHS contains one of IPv4 addresses provided by RHS | has_any_ipv4("Source address is 10.1.2.3:1234", dynamic(["10.1.2.3", "127.0.0.1"])) |
has_any_ipv4_prefix | LHS contains an IPv4 address that matches one of prefixes provided by RHS | has_any_ipv4_prefix("Source address is 10.1.2.3:1234", dynamic(["10.1.2.", "127.0.0."])) |
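A minimal sketch of these operators in a scalar context, reusing the values from the table above:
print
    single = has_ipv4("Source address is 10.1.2.3:1234", "10.1.2.3"),
    prefix = has_ipv4_prefix("Source address is 10.1.2.3:1234", "10.1.2."),
    anyOf = has_any_ipv4("Source address is 10.1.2.3:1234", dynamic(["10.1.2.3", "127.0.0.1"]))
Each column evaluates to true, matching the table.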
13.7.3 - The case-insensitive !~ (not equals) string operator
Filters a record set for data that doesn’t match a case-insensitive string.
The following table provides a comparison of the ==
(equals) operators:
Operator | Description | Case-Sensitive | Example (yields true ) |
---|---|---|---|
== | Equals | Yes | "aBc" == "aBc" |
!= | Not equals | Yes | "abc" != "ABC" |
=~ | Equals | No | "abc" =~ "ABC" |
!~ | Not equals | No | "aBc" !~ "xyz" |
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Performance tips
When possible, use the case-sensitive !=.
Syntax
T |
where
column !~
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where (State !~ "texas") and (event_count > 3000)
| project State, event_count
Output
State | event_count |
---|---|
KANSAS | 3,166 |
13.7.4 - The case-insensitive !contains string operator
Filters a record set for data that doesn't include a case-insensitive string. !contains
searches for characters rather than terms of three or more characters. The query scans the values in the column, which is slower than looking up a term in a term index.
Performance tips
When possible, use the case-sensitive !contains_cs.
Use !has
if you’re looking for a term.
Syntax
Case insensitive syntax
T |
where
Column !contains
(
Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column by which to filter. |
Expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State !contains "kan"
| where event_count > 3000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4701 |
13.7.5 - The case-insensitive !endswith string operator
Filters a record set for data that excludes a case-insensitive ending string.
Performance tips
When possible, use the case-sensitive !endswith_cs.
Syntax
T |
where
col !endswith
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column to filter. |
expression | string | ✔️ | The expression used to filter. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize Events=count() by State
| where State !endswith "is"
| where Events > 2000
| project State, Events
Output
State | Events |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
IOWA | 2337 |
MISSOURI | 2016 |
13.7.6 - The case-insensitive !has string operator
Filters a record set for data that doesn’t have a matching case-insensitive string. !has
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
When possible, use the case-sensitive !has_cs.
Syntax
T |
where
column !has
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State !has "NEW"
| where event_count > 3000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4,701 |
KANSAS | 3,166 |
13.7.7 - The case-insensitive !hasprefix string operator
Filters a record set for data that doesn’t include a case-insensitive starting string.
For best performance, use strings of three characters or more. !hasprefix
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
When possible, use the case-sensitive !hasprefix_cs.
Syntax
T |
where
Column !hasprefix
(
Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column used to filter. |
Expression | string | ✔️ | The expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State !hasprefix "N"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
IOWA | 2337 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
13.7.8 - The case-insensitive !hassuffix string operator
Filters a record set for data that doesn’t have a case-insensitive ending string. !hassuffix
returns true if there's no term inside the filtered string column ending with the specified string expression.
Performance tips
When possible, use !hassuffix_cs - a case-sensitive version of the operator.
Syntax
T |
where
column !hassuffix
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State !hassuffix "A"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
13.7.9 - The case-insensitive !in~ string operator
Filters a record set for data without a case-insensitive string.
Performance tips
When possible, use the case-sensitive !in.
Syntax
T |
where
col !in~
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Example
List of scalars
The following query shows how to use !in~
with a comma-separated list of scalar values.
StormEvents
| where State !in~ ("Florida", "Georgia", "New York")
| count
Output
Count |
---|
54,291 |
Dynamic array
The following query shows how to use !in~
with a dynamic array.
StormEvents
| where State !in~ (dynamic(["Florida", "Georgia", "New York"]))
| count
Output
Count |
---|
54291 |
The same query can also be written with a let statement.
let states = dynamic(["Florida", "Georgia", "New York"]);
StormEvents
| where State !in~ (states)
| count
Output
Count |
---|
54291 |
Tabular expression
The following query shows how to use !in~
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State !in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State
Output
State | count_ |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State !in~ (large_states)
| summarize count() by State
Output
State | count_ |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
13.7.10 - The case-insensitive !startswith string operator
Filters a record set for data that doesn’t start with a case-insensitive search string.
Performance tips
When possible, use the case-sensitive !startswith_cs.
Syntax
T |
where
column !startswith
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State !startswith "i"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
MISSOURI | 2016 |
13.7.11 - The case-insensitive =~ (equals) string operator
Filters a record set for data with a case-insensitive string.
The following table provides a comparison of the ==
(equals) operators:
Operator | Description | Case-Sensitive | Example (yields true ) |
---|---|---|---|
== | Equals | Yes | "aBc" == "aBc" |
!= | Not equals | Yes | "abc" != "ABC" |
=~ | Equals | No | "abc" =~ "ABC" |
!~ | Not equals | No | "aBc" !~ "xyz" |
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Performance tips
When possible, use == - a case-sensitive version of the operator.
Syntax
T |
where
col =~
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column to filter. |
expression | string | ✔️ | The expression used to filter. |
Returns
Rows in T for which the predicate is true
.
Example
The State values in the StormEvents table are capitalized. The following query matches rows with the value "KANSAS".
StormEvents
| where State =~ "kansas"
| project EventId, State
The following table only shows the first 10 results. To see the full output, run the query.
EventId | State |
---|---|
70787 | KANSAS |
43450 | KANSAS |
43451 | KANSAS |
38844 | KANSAS |
18463 | KANSAS |
18464 | KANSAS |
18495 | KANSAS |
43466 | KANSAS |
43467 | KANSAS |
43470 | KANSAS |
13.7.12 - The case-insensitive contains string operator
Filters a record set for data containing a case-insensitive string. contains
searches for arbitrary sub-strings rather than terms.
Performance tips
When possible, use contains_cs - a case-sensitive version of the operator.
If you’re looking for a term, use has
for faster results.
Syntax
T |
where
col contains
(
string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The name of the column to check for string. |
string | string | ✔️ | The string by which to filter the data. |
Returns
Rows in T for which string is in col.
Example
StormEvents
| summarize event_count=count() by State
| where State contains "enn"
| where event_count > 10
| project State, event_count
| render table
Output
State | event_count |
---|---|
PENNSYLVANIA | 1687 |
TENNESSEE | 1125 |
13.7.13 - The case-insensitive endswith string operator
Filters a record set for data with a case-insensitive ending string.
Performance tips
For faster results, use the case-sensitive version of an operator. For example, use endswith_cs
instead of endswith
.
Syntax
T |
where
col endswith
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column to filter. |
expression | string | ✔️ | The expression used to filter. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize Events=count() by State
| where State endswith "sas"
| where Events > 10
| project State, Events
Output
State | Events |
---|---|
KANSAS | 3166 |
ARKANSAS | 1028 |
13.7.14 - The case-insensitive has string operator
Filters a record set for data with a case-insensitive string. has
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
When possible, use the case-sensitive has_cs.
Syntax
T |
where
Column has
(
Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column used to filter the records. |
Expression | scalar or tabular | ✔️ | An expression for which to search. If the value is a tabular expression and has multiple columns, the first column is used. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State has "New"
| where event_count > 10
| project State, event_count
Output
State | event_count |
---|---|
NEW YORK | 1,750 |
NEW JERSEY | 1,044 |
NEW MEXICO | 527 |
NEW HAMPSHIRE | 394 |
13.7.15 - The case-insensitive has_all string operator
Filters a record set for data with one or more case-insensitive search strings. has_all
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Syntax
T |
where
col has_all
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 256 distinct values. |
Returns
Rows in T for which the predicate is true
.
Examples
Set of scalars
The following query shows how to use has_all
with a comma-separated set of scalar values.
StormEvents
| where EpisodeNarrative has_all ("cold", "strong", "afternoon", "hail")
| summarize Count=count() by EventType
| top 3 by Count
Output
EventType | Count |
---|---|
Thunderstorm Wind | 517 |
Hail | 392 |
Flash Flood | 24 |
Dynamic array
The same result can be achieved using a dynamic array notation.
StormEvents
| where EpisodeNarrative has_all (dynamic(["cold", "strong", "afternoon", "hail"]))
| summarize Count=count() by EventType
| top 3 by Count
Output
EventType | Count |
---|---|
Thunderstorm Wind | 517 |
Hail | 392 |
Flash Flood | 24 |
The same query can also be written with a let statement.
let criteria = dynamic(["cold", "strong", "afternoon", "hail"]);
StormEvents
| where EpisodeNarrative has_all (criteria)
| summarize Count=count() by EventType
| top 3 by Count
Output
EventType | Count |
---|---|
Thunderstorm Wind | 517 |
Hail | 392 |
Flash Flood | 24 |
13.7.16 - The case-insensitive has_any string operator
Filters a record set for data with any set of case-insensitive strings. has_any
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Syntax
T |
where
col has_any
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 10,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Examples
List of scalars
The following query shows how to use has_any
with a comma-separated list of scalar values.
StormEvents
| where State has_any ("CAROLINA", "DAKOTA", "NEW")
| summarize count() by State
Output
State | count_ |
---|---|
NEW YORK | 1750 |
NORTH CAROLINA | 1721 |
SOUTH DAKOTA | 1567 |
NEW JERSEY | 1044 |
SOUTH CAROLINA | 915 |
NORTH DAKOTA | 905 |
NEW MEXICO | 527 |
NEW HAMPSHIRE | 394 |
Dynamic array
The following query shows how to use has_any
with a dynamic array.
StormEvents
| where State has_any (dynamic(['south', 'north']))
| summarize count() by State
Output
State | count_ |
---|---|
NORTH CAROLINA | 1721 |
SOUTH DAKOTA | 1567 |
SOUTH CAROLINA | 915 |
NORTH DAKOTA | 905 |
ATLANTIC SOUTH | 193 |
ATLANTIC NORTH | 188 |
The same query can also be written with a let statement.
let areas = dynamic(['south', 'north']);
StormEvents
| where State has_any (areas)
| summarize count() by State
Output
State | count_ |
---|---|
NORTH CAROLINA | 1721 |
SOUTH DAKOTA | 1567 |
SOUTH CAROLINA | 915 |
NORTH DAKOTA | 905 |
ATLANTIC SOUTH | 193 |
ATLANTIC NORTH | 188 |
Tabular expression
The following query shows how to use has_any
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State has_any ((PopulationData | where Population > 5000000 | project State))
| summarize count() by State
Output
State | count_ |
---|---|
TEXAS | 4701 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State has_any (large_states)
| summarize count() by State
Output
State | count_ |
---|---|
TEXAS | 4701 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
… | … |
13.7.17 - The case-insensitive hasprefix string operator
Filters a record set for data with a case-insensitive starting string.
For best performance, use strings of three characters or more. hasprefix
searches for indexed terms, where a term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
When possible, use the case-sensitive hasprefix_cs.
Syntax
T |
where
Column hasprefix
(
Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column used to filter. |
Expression | string | ✔️ | The expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State hasprefix "la"
| project State, event_count
Output
State | event_count |
---|---|
LAKE MICHIGAN | 182 |
LAKE HURON | 63 |
LAKE SUPERIOR | 34 |
LAKE ST CLAIR | 32 |
LAKE ERIE | 27 |
LAKE ONTARIO | 8 |
13.7.18 - The case-insensitive hassuffix string operator
Filters a record set for data with a case-insensitive ending string. hassuffix
returns true
if there is a term inside the filtered string column ending with the specified string expression.
Performance tips
When possible, use the case-sensitive hassuffix_cs.
Syntax
T |
where
Column hassuffix
(
Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column by which to filter. |
Expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State hassuffix "o"
| project State, event_count
Output
State | event_count |
---|---|
COLORADO | 1654 |
OHIO | 1233 |
GULF OF MEXICO | 577 |
NEW MEXICO | 527 |
IDAHO | 247 |
PUERTO RICO | 192 |
LAKE ONTARIO | 8 |
13.7.19 - The case-insensitive in~ string operator
Filters a record set for data with a case-insensitive string.
Performance tips
When possible, use the case-sensitive in.
Syntax
T |
where
col in~
(
expression,
… )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true
.
Examples
List of scalars
The following query shows how to use in~
with a comma-separated list of scalar values.
StormEvents
| where State in~ ("FLORIDA", "georgia", "NEW YORK")
| count
Output
Count |
---|
4775 |
Dynamic array
The following query shows how to use in~
with a dynamic array.
StormEvents
| where State in~ (dynamic(["FLORIDA", "georgia", "NEW YORK"]))
| count
Output
Count |
---|
4775 |
The same query can also be written with a let statement.
let states = dynamic(["FLORIDA", "georgia", "NEW YORK"]);
StormEvents
| where State in~ (states)
| count
Output
Count |
---|
4775 |
Tabular expression
The following query shows how to use in~
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State
Output
State | count_ |
---|---|
TEXAS | 4701 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State in~ (large_states)
| summarize count() by State
Output
State | count_ |
---|---|
TEXAS | 4701 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
… | … |
13.7.20 - The case-insensitive startswith string operator
Filters a record set for data with a case-insensitive string starting sequence.
Performance tips
When possible, use the case-sensitive startswith_cs.
Syntax
T |
where
col startswith
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column used to filter. |
expression | string | ✔️ | The expression by which to filter. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where State startswith "Lo"
| where event_count > 10
| project State, event_count
Output
State | event_count |
---|---|
LOUISIANA | 463 |
13.7.21 - The case-sensitive != (not equals) string operator
Filters a record set for data that doesn’t match a case-sensitive string.
The following table provides a comparison of the ==
(equals) operators:
Operator | Description | Case-Sensitive | Example (yields true ) |
---|---|---|---|
== | Equals | Yes | "aBc" == "aBc" |
!= | Not equals | Yes | "abc" != "ABC" |
=~ | Equals | No | "abc" =~ "ABC" |
!~ | Not equals | No | "aBc" !~ "xyz" |
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Syntax
T |
where
column !=
(
expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true
.
Example
StormEvents
| summarize event_count=count() by State
| where (State != "FLORIDA") and (event_count > 4000)
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4,701 |
13.7.22 - The case-sensitive !contains_cs string operator
Filters a record set for data that doesn’t include a case-sensitive string. !contains_cs
searches for characters rather than terms of three or more characters. The query scans the values in the column, which is slower than looking up a term in a term index.
Performance tips
If you’re looking for a term, use !has_cs
for faster results.
Syntax
Case-sensitive syntax
T | where Column !contains_cs (Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column by which to filter. |
Expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true.
Examples
StormEvents
| summarize event_count=count() by State
| where State !contains_cs "AS"
| count
Output
Count |
---|
59 |
StormEvents
| summarize event_count=count() by State
| where State !contains_cs "TEX"
| where event_count > 3000
| project State, event_count
Output
State | event_count |
---|---|
KANSAS | 3,166 |
13.7.23 - The case-sensitive !endswith_cs string operator
Filters a record set for data that doesn’t end with a case-sensitive string.
Performance tips
Syntax
T | where col !endswith_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column to filter. |
expression | string | ✔️ | The expression used to filter. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize Events=count() by State
| where State !endswith_cs "A"
The following table only shows the first 10 results. To see the full output, run the query.
State | Events |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
WISCONSIN | 1850 |
NEW YORK | 1750 |
COLORADO | 1654 |
MICHIGAN | 1637 |
KENTUCKY | 1391 |
OHIO | 1233 |
13.7.24 - The case-sensitive !has_cs string operator
Filters a record set for data that doesn’t have a matching case-sensitive string. !has_cs
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
Syntax
T | where column !has_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize event_count=count() by State
| where State !has_cs "new"
| count
Output
Count |
---|
67 |
13.7.25 - The case-sensitive !hasprefix_cs string operator
Filters a record set for data that doesn’t have a case-sensitive starting string. !hasprefix_cs
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Operator | Description | Case-Sensitive | Example (yields true ) |
---|---|---|---|
hasprefix | RHS is a term prefix in LHS | No | "North America" hasprefix "ame" |
!hasprefix | RHS isn’t a term prefix in LHS | No | "North America" !hasprefix "mer" |
hasprefix_cs | RHS is a term prefix in LHS | Yes | "North America" hasprefix_cs "Ame" |
!hasprefix_cs | RHS isn’t a term prefix in LHS | Yes | "North America" !hasprefix_cs "CA" |
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Performance tips
Syntax
T | where column !hasprefix_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize event_count=count() by State
| where State !hasprefix_cs "P"
| count
Output
Count |
---|
64 |
13.7.26 - The case-sensitive !hassuffix_cs string operator
Filters a record set for data that doesn’t have a case-sensitive ending string. !hassuffix_cs
returns true
if there is no term inside string column ending with the specified string expression.
Performance tips
Syntax
T | where column !hassuffix_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize event_count=count() by State
| where State !hassuffix_cs "AS"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
IOWA | 2337 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
13.7.27 - The case-sensitive !in string operator
Filters a record set for data without a case-sensitive string.
Performance tips
Syntax
T | where col !in (expression, …)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true.
Examples
List of scalars
The following query shows how to use !in
with a comma-separated list of scalar values.
StormEvents
| where State !in ("FLORIDA", "GEORGIA", "NEW YORK")
| count
Output
Count |
---|
54291 |
Dynamic array
The following query shows how to use !in
with a dynamic array.
StormEvents
| where State !in (dynamic(["FLORIDA", "GEORGIA", "NEW YORK"]))
| count
Output
Count |
---|
54291 |
The same query can also be written with a let statement.
let states = dynamic(["FLORIDA", "GEORGIA", "NEW YORK"]);
StormEvents
| where State !in (states)
| count
Output
Count |
---|
54291 |
Tabular expression
The following query shows how to use !in
with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.
StormEvents
| where State !in (PopulationData | where Population > 5000000 | project State)
| summarize count() by State
Output
State | Count |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.
let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents
| where State !in (large_states)
| summarize count() by State
Output
State | Count |
---|---|
KANSAS | 3166 |
IOWA | 2337 |
NEBRASKA | 1766 |
OKLAHOMA | 1716 |
SOUTH DAKOTA | 1567 |
… | … |
13.7.28 - The case-sensitive !startswith_cs string operator
Filters a record set for data that doesn’t start with a case-sensitive search string.
Performance tips
Syntax
T | where column !startswith_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize event_count=count() by State
| where State !startswith_cs "I"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
MISSOURI | 2016 |
13.7.29 - The case-sensitive == (equals) string operator
Filters a record set for data matching a case-sensitive string.
The following table provides a comparison of the ==
operators:
Operator | Description | Case-Sensitive | Example (yields true ) |
---|---|---|---|
== | Equals | Yes | "aBc" == "aBc" |
!= | Not equals | Yes | "abc" != "ABC" |
=~ | Equals | No | "abc" =~ "ABC" |
!~ | Not equals | No | "aBc" !~ "xyz" |
For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.
Performance tips
Syntax
T | where col == (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column to filter. |
expression | string | ✔️ | The expression used to filter. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| where State == "kansas"
| count
Count |
---|
0 |
StormEvents
| where State == "KANSAS"
| count
Count |
---|
3,166 |
13.7.30 - The case-sensitive contains_cs string operator
Filters a record set for data containing a case-sensitive string. contains_cs
searches for arbitrary sub-strings rather than terms.
Performance tips
If you’re looking for a term, use has_cs
for faster results.
Syntax
T | where col contains_cs (string)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The name of the column to check for string. |
string | string | ✔️ | The case-sensitive string by which to filter the data. |
Returns
Rows in T for which string is in col.
Example
StormEvents
| summarize event_count=count() by State
| where State contains_cs "AS"
| count
Output
Count |
---|
8 |
13.7.31 - The case-sensitive endswith_cs string operator
Filters a record set for data with a case-sensitive ending string.
Performance tips
Syntax
T | where col endswith_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
col | string | ✔️ | The column to filter. |
expression | string | ✔️ | The expression used to filter. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize Events = count() by State
| where State endswith_cs "NA"
Output
State | Events |
---|---|
NORTH CAROLINA | 1721 |
MONTANA | 1230 |
INDIANA | 1164 |
SOUTH CAROLINA | 915 |
LOUISIANA | 463 |
ARIZONA | 340 |
13.7.32 - The case-sensitive has_cs string operator
Filters a record set for data with a case-sensitive search string. has_cs
searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
Syntax
T | where Column has_cs (Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column used to filter the records. |
Expression | scalar or tabular | ✔️ | An expression for which to search. If the value is a tabular expression and has multiple columns, the first column is used. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize event_count=count() by State
| where State has_cs "FLORIDA"
Output
State | event_count |
---|---|
FLORIDA | 1042 |
Since all State
values are capitalized, searching for a lowercase string with the same value, such as “florida”, won’t yield any results.
StormEvents
| summarize event_count=count() by State
| where State has_cs "florida"
Output
State | event_count |
---|---|
13.7.33 - The case-sensitive hasprefix_cs string operator
Filters a record set for data with a case-sensitive starting string.
For best performance, use strings of three characters or more. hasprefix_cs
searches for indexed terms, where a term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.
Performance tips
Syntax
T | where Column hasprefix_cs (Expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
Column | string | ✔️ | The column used to filter. |
Expression | string | ✔️ | The expression for which to search. |
Returns
Rows in T for which the predicate is true.
Examples
StormEvents
| summarize event_count=count() by State
| where State hasprefix_cs "P"
| count
Count |
---|
3 |
StormEvents
| summarize event_count=count() by State
| where State hasprefix_cs "P"
| project State, event_count
State | event_count |
---|---|
PENNSYLVANIA | 1687 |
PUERTO RICO | 192 |
E PACIFIC | 10 |
13.7.34 - The case-sensitive hassuffix_cs string operator
Filters a record set for data with a case-sensitive ending string. hassuffix_cs returns true if there is a term inside the filtered string column ending with the specified string expression.
Performance tips
Syntax
T | where column hassuffix_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be filtered. |
column | string | ✔️ | The column by which to filter. |
expression | scalar | ✔️ | The scalar or literal expression for which to search. |
Returns
Rows in T for which the predicate is true.
Examples
StormEvents
| summarize event_count=count() by State
| where State hassuffix_cs "AS"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
13.7.35 - The case-sensitive in string operator
Filters a record set for data with a case-sensitive string.
Performance tips
Syntax
T | where col in (expression, …)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column by which to filter. |
expression | scalar or tabular | ✔️ | An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search considers up to 1,000,000 distinct values. |
Returns
Rows in T for which the predicate is true.
Examples
List of scalars
The following query shows how to use in
with a list of scalar values.
StormEvents
| where State in ("FLORIDA", "GEORGIA", "NEW YORK")
| count
Output
Count |
---|
4775 |
Dynamic array
The following query shows how to use in
with a dynamic array.
let states = dynamic(['FLORIDA', 'ATLANTIC SOUTH', 'GEORGIA']);
StormEvents
| where State in (states)
| count
Output
Count |
---|
3218 |
Tabular expression
The following query shows how to use in
with a tabular expression.
let Top_5_States =
StormEvents
| summarize count() by State
| top 5 by count_;
StormEvents
| where State in (Top_5_States)
| count
The same query can be written with an inline tabular expression statement.
StormEvents
| where State in (
StormEvents
| summarize count() by State
| top 5 by count_
)
| count
Output
Count |
---|
14242 |
Top with other example
The following example identifies the top five states with lightning events and uses the iff()
function and in
operator to classify lightning events by the top five states, labeled by state name, and all others labeled as “Other.”
let Lightning_By_State = materialize(StormEvents
| summarize lightning_events = countif(EventType == 'Lightning') by State);
let Top_5_States = Lightning_By_State | top 5 by lightning_events | project State;
Lightning_By_State
| extend State = iff(State in (Top_5_States), State, "Other")
| summarize sum(lightning_events) by State
Output
State | sum_lightning_events |
---|---|
ALABAMA | 29 |
WISCONSIN | 31 |
TEXAS | 55 |
FLORIDA | 85 |
GEORGIA | 106 |
Other | 415 |
Use a static list returned by a function
The following example counts events from the StormEvents
table based on a predefined list of interesting states. The interesting states are defined by the InterestingStates()
function.
StormEvents
| where State in (InterestingStates())
| count
Output
Count |
---|
4775 |
The following query displays which states are considered interesting by the InterestingStates()
function.
.show function InterestingStates
Output
Name | Parameters | Body | Folder | DocString |
---|---|---|---|---|
InterestingStates | () | { dynamic(["WASHINGTON", "FLORIDA", "GEORGIA", "NEW YORK"]) } | | |
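For reference, a function with that body could be defined along the following lines. This is a sketch of how such a helper might be created, not a command that is required to run the example above.
.create function InterestingStates()
{
    // The set of states considered interesting by the examples above.
    dynamic(["WASHINGTON", "FLORIDA", "GEORGIA", "NEW YORK"])
}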
13.7.36 - The case-sensitive startswith_cs string operator
Filters a record set for data with a case-sensitive string starting sequence.
Performance tips
Syntax
T | where col startswith_cs (expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to filter. |
col | string | ✔️ | The column used to filter. |
expression | string | ✔️ | The expression by which to filter. |
Returns
Rows in T for which the predicate is true.
Example
StormEvents
| summarize event_count=count() by State
| where State startswith_cs "I"
| where event_count > 2000
| project State, event_count
Output
State | event_count |
---|---|
IOWA | 2337 |
ILLINOIS | 2022 |
14 - Special functions
14.1 - cluster()
Changes the reference of the query to a remote cluster. To access a database within the same cluster, use the database() function. For more information, see cross-database and cross-cluster queries.
Changes the reference of the query to a remote Eventhouse. To access a database within the same Eventhouse, use the database() function. For more information, see cross-database and cross-cluster queries.
Syntax
cluster(name)
Parameters
Name | Type | Required | Description |
---|---|---|---|
name | string | ✔️ | The name of the cluster to reference. The value can be specified as a fully qualified domain name, or the name of the cluster without the .kusto.windows.net suffix. The cluster name is treated as case-insensitive, and providing it in lowercase is recommended. The value can’t be the result of subquery evaluation. |
Name | Type | Required | Description |
---|---|---|---|
name | string | ✔️ | The full URL of the Eventhouse to reference. The value can be specified as a fully qualified domain name, or the name of the Eventhouse. The Eventhouse name is treated as case-insensitive, and providing it in lowercase is recommended. The value can’t be the result of subquery evaluation. |
Examples
Use cluster() to access remote cluster
The following query can be run on any cluster.
cluster('help').database('Samples').StormEvents | count
cluster('help.kusto.windows.net').database('Samples').StormEvents | count
Use cluster() to access remote Eventhouse
The following query can be run on any Eventhouse.
cluster('help').database('Samples').StormEvents | count
cluster('help.kusto.windows.net').database('Samples').StormEvents | count
Output
Count |
---|
59066 |
Use cluster() inside let statements
The previous query can be rewritten to use a query-defined function (let
statement) that takes a parameter called clusterName
and passes it to the cluster()
function.
let foo = (clusterName:string)
{
cluster(clusterName).database('Samples').StormEvents | count
};
foo('help')
Output
Count |
---|
59066 |
Use cluster() inside Functions
The same query as above can be rewritten to be used in a function that receives a parameter clusterName
- which is passed into the cluster() function.
.create function foo(clusterName:string)
{
cluster(clusterName).database('Samples').StormEvents | count
};
14.2 - Cross-cluster and cross-database queries
Queries run with a particular database designated as the database in context. This database acts as the default for permission checking. If an entity is referenced in a query without specifying the cluster or database, it’s resolved against this database.
This article explains how to execute queries that involve entities located outside the current context database.
Prerequisites
- If the clusters are in different tenants, follow the instructions in Allow cross-tenant queries and commands.
Identify the cluster and database in context
Identify the eventhouse and database in context
The following table explains how to identify the database in context by query environment.
Environment | Database in context |
---|---|
Kusto Explorer | The default database is the one selected in the connections panel, and the current cluster is the cluster containing that database. |
Azure Data Explorer web UI | The default database is the one selected in the connection pane, and the current cluster is the cluster containing that database. |
Client libraries | Specify the default database and cluster by the Data Source and Initial Catalog properties of the Kusto connection strings. |
Environment | Database/Eventhouse in context |
---|---|
Kusto Explorer | The default database is the one selected in the connections panel and the current eventhouse is the eventhouse containing that database. |
Real-Time Intelligence KQL queryset | The default database is the current database selected either directly or through an eventhouse. |
Client libraries | Specify the default database with the database URI, used for the Data Source properties of the Kusto connection strings. For the eventhouse, use its cluster URI. You can find it by selecting System Overview in the Eventhouse details section for the selected eventhouse. |
Perform cross-cluster or cross-database queries
Perform cross-eventhouse or cross-database queries
To access entities outside the database in context, use the cluster() and database() functions to qualify the entity name.
For a table in a different database within the same cluster:
database("<DatabaseName>").<TableName>
For a table in a remote cluster:
cluster("<ClusterName>").database("<DatabaseName>").<TableName>
For a table in a different database within the same eventhouse:
database("<DatabaseName>").<TableName>
For a table in a remote eventhouse or remote service (like Azure Data Explorer) cluster:
cluster("<EventhouseClusterURI>").database("<DatabaseName>").<TableName>
Qualified names and the union operator
When a qualified name appears as an operand of the union operator, then wildcards can be used to specify multiple tables and multiple databases. Wildcards aren’t permitted in cluster names.
union withsource=TableName *, database("OtherDb*").*Table, cluster("OtherCluster").database("*").*
When a qualified name appears as an operand of the union operator, then wildcards can be used to specify multiple tables and multiple databases. Wildcards aren’t permitted in eventhouse names.
union withsource=TableName *, database("OtherDb*").*Table, cluster("OtherEventhouseClusterURI").database("*").*
Qualified names and restrict access statements
Qualified names or patterns can also be included in a restrict access statement. Wildcards aren’t permitted in cluster or eventhouse names.
The following query restricts query access to the following entities:
- Any entity name starting with my… in the default database.
- Any table in all the databases named MyOther… of the current cluster.
- Any table in all the databases named my2… in the cluster OtherCluster.kusto.windows.net.
restrict access to (my*, database("MyOther*").*, cluster("OtherCluster").database("my2*").*);
- Any entity name starting with event… in the default database.
- Any table in all the databases named EventOther… of the current eventhouse.
- Any table in all the databases named event2… in the eventhouse OtherEventhouse.kusto.data.microsoft.com.
restrict access to (event*, database("EventOther*").*, cluster("OtherEventhouseClusterURI").database("event2*").*);
Handle schema changes of remote entities
To process a cross-cluster query, the cluster that performs the initial query interpretation needs to have the schema of the entities referenced on remote clusters. To obtain this information, a command is sent to retrieve the schemas, which are then stored in a cache.
If there’s a schema change in the remote cluster, a cached schema might become outdated. This can lead to undesired effects, including scenarios where new or deleted columns cause a Partial query failure
. To solve such issues, manually refresh the schema with the .clear cache remote-schema command.
To process a cross-eventhouse or eventhouse-to-ADX cluster query, the eventhouse that performs the initial query interpretation needs to have the schema of the entities referenced on remote eventhouses or clusters. To obtain this information, a command is sent to retrieve the schemas, which are then stored in a cache.
If there’s a remote schema change, a cached schema might become outdated. This can lead to undesired effects, including scenarios where new or deleted columns cause a Partial query failure
. To solve such issues, manually refresh the schema with the .clear cache remote-schema command.
Functions and views
Functions and views (persistent and created inline) can reference tables across database and cluster boundaries. The following code is valid.
let MyView = Table1 join database("OtherDb").Table2 on Key | join cluster("OtherCluster").database("SomeDb").Table3 on Key;
MyView | where ...
Persistent functions and views can be accessed from another database in the same cluster.
For example, say you create the following tabular function (view) in a database OtherDb
:
.create function MyView(v:string) { Table1 | where Column1 has v ... }
Then, you create the following scalar function in a database OtherDb
:
.create function MyCalc(a:double, b:double, c:double) { (a + b) / c }
In the default database, these entities can be referenced as follows:
database("OtherDb").MyView("exception") | extend CalCol=database("OtherDb").MyCalc(Col1, Col2, Col3) | take 10
Functions and views (persistent and created inline) can reference tables across database and eventhouse boundaries. The following code is valid.
let EventView = Table1 join database("OtherDb").Table2 on Key | join cluster("OtherEventhouseClusterURI").database("SomeDb").Table3 on Key;
EventView | where ...
Persistent functions and views can be accessed from another database in the same eventhouse.
For example, say you create the following tabular function (view) in a database OtherDb
:
.create function EventView(v:string) { Table1 | where Column1 has v ... }
Then, you create the following scalar function in a database OtherDb
:
.create function EventCalc(a:double, b:double, c:double) { (a + b) / c }
In the default database, these entities can be referenced as follows:
database("OtherDb").EventView("exception") | extend CalCol=database("OtherDb").EventCalc(Col1, Col2, Col3) | take 10
Limitations of cross-cluster function calls
Tabular functions or views can be referenced across clusters. The following limitations apply:
- Remote functions must return tabular schema. Scalar functions can only be accessed in the same cluster.
- Remote functions can accept only scalar arguments. Functions that get one or more table arguments can only be accessed in the same cluster.
- Remote functions’ result schema must be fixed (known in advance without executing parts of the query). So query constructs such as the pivot plugin can’t be used. Some plugins, such as the bag_unpack plugin, support a way to indicate the result schema statically, and in this form they can be used in cross-cluster function calls.
- For performance reasons, the calling cluster caches the schema of remote entities after the initial call. Therefore, changes made to the remote entity might result in a mismatch with the cached schema information, potentially leading to query failures. For more information, see Cross-cluster queries and schema changes.
Limitations of cross-eventhouse function calls
Tabular functions or views can be referenced across eventhouses. The following limitations apply:
- Remote functions must return tabular schema. Scalar functions can only be accessed in the same eventhouse.
- Remote functions can accept only scalar arguments. Functions that get one or more table arguments can only be accessed in the same eventhouse.
- Remote functions’ result schema must be fixed (known in advance without executing parts of the query). So query constructs such as the pivot plugin can’t be used. Some plugins, such as the bag_unpack plugin, support a way to indicate the result schema statically, and in this form they can be used in cross-eventhouse function calls.
- For performance reasons, the calling eventhouse caches the schema of remote entities after the initial call. Therefore, changes made to the remote entity might result in a mismatch with the cached schema information, potentially leading to query failures. For more information, see Cross-cluster queries and schema changes.
Examples
The following cross-cluster call is valid.
cluster("OtherCluster").database("SomeDb").MyView("exception") | count
The following query calls a remote scalar function MyCalc.
This call violates rule #1, so it’s not valid.
MyTable | extend CalCol=cluster("OtherCluster").database("OtherDb").MyCalc(Col1, Col2, Col3) | take 10
The following query calls remote function MyCalc
and provides a tabular parameter.
This call violates rule #2, so it’s not valid.
cluster("OtherCluster").database("OtherDb").MyCalc(datatable(x:string, y:string)["x","y"] )
The following cross-eventhouse call is valid.
cluster("OtherEventhouseURI").database("SomeDb").EventView("exception") | count
The following query calls a remote scalar function EventCalc.
This call violates rule #1, so it’s not valid.
Eventtable | extend CalCol=cluster("OtherEventhouseClusterURI").database("OtherDb").EventCalc(Col1, Col2, Col3) | take 10
The following query calls remote function EventCalc
and provides a tabular parameter.
This call violates rule #2, so it’s not valid.
cluster("EventhouseClusterURI").database("OtherDb").MyCalc(datatable(x:string, y:string)["x","y"] )
The following query calls remote function SomeTable that has a variable schema output based on the parameter tablename.
This call violates rule #3, so it’s not valid.
Tabular function in OtherDb.
.create function SomeTable(tablename:string) { table(tablename) }
In the default database.
cluster("OtherCluster").database("OtherDb").SomeTable("MyTable")
cluster("OtherEventhouseClusterURI").database("OtherDb").SomeTable("EventTable")
The following query calls remote function GetDataPivot
that has a variable schema output based on the data (pivot() plugin has dynamic output).
This call violates rule #3, so it’s not valid.
Tabular function in OtherDb.
.create function GetDataPivot() { T | evaluate pivot(PivotColumn) }
Tabular function in the default database.
cluster("OtherCluster").database("OtherDb").GetDataPivot()
cluster("OtherEventhouseClusterURI").database("OtherDb").GetDataPivot()
14.3 - database()
Changes the reference of the query to a specific database within the cluster scope.
Changes the reference of the query to a specific database within the Eventhouse scope.
Syntax
database(databaseName)
Parameters
Name | Type | Required | Description |
---|---|---|---|
databaseName | string | The name of the database to reference. The databaseName can be either the DatabaseName or PrettyName . The argument must be a constant value and can’t come from a subquery evaluation. |
Examples
Use database() to access table of other database
database('Samples').StormEvents | count
Output
Count |
---|
59066 |
Use database() inside let statements
The query above can be rewritten as a query-defined function (let statement) that
receives a parameter dbName
- which is passed into the database() function.
let foo = (dbName:string)
{
database(dbName).StormEvents | count
};
foo('Samples')
Output
Count |
---|
59066 |
Use database() inside stored functions
The same query as above can be rewritten to be used in a function that
receives a parameter dbName
- which is passed into the database() function.
.create function foo(dbName:string)
{
database(dbName).StormEvents | count
};
14.4 - external_table()
References an external table by name.
To accelerate queries over external delta tables, see Query acceleration policy.
Syntax
external_table(TableName [, MappingName ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | The name of the external table being queried. Must reference an external table of kind blob , adl , or sql . |
MappingName | string | A name of a mapping object that maps fields in the external data shards to columns output. |
Authentication and authorization
The authentication method to access an external table is based on the connection string provided during its creation, and the permissions required to access the table vary depending on the authentication method. For more information, see Azure Storage external table or SQL Server external table.
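Example
The following sketch counts the rows of a hypothetical external table. ArchivedStormEvents and EventsMapping are placeholder names, not entities that exist in the sample databases.
// Placeholder names: count rows in an external table, using an optional mapping.
external_table("ArchivedStormEvents", "EventsMapping")
| count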
14.5 - materialize()
Captures the value of a tabular expression for the duration of the query execution so that it can be referenced multiple times by the query without recalculation.
Syntax
materialize(expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expression | string | ✔️ | The tabular expression to be evaluated and cached during query execution. |
Remarks
The materialize()
function is useful in the following scenarios:
- To speed up queries that perform heavy calculations whose results are used multiple times in the query.
- To evaluate a tabular expression only once and use it many times in a query. This is commonly required if the tabular expression is non-deterministic. For example, if the expression uses the rand() or the dcount() functions.
Examples of query performance improvement
The following example shows how materialize()
can be used to improve performance of the query.
The expression _detailed_data
is defined using materialize()
function and therefore is calculated only once.
let _detailed_data = materialize(StormEvents | summarize Events=count() by State, EventType);
_detailed_data
| summarize TotalStateEvents=sum(Events) by State
| join (_detailed_data) on State
| extend EventPercentage = Events*100.0 / TotalStateEvents
| project State, EventType, EventPercentage, Events
| top 10 by EventPercentage
Output
State | EventType | EventPercentage | Events |
---|---|---|---|
HAWAII WATERS | Waterspout | 100 | 2 |
LAKE ONTARIO | Marine Thunderstorm Wind | 100 | 8 |
GULF OF ALASKA | Waterspout | 100 | 4 |
ATLANTIC NORTH | Marine Thunderstorm Wind | 95.2127659574468 | 179 |
LAKE ERIE | Marine Thunderstorm Wind | 92.5925925925926 | 25 |
E PACIFIC | Waterspout | 90 | 9 |
LAKE MICHIGAN | Marine Thunderstorm Wind | 85.1648351648352 | 155 |
LAKE HURON | Marine Thunderstorm Wind | 79.3650793650794 | 50 |
GULF OF MEXICO | Marine Thunderstorm Wind | 71.7504332755633 | 414 |
HAWAII | High Surf | 70.0218818380744 | 320 |
The following example generates a set of random numbers and calculates:
- How many distinct values in the set (Dcount)
- The top three values in the set
- The sum of all these values in the set
This operation can be done using batches and materialize:
let randomSet =
materialize(
range x from 1 to 3000000 step 1
| project value = rand(10000000));
randomSet | summarize Dcount=dcount(value);
randomSet | top 3 by value;
randomSet | summarize Sum=sum(value)
Result set 1:
Dcount |
---|
2578351 |
Result set 2:
value |
---|
9999998 |
9999998 |
9999997 |
Result set 3:
Sum |
---|
15002960543563 |
Examples of using materialize()
To use the let
statement with a value that you use more than once, use the materialize() function. Try to push all possible operators that will reduce the materialized dataset and still keep the semantics of the query. For example, use filters, or project only required columns.
let materializedData = materialize(Table
| where Timestamp > ago(1d));
union (materializedData
| where Text !has "somestring"
| summarize dcount(Resource1)), (materializedData
| where Text !has "somestring"
| summarize dcount(Resource2))
The filter on Text is mutual and can be pushed to the materialize expression. The query only needs columns Timestamp, Text, Resource1, and Resource2. Project these columns inside the materialized expression.
let materializedData = materialize(Table
| where Timestamp > ago(1d)
| where Text !has "somestring"
| project Timestamp, Resource1, Resource2, Text);
union (materializedData
| summarize dcount(Resource1)), (materializedData
| summarize dcount(Resource2))
If the filters aren’t identical, as in the following query:
let materializedData = materialize(Table
| where Timestamp > ago(1d));
union (materializedData
| where Text has "String1"
| summarize dcount(Resource1)), (materializedData
| where Text has "String2"
| summarize dcount(Resource2))
When the combined filter reduces the materialized result drastically, combine both filters on the materialized result by a logical or
expression as in the following query. However, keep the filters in each union leg to preserve the semantics of the query.
let materializedData = materialize(Table
| where Timestamp > ago(1d)
| where Text has "String1" or Text has "String2"
| project Timestamp, Resource1, Resource2, Text);
union (materializedData
| where Text has "String1"
| summarize dcount(Resource1)), (materializedData
| where Text has "String2"
| summarize dcount(Resource2))
14.6 - materialized_view()
References the materialized part of a materialized view.
The materialized_view()
function supports a way of querying the materialized part only of the view, while specifying the max latency the user is willing to tolerate. This option isn’t guaranteed to return the most up-to-date records, but should always be more performant than querying the entire view. This function is useful for scenarios in which you’re willing to sacrifice some freshness for performance, for example in telemetry dashboards.
Syntax
materialized_view(ViewName, [ max_age ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ViewName | string | ✔️ | The name of the materialized view. |
max_age | timespan | If not provided, only the materialized part of the view is returned. If provided, the function will return the materialized part of the view if last materialization time is greater than @now - max_age . Otherwise, the entire view is returned, which is identical to querying ViewName directly. |
Examples
Query the materialized part of the view only, independent of when it was last materialized.
materialized_view("ViewName")
Query the materialized part only if it was materialized in the last 10 minutes. If the materialized part is older than 10 minutes, return the full view. This option is expected to be less performant than querying the materialized part.
materialized_view("ViewName", 10m)
Notes
- Once a view is created, it can be queried just like any other table in the database, including participating in cross-cluster / cross-database queries (see the sketch after this list).
- Materialized views aren’t included in wildcard unions or searches.
- Syntax for querying the view is the view name (like a table reference).
- Querying the materialized view will always return the most up-to-date results, based on all records ingested to the source table. The query combines the materialized part of the view with all unmaterialized records in the source table. For more information, see how materialized views work.
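For example, a materialized view in another database or cluster can be referenced by its name like any other table. The following is a sketch; OtherCluster, SomeDb, and ViewName are placeholder names.
// Placeholder names: reference a materialized view in a remote cluster by name.
cluster("OtherCluster").database("SomeDb").ViewName
| count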
14.7 - Query results cache
Kusto includes a query results cache. You can choose to get cached results when issuing a query. You’ll experience better query performance and lower resource consumption if your query’s results can be returned by the cache. However, this performance comes at the expense of some “staleness” in the results.
Use the cache
Set the query_results_cache_max_age
option as part of the query to use the query results cache. You can set this option in the query text or as a client request property. For example:
set query_results_cache_max_age = time(5m);
GithubEvent
| where CreatedAt > ago(180d)
| summarize arg_max(CreatedAt, Type) by Id
The option value is a timespan
that indicates the maximum “age” of the results cache, measured from the query start time. Beyond the set timespan, the cache entry is obsolete and won’t be used again. Setting a value of 0 is equivalent to not setting the option.
Compatibility between queries
Identical queries
The query results cache returns results only for queries that are considered “identical” to a previous cached query. Two queries are considered identical if all of the following conditions are met:
- The two queries have the same representation (as UTF-8 strings).
- The two queries are made to the same database.
- The two queries share the same client request properties. The following properties are ignored for caching purposes:
- ClientRequestId
- Application
- User
Incompatible queries
The query results won’t be cached if any of the following conditions is true:
- The query references a table that has the RestrictedViewAccess policy enabled.
- The query references a table that has the RowLevelSecurity policy enabled.
- The query uses any of the following functions:
- The query accesses an external table or an external data.
- The query uses the evaluate plugin operator.
No valid cache entry
If a cached result satisfying the time constraints couldn’t be found, or there isn’t a cached result from an “identical” query in the cache, the query will be executed and its results cached, as long as:
- The query execution completes successfully, and
- The query results size doesn’t exceed 16 MB.
Results from the cache
How does the service indicate that the query results are being served from the cache?
When responding to a query, Kusto sends another ExtendedProperties response table that includes a Key
column and a Value
column.
Cached query results will have another row appended to that table:
- The row’s Key column will contain the string ServerCache.
- The row’s Value column will contain a property bag with two fields:
  - OriginalClientRequestId - Specifies the original request’s ClientRequestId.
  - OriginalStartedOn - Specifies the original request’s execution start time.
Query consistency
Queries using weak consistency can be processed on different cluster nodes. The cache isn’t shared by cluster nodes, every node has a dedicated cache in its own private storage. Therefore, if two identical queries land on different nodes, the query will be executed and cached on both nodes. By setting query consistency to affinitizedweakconsistency
, you can ensure that weak consistency queries that are identical land on the same query head, and thus increase the cache hit rate. This is not relevant when using strong consistency.
Management
The following management and observability commands are supported:
- Show query results cache: Returns statistics related to the query results cache.
- Clear query results cache: Clears query results cache.
- Refresh query cache entry: a specific query cache entry can be refreshed using the query_results_cache_force_refresh (OptionQueryResultsCacheForceRefresh) client request property. When set to true, this property forces the query results cache to be refreshed even when an existing cache entry is present. This is useful in scenarios that require query results to be available for querying. The property must be used in combination with query_results_cache_max_age, and sent via the ClientRequestProperties object. The property can’t be part of a set statement.
Capacity
The cache capacity is currently fixed at 1 GB per cluster node. The eviction policy is LRU.
Shard level query results cache
You can use shard-level query results cache for scenarios that require the most up-to-date results, such as a live dashboard. For example, a query that runs every 10 seconds and spans the last 1 hour can benefit from caching intermediate query results at the storage (shard) level.
The shard level query results cache is automatically enabled when the Query results cache
is in use. Because it shares the same cache as Query results cache
, the same capacity and eviction policies apply.
Syntax
set query_results_cache_per_shard; Query
Example
set query_results_cache_per_shard;
GithubEvent
| where CreatedAt > ago(180d)
| summarize arg_max(CreatedAt, Type) by Id
14.8 - stored_query_result()
Use the stored_query_result() function to reference a previously created stored query result.
To set a stored query result, see .set stored_query_result command.
Syntax
stored_query_result(StoredQueryResultName)
Parameters
Name | Type | Required | Description |
---|---|---|---|
StoredQueryResultName | string | ✔️ | The name of the stored query result. |
Examples
References the stored query result named Numbers.
stored_query_result("Numbers")
Output
X |
---|
1 |
2 |
3 |
… |
Pagination
The following example retrieves clicks by Ad network and day, for the last seven days:
.set stored_query_result DailyClicksByAdNetwork7Days with (previewCount = 100) <|
Events
| where Timestamp > ago(7d)
| where EventType == 'click'
| summarize Count=count() by Day=bin(Timestamp, 1d), AdNetwork
| order by Count desc
| project Num=row_number(), Day, AdNetwork, Count
Output
Num | Day | AdNetwork | Count |
---|---|---|---|
1 | 2020-01-01 00:00:00.0000000 | NeoAds | 1002 |
2 | 2020-01-01 00:00:00.0000000 | HighHorizons | 543 |
3 | 2020-01-01 00:00:00.0000000 | PieAds | 379 |
… | … | … | … |
Retrieve the next page:
stored_query_result("DailyClicksByAdNetwork7Days")
| where Num between(100 .. 200)
Output
Num | Day | AdNetwork | Count |
---|---|---|---|
100 | 2020-01-01 00:00:00.0000000 | CoolAds | 301 |
101 | 2020-01-01 00:00:00.0000000 | DreamAds | 254 |
102 | 2020-01-02 00:00:00.0000000 | SuperAds | 123 |
… | … | … | … |
14.9 - table()
The table() function references a table by providing its name as an expression of type string
.
Syntax
table(TableName [, DataScope])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | The name of the table being referenced. The value of this expression must be constant at the point of call to the function, meaning it cannot vary by the data context. |
DataScope | string | Used to restrict the table reference to data according to how this data falls under the table’s effective cache policy. If used, the actual argument must be one of the Valid data scope values. |
Valid data scope values
Value | Description |
---|---|
hotcache | Only data that is categorized as hot cache will be referenced. |
all | All the data in the table will be referenced. |
default | The default is all , except if it has been set to hotcache by the cluster admin. |
Returns
table(T)
returns:
- Data from table T if a table named T exists.
- Data returned by function T if a table named T doesn’t exist but a function named T exists. Function T must take no arguments and must return a tabular result.
- A semantic error is raised if there’s no table named T and no function named T.
Examples
Use table() to access table of the current database
table('StormEvents') | count
Output
Count |
---|
59066 |
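The following sketch restricts the same reference to data in the hot cache, using the DataScope parameter described above.
// Count only the rows of StormEvents that are in the hot cache.
table('StormEvents', 'hotcache') | count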
Use table() inside let statements
The query above can be rewritten as a query-defined function (let statement) that receives a parameter tableName
- which is passed into the table() function.
let foo = (tableName:string)
{
table(tableName) | count
};
foo('StormEvents')
Output
Count |
---|
59066 |
Use table() inside Functions
The same query as above can be rewritten to be used in a function that
receives a parameter tableName
- which is passed into the table() function.
.create function foo(tableName:string)
{
table(tableName) | count
};
Use table() with non-constant parameter
A parameter that isn’t a scalar constant string can’t be passed to the table() function.
The following example shows a workaround for such a case.
let T1 = print x=1;
let T2 = print x=2;
let _choose = (_selector:string)
{
union
(T1 | where _selector == 'T1'),
(T2 | where _selector == 'T2')
};
_choose('T2')
Output
x |
---|
2 |
14.10 - toscalar()
Returns a scalar constant value of the evaluated expression.
This function is useful for queries that require staged calculations. For example, calculate a total count of events, and then use the result to filter groups that exceed a certain percent of all events.
Any two statements must be separated by a semicolon.
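As a sketch of that staged-calculation pattern, the following query uses the StormEvents sample table and an illustrative 10% threshold:
// Compute the total event count once, then keep only states that account
// for more than 10% of all events.
let TotalEvents = toscalar(StormEvents | count);
StormEvents
| summarize EventsPerState = count() by State
| where EventsPerState * 100.0 / TotalEvents > 10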
Syntax
toscalar(expression)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expression | string | ✔️ | The value to convert to a scalar value. |
Returns
A scalar constant value of the evaluated expression. If the result is tabular, the value in the first column of the first row is taken for conversion.
Limitations
toscalar()
can’t be applied in a scenario that requires the function to be evaluated on each row. This is because the function can only be calculated a constant number of times during the query execution.
Usually, when this limitation is hit, the following error will be returned: can't use '<column name>' as it is defined outside its row-context scope.
In the following example, the query fails with the error:
let _dataset1 = datatable(x:long)[1,2,3,4,5];
let _dataset2 = datatable(x:long, y:long) [ 1, 2, 3, 4, 5, 6];
let tg = (x_: long)
{
toscalar(_dataset2| where x == x_ | project y);
};
_dataset1
| extend y = tg(x)
This failure can be mitigated by using the join
operator, as in the following example:
let _dataset1 = datatable(x: long)[1, 2, 3, 4, 5];
let _dataset2 = datatable(x: long, y: long) [1, 2, 3, 4, 5, 6];
_dataset1
| join (_dataset2) on x
| project x, y
Output
x | y |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
Examples
Evaluate Start
, End
, and Step
as scalar constants, and use the result for range
evaluation.
let Start = toscalar(print x=1);
let End = toscalar(range x from 1 to 9 step 1 | count);
let Step = toscalar(2);
range z from Start to End step Step | extend start=Start, end=End, step=Step
Output
z | start | end | step |
---|---|---|---|
1 | 1 | 9 | 2 |
3 | 1 | 9 | 2 |
5 | 1 | 9 | 2 |
7 | 1 | 9 | 2 |
9 | 1 | 9 | 2 |
The following example shows how toscalar
can be used to “fix” an expression
so that it will be calculated precisely once. In this case, the expression being
calculated returns a different value per evaluation.
let g1 = toscalar(new_guid());
let g2 = new_guid();
range x from 1 to 2 step 1
| extend x=g1, y=g2
Output
x | y |
---|---|
e6a15e72-756d-4c93-93d3-fe85c18d19a3 | c2937642-0d30-4b98-a157-a6706e217620 |
e6a15e72-756d-4c93-93d3-fe85c18d19a3 | c6a48cb3-9f98-4670-bf5b-589d0e0dcaf5 |
15 - Tabular operators
15.1 - Join operator
15.1.1 - join flavors
15.1.1.1 - fullouter join
A fullouter
join combines the effect of applying both left and right outer-joins. For columns of the table that lack a matching row, the result set contains null
values. For those records that do match, a single row is produced in the result set containing fields populated from both tables.
Syntax
LeftTable | join kind=fullouter [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All records from both tables with unmatched cells populated with null.
Example
This example query combines rows from both tables X and Y, filling in missing values with NULL where there’s no match in the other table. This allows you to see all possible combinations of keys from both tables.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=fullouter Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 3 | b | 10 |
b | 2 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
  |  | d | 40 |
a | 1 |  |  |
Related content
- Learn about other join flavors
15.1.1.2 - inner join
The inner
join flavor is like the standard inner join from the SQL world. An output record is produced whenever a record on the left side has the same join key as the record on the right side.
Syntax
LeftTable | join kind=inner [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: Only matching rows from both tables.
Example
The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'k',5,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40,
'k',50
];
X | join kind=inner Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 3 | b | 10 |
b | 2 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
k | 5 | k | 50 |
Related content
- Learn about other join flavors
15.1.1.3 - innerunique join
The innerunique
join flavor removes duplicate keys from the left side. This behavior ensures that the output contains a row for every combination of unique left and right keys.
By default, the innerunique
join flavor is used if the kind
parameter isn’t specified. This default implementation is useful in log/trace analysis scenarios, where you aim to correlate two events based on a shared correlation ID. It allows you to retrieve all instances of the phenomenon while disregarding duplicate trace records that contribute to the correlation.
Syntax
LeftTable | join kind=innerunique [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All deduplicated rows from the left table that match rows from the right table.
Examples
Review the examples and run them in your Data Explorer query page.
Use the default innerunique join
The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 2 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
The query executed the default join, which is an inner join after deduplicating the left side based on the join key. The deduplication keeps only the first record. The resulting left side of the join after deduplication is:
Key | Value1 |
---|---|
a | 1 |
b | 2 |
c | 4 |
Two possible outputs from innerunique join
let t1 = datatable(key: long, value: string)
[
1, "val1.1",
1, "val1.2"
];
let t2 = datatable(key: long, value: string)
[
1, "val1.3",
1, "val1.4"
];
t1
| join kind = innerunique
t2
on key
Output
key | value | key1 | value1 |
---|---|---|---|
1 | val1.1 | 1 | val1.3 |
1 | val1.1 | 1 | val1.4 |
let t1 = datatable(key: long, value: string)
[
1, "val1.1",
1, "val1.2"
];
let t2 = datatable(key: long, value: string)
[
1, "val1.3",
1, "val1.4"
];
t1
| join kind = innerunique
t2
on key
Output
key | value | key1 | value1 |
---|---|---|---|
1 | val1.2 | 1 | val1.3 |
1 | val1.2 | 1 | val1.4 |
- Kusto is optimized to push filters that come after the join towards the appropriate join side, left or right, when possible.
- Sometimes, the flavor used is innerunique and the filter is propagated to the left side of the join. The flavor is automatically propagated and the keys that apply to that filter appear in the output.
- Use the previous example and add a filter where value == "val1.2". It gives the second result and will never give the first result for the datasets:
let t1 = datatable(key: long, value: string)
[
1, "val1.1",
1, "val1.2"
];
let t2 = datatable(key: long, value: string)
[
1, "val1.3",
1, "val1.4"
];
t1
| join kind = innerunique
t2
on key
| where value == "val1.2"
Output
key | value | key1 | value1 |
---|---|---|---|
1 | val1.2 | 1 | val1.3 |
1 | val1.2 | 1 | val1.4 |
Get extended sign-in activities
Get extended activities from a login session in which some entries mark the start and end of an activity.
let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityId, StartTime=timestamp
| join (Events
| where Name == "Stop"
| project StopTime=timestamp, ActivityId)
on ActivityId
| project City, ActivityId, StartTime, StopTime, Duration = StopTime - StartTime
let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityIdLeft = ActivityId, StartTime=timestamp
| join (Events
| where Name == "Stop"
| project StopTime=timestamp, ActivityIdRight = ActivityId)
on $left.ActivityIdLeft == $right.ActivityIdRight
| project City, ActivityId = ActivityIdLeft, StartTime, StopTime, Duration = StopTime - StartTime
Related content
- Learn about other join flavors
15.1.1.4 - leftanti join
The leftanti
join flavor returns all records from the left side that don’t match any record from the right side. The anti join models the “NOT IN” query.
Syntax
LeftTable |
join
kind=leftanti
[ Hints ] RightTable on
Conditions
Returns
Schema: All columns from the left table.
Rows: All records from the left table that don’t match records from the right table.
Example
The example query combines rows from tables X and Y where there is no match in Y for the keys in X, effectively filtering out any rows in X that have corresponding rows in Y.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=leftanti Y on Key
Output
Key | Value1 |
---|---|
a | 1 |
Related content
- Learn about other join flavors
15.1.1.5 - leftouter join
The leftouter
join flavor returns all the records from the left side table and only matching records from the right side table.
Syntax
LeftTable |
join
kind=leftouter
[ Hints ] RightTable on
Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All records from the left table and only matching rows from the right table.
Example
The result of a left outer join for tables X and Y always contains all records of the left table (X), even if the join condition doesn’t find any matching record in the right table (Y).
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=leftouter Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
a | 1 | ||
b | 2 | b | 10 |
b | 3 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
Related content
- Learn about other join flavors
15.1.1.6 - leftsemi join
The leftsemi
join flavor returns all records from the left side that match a record from the right side. Only columns from the left side are returned.
Syntax
LeftTable |
join
kind=leftsemi
[ Hints ] RightTable on
Conditions
Returns
Schema: All columns from the left table.
Rows: All records from the left table that match records from the right table.
Example
This query filters and returns only those rows from table X that have a matching key in table Y.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=leftsemi Y on Key
Output
Key | Value1 |
---|---|
b | 2 |
b | 3 |
c | 4 |
Related content
- Learn about other join flavors
15.1.1.7 - rightanti join
The rightanti
join flavor returns all records from the right side that don’t match any record from the left side. The anti join models the “NOT IN” query.
Syntax
LeftTable |
join
kind=rightanti
[ Hints ] RightTable on
Conditions
Returns
Schema: All columns from the right table.
Rows: All records from the right table that don’t match records from the left table.
Example
This query filters and returns only those rows from table Y that do not have a matching key in table X.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=rightanti Y on Key
Output
Key | Value1 |
---|---|
d | 40 |
Related content
- Learn about other join flavors
15.1.1.8 - rightouter join
The rightouter
join flavor returns all the records from the right side and only matching records from the left side. This join flavor resembles the leftouter
join flavor, but the treatment of the tables is reversed.
Syntax
LeftTable |
join
kind=rightouter
[ Hints ] RightTable on
Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All records from the right table and only matching rows from the left table.
Example
This query returns all rows from table Y and any matching rows from table X, filling in NULL values where there is no match from X.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=rightouter Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 2 | b | 10 |
b | 3 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
d | 40 |
Related content
- Learn about other join flavors
15.1.1.9 - rightsemi join
The rightsemi
join flavor returns all records from the right side that match a record from the left side. Only columns from the right side are returned.
Syntax
LeftTable |
join
kind=rightsemi
[ Hints ] RightTable on
Conditions
Returns
Schema: All columns from the right table.
Rows: All records from the right table that match records from the left table.
Example
This query filters and returns only those rows from table Y that have a matching key in table X.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=rightsemi Y on Key
Output
Key | Value2 |
---|---|
b | 10 |
c | 20 |
c | 30 |
Related content
- Learn about other join flavors
15.1.2 - Broadcast join
Today, regular joins are executed on a single cluster node. Broadcast join is an execution strategy of join that distributes the join over cluster nodes. This strategy is useful when the left side of the join is small (up to several tens of MBs). In this case, a broadcast join is more performant than a regular join.
Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.
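For comparison, a minimal sketch of the lookup form (leftSide and dimTable are illustrative names, not tables defined in this article):
// Hypothetical sketch: lookup broadcasts the smaller right side by default.
leftSide
| lookup (dimTable) on Key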
If the left side of the join is a small dataset, you can run the join in broadcast mode by using the following syntax (hint.strategy = broadcast):
leftSide
| join hint.strategy = broadcast (factTable) on key
The performance improvement is more noticeable in scenarios where the join is followed by other operators, such as summarize. For example:
leftSide
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key
Related content
15.1.3 - Cross-cluster join
A cross-cluster join involves joining data from datasets that reside in different clusters.
In a cross-cluster join, the query can be executed in three possible locations, each with a specific designation for reference throughout this document:
- Local cluster: The cluster to which the request is sent, which is also known as the cluster hosting the database in context.
- Left cluster: The cluster hosting the data on the left side of the join operation.
- Right cluster: The cluster hosting the data on the right side of the join operation.
The cluster that runs the query fetches the data from the other cluster.
Syntax
[ cluster(
ClusterName).database(
DatabaseName).
]LeftTable |
…|
join
[ hint.remote=
Strategy ] (
[ cluster(
ClusterName).database(
DatabaseName).
]RightTable |
…
)
on Conditions
Parameters
Name | Type | Required | Description |
---|---|---|---|
LeftTable | string | ✔️ | The left table or tabular expression whose rows are to be merged. Denoted as $left . |
Strategy | string | Determines the cluster on which to execute the join. Supported values are: left , right , local , and auto . For more information, see Strategies. | |
ClusterName | string | If the data for the join resides outside of the local cluster, use the cluster() function to specify the cluster. | |
DatabaseName | string | If the data for the join resides outside of the local database context, use the database() function to specify the database. | |
RightTable | string | ✔️ | The right table or tabular expression whose rows are to be merged. Denoted as $right . |
Conditions | string | ✔️ | Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left. LeftColumn == $right. RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator. |
Strategies
The following list explains the supported values for the Strategy parameter:
- left: Execute the join on the cluster of the left table (the left cluster).
- right: Execute the join on the cluster of the right table (the right cluster).
- local: Execute the join on the current cluster (the local cluster).
- auto: (Default) Kusto makes the remoting decision.
How the auto strategy works
By default, the auto
strategy determines where the cross-cluster join is executed based on the following rules:
If one of the tables is hosted in the local cluster, then the join is performed on the local cluster. For example, with the auto strategy, this query is executed on the local cluster:
T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
If both tables are hosted outside of the local cluster, then join is performed on the right cluster. For example, assuming neither cluster is the local cluster, the join would be executed on the right cluster:
cluster("B").database("DB").T | ... | join (cluster("C").database("DB2").T2 | ...) on Col1
Performance considerations
For optimal performance, we recommend running the query on the cluster that contains the largest table.
In the following example, if the dataset produced by T | ...
is smaller than one produced by cluster("B").database("DB").T2 | ...
then it would be more efficient to execute the join operation on cluster B
, in this case the right cluster instead of on the local cluster.
T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
You can rewrite the query to use hint.remote=right
to optimize the performance. In this way, the join operation is performed on the right cluster, even if left table is in the local cluster.
T | ... | join hint.remote=right (cluster("B").database("DB").T2 | ...) on Col1
Related content
15.1.4 - join operator
Merge the rows of two tables to form a new table by matching values of the specified columns from each table.
Kusto Query Language (KQL) offers many kinds of joins that each affect the schema and rows in the resultant table in different ways. For example, if you use an inner
join, the table has the same columns as the left table, plus the columns from the right table. For best performance, if one table is always smaller than the other, use it as the left side of the join
operator.
The following image provides a visual representation of the operation performed by each join. The color of the shading represents the columns returned, and the areas shaded represent the rows returned.
Syntax
LeftTable |
join
[ kind
=
JoinFlavor ] [ Hints ] (
RightTable)
on
Conditions
Parameters
Name | Type | Required | Description |
---|---|---|---|
LeftTable | string | ✔️ | The left table or tabular expression, sometimes called the outer table, whose rows are to be merged. Denoted as $left . |
JoinFlavor | string | The type of join to perform: innerunique , inner , leftouter , rightouter , fullouter , leftanti , rightanti , leftsemi , rightsemi . The default is innerunique . For more information about join flavors, see Returns. | |
Hints | string | Zero or more space-separated join hints in the form of Name = Value that control the behavior of the row-match operation and execution plan. For more information, see Hints. | |
RightTable | string | ✔️ | The right table or tabular expression, sometimes called the inner table, whose rows are to be merged. Denoted as $right . |
Conditions | string | ✔️ | Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left. LeftColumn == $right. RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator. |
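For instance, a join on more than one key column can be sketched as follows (T1, T2, and the column names are illustrative, not tables from this article):
// Rows match only when both Id and Region agree; the comma acts as a logical "and".
T1
| join kind=inner T2 on $left.Id == $right.Id, $left.Region == $right.Region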
Hints
Hint key | Values | Description |
---|---|---|
hint.remote | auto , left , local , right | See Cross-Cluster Join |
hint.strategy=broadcast | Specifies the way to share the query load on cluster nodes. | See broadcast join |
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. | See shuffle query |
hint.strategy=shuffle | The shuffle strategy query shares the query load on cluster nodes, where each node processes one partition of the data. | See shuffle query |
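As a hedged sketch of how these hints attach to the operator (LeftTable, RightTable, and Key are illustrative names):
// Shuffle the join across nodes, partitioning by the join key.
LeftTable
| join hint.shufflekey=Key (RightTable) on Key
// Or let the engine choose the shuffle key:
LeftTable
| join hint.strategy=shuffle (RightTable) on Key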
Returns
The return schema and rows depend on the join flavor. The join flavor is specified with the kind keyword. The following table shows the supported join flavors. To see examples for a specific join flavor, select the link in the Join flavor column.
Join flavor | Returns |
---|---|
innerunique (default) | Inner join with left side deduplication. Schema: All columns from both tables, including the matching keys. Rows: All deduplicated rows from the left table that match rows from the right table. |
inner | Standard inner join. Schema: All columns from both tables, including the matching keys. Rows: Only matching rows from both tables. |
leftouter | Left outer join. Schema: All columns from both tables, including the matching keys. Rows: All records from the left table and only matching rows from the right table. |
rightouter | Right outer join. Schema: All columns from both tables, including the matching keys. Rows: All records from the right table and only matching rows from the left table. |
fullouter | Full outer join. Schema: All columns from both tables, including the matching keys. Rows: All records from both tables, with unmatched cells populated with null. |
leftsemi | Left semi join. Schema: All columns from the left table. Rows: All records from the left table that match records from the right table. |
leftanti, anti, leftantisemi | Left anti join and semi variant. Schema: All columns from the left table. Rows: All records from the left table that don't match records from the right table. |
rightsemi | Right semi join. Schema: All columns from the right table. Rows: All records from the right table that match records from the left table. |
rightanti, rightantisemi | Right anti join and semi variant. Schema: All columns from the right table. Rows: All records from the right table that don't match records from the left table. |
Cross-join
KQL doesn’t provide a cross-join flavor. However, you can achieve a cross-join effect by using a placeholder key approach.
In the following example, a placeholder key is added to both tables and then used for the inner join operation, effectively achieving a cross-join-like behavior:
X | extend placeholder=1 | join kind=inner (Y | extend placeholder=1) on placeholder
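For instance, a minimal sketch with two small inline tables (the names and values are illustrative):
let X = datatable(Color:string) ['red', 'blue'];
let Y = datatable(Size:string) ['S', 'M', 'L'];
X
| extend placeholder=1
| join kind=inner (Y | extend placeholder=1) on placeholder
| project Color, Size // 2 x 3 = 6 rows, one per combination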
Related content
15.1.5 - Joining within time window
It’s often useful to join between two large datasets on some high-cardinality key, such as an operation ID or a session ID, and further limit the right-hand-side ($right) records that need to match up with each left-hand-side ($left) record by adding a restriction on the “time-distance” between datetime
columns on the left and on the right.
The above operation differs from the usual join operation, since for the equi-join
part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.
Example to identify event sequences without time window
To identify event sequences within a relatively small time window, this example uses a table T
with the following schema:
- SessionId: A column of type string with correlation IDs.
- EventType: A column of type string that identifies the event type of the record.
- Timestamp: A column of type datetime that indicates when the event described by the record happened.
SessionId | EventType | Timestamp |
---|---|---|
0 | A | 2017-10-01T00:00:00Z |
0 | B | 2017-10-01T00:01:00Z |
1 | B | 2017-10-01T00:02:00Z |
1 | A | 2017-10-01T00:03:00Z |
3 | A | 2017-10-01T00:04:00Z |
3 | B | 2017-10-01T00:10:00Z |
The following query creates the dataset and then identifies all the session IDs in which event type A
was followed by an event type B
within a 1min
time window.
let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
'0', 'A', datetime(2017-10-01 00:00:00),
'0', 'B', datetime(2017-10-01 00:01:00),
'1', 'B', datetime(2017-10-01 00:02:00),
'1', 'A', datetime(2017-10-01 00:03:00),
'3', 'A', datetime(2017-10-01 00:04:00),
'3', 'B', datetime(2017-10-01 00:10:00),
];
T
| where EventType == 'A'
| project SessionId, Start=Timestamp
| join kind=inner
(
T
| where EventType == 'B'
| project SessionId, End=Timestamp
) on SessionId
| where (End - Start) between (0min .. 1min)
| project SessionId, Start, End
Output
SessionId | Start | End |
---|---|---|
0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |
Example optimized with time window
To optimize this query, we can rewrite it to account for the time window. The time window is expressed as a join key. Rewrite the query so that the datetime values are "discretized" into buckets whose size is half the size of the time window, and use equi-join to compare the bucket IDs.
The query finds pairs of events within the same session (SessionId) where an ‘A’ event is followed by a ‘B’ event within 1 minute. It projects the session ID, the start time of the ‘A’ event, and the end time of the ‘B’ event.
let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
'0', 'A', datetime(2017-10-01 00:00:00),
'0', 'B', datetime(2017-10-01 00:01:00),
'1', 'B', datetime(2017-10-01 00:02:00),
'1', 'A', datetime(2017-10-01 00:03:00),
'3', 'A', datetime(2017-10-01 00:04:00),
'3', 'B', datetime(2017-10-01 00:10:00),
];
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T
| where EventType == 'A'
| project SessionId, Start=Timestamp, TimeKey = bin(Timestamp, lookupBin)
| join kind=inner
(
T
| where EventType == 'B'
| project SessionId, End=Timestamp,
TimeKey = range(bin(Timestamp-lookupWindow, lookupBin),
bin(Timestamp, lookupBin),
lookupBin)
| mv-expand TimeKey to typeof(datetime)
) on SessionId, TimeKey
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End
Output
SessionId | Start | End |
---|---|---|
0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |
Query 5 million records
The next query emulates an extensive dataset of 5 million records and approximately 1 million session IDs, and runs the query with the time window technique.
let T = range x from 1 to 5000000 step 1
| extend SessionId = rand(1000000), EventType = rand(3), Time=datetime(2017-01-01)+(x * 10ms)
| extend EventType = case(EventType < 1, "A",
EventType < 2, "B",
"C");
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T
| where EventType == 'A'
| project SessionId, Start=Time, TimeKey = bin(Time, lookupBin)
| join kind=inner
(
T
| where EventType == 'B'
| project SessionId, End=Time,
TimeKey = range(bin(Time-lookupWindow, lookupBin),
bin(Time, lookupBin),
lookupBin)
| mv-expand TimeKey to typeof(datetime)
) on SessionId, TimeKey
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End
| count
Output
Count |
---|
3344 |
Related content
15.2 - Render operator
15.2.1 - visualizations
15.2.1.1 - Anomaly chart visualization
The anomaly chart visualization is similar to a timechart, but highlights anomalies using the series_decompose_anomalies function.
Syntax
T |
render
anomalychart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see Multiple y-axes. |
ytitle | The title of the y-axis (of type string ). |
anomalycolumns | Comma-delimited list of columns, which will be considered as anomaly series and displayed as points on the chart |
ysplit
property
This visualization supports splitting into multiple y-axis values. The supported values of this property are:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
Example
The example in this section shows how to use the syntax to help you get started.
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //use "| render anomalychart with anomalycolumns=anomalies" to render the anomalies as bold points on the series charts.
15.2.1.2 - Area chart visualization
The area chart visual shows a time-series relationship. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.
Syntax
T |
render
areachart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the y-axis values for multiple visualizations. |
ytitle | The title of the y-axis (of type string ). |
ysplit
property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
kind
property
This visualization can be further elaborated by providing the kind
property.
The supported values of this property are:
kind value | Description |
---|---|
default | Each “area” stands on its own. |
unstacked | Same as default . |
stacked | Stack “areas” to the right. |
stacked100 | Stack “areas” to the right and stretch each one to the same width as the others. |
Examples
The examples in this section show how to use the syntax to help you get started.
Simple area chart
The following example shows a basic area chart visualization.
demo_series3
| render areachart
Area chart using properties
The following example shows an area chart using multiple property settings.
OccupancyDetection
| summarize avg_temp= avg(Temperature), avg_humidity= avg(Humidity) by bin(Timestamp, 1h)
| render areachart
with (
kind = unstacked,
legend = visible,
ytitle ="Sample value",
ymin = 10,
ymax =100,
xtitle = "Time",
title ="Humidity and temperature"
)
Area chart using split panels
The following example shows an area chart using split panels. In this example, the ysplit
property is set to panels
.
StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count=count() by State, bin(StartTime, 1d)
| render areachart
with (
ysplit= panels,
legend = visible,
ycolumns=count,
yaxis =log,
ytitle ="Count",
ymin = 0,
ymax =100,
xaxis = linear,
xcolumn = StartTime,
xtitle = "Date",
title ="Hail events"
)
15.2.1.3 - Bar chart visualization
The bar chart visual needs a minimum of two columns in the query result. By default, the first column is used as the y-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the x-axis and contain numeric data types to be displayed as horizontal lines. Bar charts are used mainly for comparing numeric and nominal discrete values, where the length of each line represents its value.
Syntax
T |
render
barchart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors (true or false ). |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ysplit
property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. This is the default. |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
kind
property
This visualization can be further elaborated by providing the kind
property.
The supported values of this property are:
kind value | Description |
---|---|
default | Each “bar” stands on its own. |
unstacked | Same as default . |
stacked | Stack “bars”. |
stacked100 | Stack “bars” and stretch each one to the same width as the others. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a bar chart
The following query creates a bar chart displaying the number of storm events for each state, filtering only those states with more than 10 events. The chart provides a visual representation of the event distribution across different states.
StormEvents
| summarize event_count=count() by State
| project State, event_count
| render barchart
with (
title="Storm count by state",
ytitle="Storm count",
xtitle="State",
legend=hidden
)
Render a stacked
bar chart
The following query creates a stacked
bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. Each bar represents a storm event type, and the stacked bars show the breakdown of storm events by state within each type.
StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked)
Render a stacked100
bar chart
The following query creates a stacked100
bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. The chart shows the distribution of storm events across states within each type. Although the stacks visually sum up to 100, the values actually represent the number of events, not percentages. This visualization is helpful for understanding both the percentages and the actual event counts.
StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked100)
Use the ysplit
property
The following query provides a daily summary of storm-related injuries and deaths, visualized as a bar chart with split axes/panels for better comparison.
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=axes)
To split the view into separate panels, specify panels
instead of axes
:
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=panels)
15.2.1.4 - Card visualization
The card visual shows only one element. If there are multiple columns and rows in the output, the first result record is treated as a set of scalar values and shown as a card.
Syntax
T |
render
card
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
title | The title of the visualization (of type string ). |
Example
This query provides a count of flood events in Virginia and displays the result in a card format.
StormEvents
| where State=="VIRGINIA" and EventType=="Flood"
| count
| render card with (title="Floods in Virginia")
15.2.1.5 - Column chart visualization
The column chart visual needs a minimum of two columns in the query result. By default, the first column is used as the x-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the y-axis and contain numeric data types to be displayed as vertical lines. Column charts are used for comparing specific sub-category items in a main category range, where the length of each line represents its value.
Syntax
T |
render
columnchart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ysplit
property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. This is the default. |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
kind
property
This visualization can be further elaborated by providing the kind
property.
The supported values of this property are:
kind value | Definition |
---|---|
default | Each “column” stands on its own. |
unstacked | Same as default . |
stacked | Stack “columns” one atop the other. |
stacked100 | Stack “columns” and stretch each one to the same height as the others. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a column chart
This query provides a visual representation of states with a high frequency of storm events, specifically those with more than 10 events, using a column chart.
StormEvents
| summarize event_count=count() by State
| where event_count > 10
| project State, event_count
| render columnchart
Use the ysplit
property
This query provides a daily summary of storm-related injuries and deaths, visualized as a column chart with split axes/panels for better comparison.
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=axes)
To split the view into separate panels, specify panels
instead of axes
:
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=panels)
15.2.1.6 - Ladder chart visualization
The last two columns are the x-axis, and the other columns are the y-axis.
Syntax
T |
render
ladderchart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Examples
The examples in this section show how to use the syntax to help you get started.
The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.
Dates of storms by state
This query outputs a state-wise visualization of the duration of rain-related storm events, displayed as a ladder chart to help you analyze the temporal distribution of these events.
StormEvents
| where EventType has "rain"
| summarize min(StartTime), max(EndTime) by State
| render ladderchart
Dates of storms by event type
This query outputs a visualization of the duration of various storm events in Washington, displayed as a ladder chart to help you analyze the temporal distribution of these events by type.
StormEvents
| where State == "WASHINGTON"
| summarize min(StartTime), max(EndTime) by EventType
| render ladderchart
Dates of storms by state and event type
This query outputs a visualization of the duration of various storm events in states starting with “W”, displayed as a ladder chart to help you analyze the temporal distribution of these events by state and event type.
StormEvents
| where State startswith "W"
| summarize min(StartTime), max(EndTime) by State, EventType
| render ladderchart with (series=State, EventType)
15.2.1.7 - Line chart visualization
The line chart visual is the most basic type of chart. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Line charts track changes over short and long periods of time. When smaller changes exist, line graphs are more useful than bar graphs.
Syntax
T |
render
linechart
[with
(
propertyName =
propertyValue [,
…] )
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors (true or false ). |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ytitle | The title of the y-axis (of type string ). |
ysplit
property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a line chart
This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart.
StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart
Label a line chart
This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified titles for better clarity and presentation.
StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart
with (
title="Property damage from storms in Virginia",
xtitle="Start time of storm",
ytitle="Property damage"
)
Limit values displayed on the y-axis
This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified y-axis limits for better visualization of the data.
StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart with (ymin=7000, ymax=300000)
View multiple y-axes
This query retrieves hail events in Texas, Nebraska, and Kansas. It counts the number of hail events per day for each state, and then displays this information in a line chart with separate panels for each state.
StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render linechart with (ysplit=panels)
Related content
15.2.1.8 - Pie chart visualization
The pie chart visual needs a minimum of two columns in the query result. By default, the first column is used as the color axis. This column can contain text, datetime, or numeric data types. Other columns will be used to determine the size of each slice and contain numeric data types. Pie charts are used for presenting a composition of categories and their proportions out of a total.
The pie chart visual can also be used in the context of Geospatial visualizations.
Syntax
T |
render
piechart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
kind
property
This visualization can be further elaborated by providing the kind
property.
The supported values of this property are:
kind value | Description |
---|---|
map | Expected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations |
Example
This query provides a visual representation of the top 10 states with the highest number of storm events, displayed as a pie chart
StormEvents
| summarize statecount=count() by State
| sort by statecount
| limit 10
| render piechart with(title="Storm Events by State")
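For the map kind, a hedged sketch (assuming the StormEvents coordinate columns BeginLon and BeginLat, as used in other geospatial examples; the rounding is illustrative):
StormEvents
| where isnotnull(BeginLon) and isnotnull(BeginLat)
| summarize count() by BeginLon = round(BeginLon, 1), BeginLat = round(BeginLat, 1), EventType
| render piechart with (kind=map)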
15.2.1.9 - Pivot chart visualization
Displays a pivot table and chart. You can interactively select data, columns, rows, and various chart types.
Syntax
T |
render
pivotchart
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
Example
This query provides a detailed analysis of sales for Contoso computer products within the specified date range, visualized as a pivot chart.
SalesFact
| join kind= inner Products on ProductKey
| where ProductCategoryName has "Computers" and ProductName has "Contoso"
| where DateKey between (datetime(2006-12-31) .. datetime(2007-02-01))
| project SalesAmount, ProductName, DateKey
| render pivotchart
Output
15.2.1.10 - Plotly visualization
The Plotly graphics library supports ~80 chart types that are useful for advanced charting including geographic, scientific, machine learning, 3d, animation, and many other chart types. For more information, see Plotly.
To render a Plotly visual in Kusto Query Language, the query must generate a table with a single string cell containing Plotly JSON. This Plotly JSON string can be generated by one of the following two methods:
Write your own Plotly visualization in Python
In this method, you dynamically create the Plotly JSON string in Python using the Plotly package. This process requires use of the python() plugin. The Python script is run on the existing nodes using the inline python() plugin. It generates a Plotly JSON that is rendered by the client application.
All types of Plotly visualizations are supported.
Example
The following query uses inline Python to create a 3D scatter chart:
OccupancyDetection
| project Temperature, Humidity, CO2, Occupancy
| where rand() < 0.1
| evaluate python(typeof(plotly:string),
```if 1:
import plotly.express as px
fig = px.scatter_3d(df, x='Temperature', y='Humidity', z='CO2', color='Occupancy')
fig.update_layout(title=dict(text="Occupancy detection, plotly 5.11.0"))
plotly_obj = fig.to_json()
result = pd.DataFrame(data = [plotly_obj], columns = ["plotly"])
```)
Use a preprepared Plotly template
In this method, a preprepared Plotly JSON for specific visualization can be reused by replacing the data objects with the required data to be rendered. The templates can be stored in a standard table, and the data replacement logic can be packed in a stored function.
Currently, the supported templates are: plotly_anomaly_fl() and plotly_scatter3d_fl(). Refer to these documents for syntax and usage.
Example
let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
tbl_ex
| serialize
| summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
| summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
| extend plotly = scatter3d_chart
| extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
| extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
| extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
| extend plotly=replace_string(plotly, '$X_NAME$', x_col)
| extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
| extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
| extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
| extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
| extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| project plotly
};
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly
Related content
15.2.1.11 - Scatter chart visualization
In a scatter chart visual, the first column is the x-axis and should be a numeric column. Other numeric columns are y-axes. Scatter plots are used to observe relationships between variables. The scatter chart visual can also be used in the context of Geospatial visualizations.
Syntax
T |
render
scatterchart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
kind
property
This visualization can be further elaborated by providing the kind
property.
The supported values of this property are:
kind value | Description |
---|---|
map | Expected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations. |
Example
This query provides a scatter chart that helps you analyze the correlation between state populations and the total property damage caused by storm events.
StormEvents
| summarize sum(DamageProperty)by State
| lookup PopulationData on State
| project-away State
| render scatterchart with (xtitle="State population", title="Property damage by state", legend=hidden)
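For the map kind, a hedged sketch using the StormEvents coordinate columns (BeginLon, BeginLat):
StormEvents
| take 100
| project BeginLon, BeginLat
| render scatterchart with (kind=map)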
15.2.1.12 - Stacked area chart visualization
The stacked area chart visual shows a continuous relationship. This visual is similar to the Area chart, but shows the area under each element of a series. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.
Syntax
T |
render
stackedareachart
[with
(
propertyName =
propertyValue [,
…])
]
Supported parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Example
The following query summarizes data from the nyc_taxi
table by number of passengers and visualizes the data in a stacked area chart. The x-axis shows the pickup time in two-day intervals, and the stacked areas represent different passenger counts.
nyc_taxi
| summarize count() by passenger_count, bin(pickup_datetime, 2d)
| render stackedareachart with (xcolumn=pickup_datetime, series=passenger_count)
15.2.1.13 - Table visualization
Default - results are shown as a table.
Syntax
T | render table [with ( propertyName = propertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Example
This query outputs a snapshot of the first 10 storm event records, displayed in a table format.
StormEvents
| take 10
| render table
15.2.1.14 - Time chart visualization
A time chart visual is a type of line graph. The first column of the query is the x-axis and should be a datetime. Other numeric columns are y-axes. The values of one string column are used to group the numeric columns and create different lines in the chart; other string columns are ignored. The time chart visual is like a line chart, except the x-axis is always time.
Syntax
T | render timechart [with ( propertyName = propertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors (true or false ). |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ytitle | The title of the y-axis (of type string ). |
ysplit property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a timechart
The following example renders a timechart titled “Web app. traffic over a month, decomposition” that decomposes the data into baseline, seasonal, trend, and residual components.
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit') // decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic over a month, decomposition')
Label a timechart
The following example renders a timechart of weekly counts of storm events that caused crop damage. The x-axis label is “Date” and the y-axis label is “Crop damage.”
StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
| render timechart
with (
title="Crop damage over time",
xtitle="Date",
ytitle="Crop damage",
legend=hidden
)
View multiple y-axes
The following example renders daily hail events in the states of Texas, Nebraska, and Kansas. The visualization uses the ysplit
property to render each state’s events in separate panels for comparison.
StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=panels)
15.2.1.15 - Time pivot visualization
The time pivot visualization provides interactive navigation over the events timeline, pivoting on the time axis.
Syntax
T | render timepivot [with ( propertyName = propertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Example
This query outputs a visualization of flood events in the specified Midwestern states, displayed as a time pivot chart.
let midwesternStates = dynamic([
"ILLINOIS", "INDIANA", "IOWA", "KANSAS", "MICHIGAN", "MINNESOTA",
"MISSOURI", "NEBRASKA", "NORTH DAKOTA", "OHIO", "SOUTH DAKOTA", "WISCONSIN"
]);
StormEvents
| where EventType == "Flood" and State in (midwesternStates)
| render timepivot with (xcolumn=State)
Output
:::image type=“content” source=“media/visualization-timepivot/time-pivot-visualization.jpg” lightbox=“media/visualization-timepivot/time-pivot-visualization.jpg” alt-text=“Screenshot of timepivot in Kusto.Explorer.”:::
15.2.1.16 - Treemap visualization
Treemaps display hierarchical data as a set of nested rectangles. Each level of the hierarchy is represented by a colored rectangle (branch) containing smaller rectangles (leaves).
Syntax
T | render treemap [with ( propertyName = propertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
Example
This query counts the number of storm events for each type and state, sorts them in descending order, limits the results to the top 30, and then visualizes the data as a treemap.
StormEvents
| summarize StormEvents=count() by EventType, State
| sort by StormEvents
| limit 30
| render treemap with(title="Storm Events by EventType and State")
15.2.2 - render operator
Instructs the user agent to render a visualization of the query results.
The render operator must be the last operator in the query, and can only be used with queries that produce a single tabular data stream result. The render operator doesn’t modify data. It injects an annotation (“Visualization”) into the result’s extended properties. The annotation contains the information provided by the operator in the query. The interpretation of the visualization information is done by the user agent. Different agents, such as Kusto.Explorer or Azure Data Explorer web UI, may support different visualizations.
The data model of the render operator looks at the tabular data as if it has three kinds of columns:
- The x axis column (indicated by the xcolumn property).
- The series columns (any number of columns indicated by the series property). For each record, the combined values of these columns define a single series, and the chart has as many series as there are distinct combined values.
- The y axis columns (any number of columns indicated by the ycolumns property). For each record, the series has as many measurements (“points” in the chart) as there are y-axis columns.
If some of these columns aren't indicated explicitly, the user agent tries to infer them from the column types returned by the query. In particular, having “uninteresting” columns in the schema of the result might translate into them guessing wrong. Try projecting-away such columns when that happens.
Syntax
T | render visualization [with ( propertyName = propertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
visualization | string | ✔️ | Indicates the kind of visualization to use. Must be one of the supported values in the following list. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Visualization
visualization | Description | Illustration |
---|---|---|
anomalychart | Similar to timechart, but highlights anomalies using series_decompose_anomalies function. | :::image type=“icon” source=“media/renderoperator/anomaly-chart.png” border=“false”::: |
areachart | Area graph. | :::image type=“icon” source=“media/renderoperator/area-chart.png” border=“false”::: |
barchart | First column is the x-axis and can be text, datetime, or numeric. Other columns are numeric, displayed as horizontal strips. | :::image type=“icon” source=“media/renderoperator/bar-chart.png” border=“false”::: |
card | First result record is treated as set of scalar values and shows as a card. | :::image type=“icon” source=“media/renderoperator/card.png” border=“false”::: |
columnchart | Like barchart with vertical strips instead of horizontal strips. | :::image type=“icon” source=“media/renderoperator/column-chart.png” border=“false”::: |
ladderchart | Last two columns are the x-axis, other columns are y-axis. | :::image type=“icon” source=“media/renderoperator/ladder-chart.png” border=“false”::: |
linechart | Line graph. | :::image type=“icon” source=“media/renderoperator/line-chart.png” border=“false”::: |
piechart | First column is color-axis, second column is numeric. | :::image type=“icon” source=“media/renderoperator/pie-chart.png” border=“false”::: |
pivotchart | Displays a pivot table and chart. User can interactively select data, columns, rows and various chart types. | :::image type=“icon” source=“media/renderoperator/pivot-chart.png” border=“false”::: |
scatterchart | Points graph. | :::image type=“icon” source=“media/renderoperator/scatter-chart.png” border=“false”::: |
stackedareachart | Stacked area graph. | :::image type=“icon” source=“media/renderoperator/stacked-area-chart.png” border=“false”::: |
table | Default - results are shown as a table. | :::image type=“icon” source=“media/renderoperator/table-visualization.png” border=“false”::: |
timechart | Line graph. First column is x-axis, and must be datetime. Other (numeric) columns are y-axes. | :::image type=“icon” source=“media/renderoperator/visualization-timechart.png” border=“false”::: |
timepivot | Interactive navigation over the events time-line (pivoting on time axis) | :::image type=“icon” source=“media/renderoperator/visualization-time-pivot.png” border=“false”::: |
treemap | Displays hierarchical data as a set of nested rectangles. | :::image type=“icon” source=“media/renderoperator/tree-map.png” border=“false”::: |
Visualization | Description | Illustration |
---|---|---|
areachart | Area graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes. | :::image type=“icon” source=“media/renderoperator/area-chart.png” border=“false”::: |
barchart | First column is the x-axis and can be text, datetime or numeric. Other columns are numeric, displayed as horizontal strips. | :::image type=“icon” source=“media/renderoperator/bar-chart.png” border=“false”::: |
columnchart | Like barchart with vertical strips instead of horizontal strips. | :::image type=“icon” source=“media/renderoperator/column-chart.png” border=“false”::: |
piechart | First column is color-axis, second column is numeric. | :::image type=“icon” source=“media/renderoperator/pie-chart.png” border=“false”::: |
scatterchart | Points graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes. | :::image type=“icon” source=“media/renderoperator/scatter-chart.png” border=“false”::: |
table | Default - results are shown as a table. | :::image type=“icon” source=“media/renderoperator/table-visualization.png” border=“false”::: |
timechart | Line graph. First column is x-axis, and should be datetime. Other (numeric) columns are y-axes. There’s one string column whose values are used to “group” the numeric columns and create different lines in the chart (further string columns are ignored). | :::image type=“icon” source=“media/renderoperator/visualization-timechart.png” border=“false”::: |
Supported properties
PropertyName/PropertyValue indicate additional information to use when rendering. All properties are optional. The supported properties are:
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see y-split property. |
ytitle | The title of the y-axis (of type string ). |
anomalycolumns | Property relevant only for anomalychart . Comma-delimited list of columns, which will be considered as anomaly series and displayed as points on the chart |
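As a minimal sketch of how anomalycolumns is typically used with the anomalychart visualization (reusing the demo_make_series2 table from the timechart example earlier in this section; the threshold value of 1.5 is illustrative):
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with (anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies')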
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
Visualization | kind | Description |
---|---|---|
areachart | default | Each “area” stands on its own. |
| unstacked | Same as default. |
| stacked | Stack “areas” to the right. |
| stacked100 | Stack “areas” to the right and stretch each one to the same width as the others. |
barchart | default | Each “bar” stands on its own. |
| unstacked | Same as default. |
| stacked | Stack “bars”. |
| stacked100 | Stack “bars” and stretch each one to the same width as the others. |
columnchart | default | Each “column” stands on its own. |
| unstacked | Same as default. |
| stacked | Stack “columns” one atop the other. |
| stacked100 | Stack “columns” and stretch each one to the same height as the others. |
scatterchart | map | Expected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations. |
piechart | map | Expected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations. |
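For instance, a minimal sketch of the stacked kind applied to a column chart (the state list is illustrative):
StormEvents
| where State in ("TEXAS", "KANSAS", "IOWA")
| summarize EventCount = count() by State, EventType
| render columnchart with (kind=stacked, series=EventType, ycolumns=EventCount, title="Storm events by state and type")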
ysplit property
Some visualizations support splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
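For example, a minimal sketch that puts each state's series on its own y-axis within a single chart (mirroring the hail example shown earlier, but with ysplit=axes):
StormEvents
| where State in ("TEXAS", "NEBRASKA") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=axes)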
How to render continuous data
Several visualizations are used for rendering sequences of values, for example, linechart
, timechart
, and areachart
.
These visualizations have the following conceptual model:
- One column in the table represents the x-axis of the data. This column can be explicitly defined using the xcolumn property. If not defined, the user agent picks the first column that is appropriate for the visualization.
  - For example: in the timechart visualization, the user agent uses the first datetime column.
  - If this column is of type dynamic and it holds an array, the individual values in the array will be treated as the values of the x-axis.
- One or more columns in the table represent one or more measures that vary by the x-axis. These columns can be explicitly defined using the ycolumns property. If not defined, the user agent picks all columns that are appropriate for the visualization.
  - For example: in the timechart visualization, the user agent uses all columns with a numeric value that haven't been specified otherwise.
  - If the x-axis is an array, the values of each y-axis should also be an array of a similar length, with each y-axis occurring in a single column.
- Zero or more columns in the table represent a unique set of dimensions that group together the measures. These columns can be specified by the series property, or the user agent will pick them automatically from the columns that are otherwise unspecified.
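The following minimal sketch makes these roles explicit instead of relying on the user agent's guesses (the column names WeeklyDamage and Week are illustrative):
StormEvents
| where State in ("TEXAS", "KANSAS")
| summarize WeeklyDamage = sum(DamageProperty) by Week = bin(StartTime, 7d), State
| render timechart with (xcolumn=Week, ycolumns=WeeklyDamage, series=State)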
Related content
- Add a query visualization in the web UI
- Customize dashboard visuals
- Rendering examples in the tutorial
- Anomaly detection
Example
InsightsMetrics
| where Computer == "DC00.NA.contosohotels.com"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart
15.3 - Summarize operator
15.3.1 - Kusto partition & compose intermediate aggregation results
Suppose you want to calculate the count of distinct users every day over the last seven days. You can run summarize dcount(user)
once a day with a span filtered to the last seven days. This method is inefficient, because each time the calculation is run, there’s a six-day overlap with the previous calculation. You can also calculate an aggregate for each day, and then combine these aggregates. This method requires you to “remember” the last six results, but it’s much more efficient.
Partitioning queries as described is easy for simple aggregates, such as count()
and sum()
. It can also be useful for complex aggregates, such as dcount()
and percentiles()
. This article explains how Kusto supports such calculations.
The following examples show how to use hll
/tdigest
and demonstrate that using these commands is highly performant in some scenarios:
range x from 1 to 1000000 step 1
| summarize hll(x,4)
| project sizeInMb = estimate_data_size(hll_x) / pow(1024,2)
Output
sizeInMb |
---|
1.0000524520874 |
Because this object is larger than the column's default MaxValueSize, ingesting it into a table before applying the encoding policy described below ingests a null value:
.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)
MyTable
| project isempty(hll_x)
Output
Column1 |
---|
1 |
To avoid ingesting null, use the special encoding policy type bigobject
, which overrides the MaxValueSize
to 2 MB like this:
.alter column MyTable.hll_x policy encoding type='bigobject'
Ingesting another value into the same table:
.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)
ingests the second value successfully:
MyTable
| project isempty(hll_x)
Output
Column1 |
---|
1 |
0 |
Example: Count with binned timestamp
There’s a table, PageViewsHllTDigest
, containing hll
values of Pages viewed in each hour. You want these values binned to 12h
. Merge the hll
values using the hll_merge()
aggregate function, with the timestamp binned to 12h
. Use the function dcount_hll
to return the final dcount
value:
PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 12h)
| project Timestamp , dcount_hll(merged_hll)
Output
Timestamp | dcount_hll_merged_hll |
---|---|
2016-05-01 12:00:00.0000000 | 20056275 |
2016-05-02 00:00:00.0000000 | 38797623 |
2016-05-02 12:00:00.0000000 | 39316056 |
2016-05-03 00:00:00.0000000 | 13685621 |
To bin timestamp for 1d
:
PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 1d)
| project Timestamp , dcount_hll(merged_hll)
Output
Timestamp | dcount_hll_merged_hll |
---|---|
2016-05-01 00:00:00.0000000 | 20056275 |
2016-05-02 00:00:00.0000000 | 64135183 |
2016-05-03 00:00:00.0000000 | 13685621 |
The same query may be done over the values of tdigest
, which represent the BytesDelivered
in each hour:
PageViewsHllTDigest
| summarize merged_tdigests = merge_tdigest(tdigestBytesDel) by bin(Timestamp, 12h)
| project Timestamp , percentile_tdigest(merged_tdigests, 95, typeof(long))
Output
Timestamp | percentile_tdigest_merged_tdigests |
---|---|
2016-05-01 12:00:00.0000000 | 170200 |
2016-05-02 00:00:00.0000000 | 152975 |
2016-05-02 12:00:00.0000000 | 181315 |
2016-05-03 00:00:00.0000000 | 146817 |
Example: Temporary table
Kusto limits are reached with datasets that are too large for regular queries that calculate percentile() or dcount() directly, yet you still need to run such queries over the dataset periodically.
To solve this problem, newly added data can be added to a temp table as hll or tdigest values, using hll() when the required operation is dcount and tdigest() when the required operation is percentile, with set/append commands or an update policy. In this case, the intermediate results of dcount or tdigest are saved into another dataset, which should be smaller than the target large one.
When you need to get the final results of these values, the queries may use the hll/tdigest mergers hll_merge()/tdigest_merge(). Then, after getting the merged values, percentile_tdigest()/dcount_hll() may be invoked on these merged values to get the final result of dcount or percentiles.
Assuming there’s a table, PageViews, into which data is ingested daily, every day on which you want to calculate the distinct count of pages viewed per minute later than date = datetime(2016-05-01 18:00:00.0000000).
Run the following query:
PageViews
| where Timestamp > datetime(2016-05-01 18:00:00.0000000)
| summarize percentile(BytesDelivered, 90), dcount(Page,2) by bin(Timestamp, 1d)
Output
Timestamp | percentile_BytesDelivered_90 | dcount_Page |
---|---|---|
2016-05-01 00:00:00.0000000 | 83634 | 20056275 |
2016-05-02 00:00:00.0000000 | 82770 | 64135183 |
2016-05-03 00:00:00.0000000 | 72920 | 13685621 |
This query aggregates all the values every time it runs (for example, if you run it many times a day).
If you save the hll
and tdigest
values (which are the intermediate results of dcount
and percentile) into a temp table, PageViewsHllTDigest
, using an update policy or set/append commands, you may only merge the values and then use dcount_hll
/percentile_tdigest
using the following query:
PageViewsHllTDigest
| summarize percentile_tdigest(merge_tdigest(tdigestBytesDel), 90), dcount_hll(hll_merge(hllPage)) by bin(Timestamp, 1d)
Output
Timestamp | percentile_tdigest_merge_tdigests_tdigestBytesDel | dcount_hll_hll_merge_hllPage |
---|---|---|
2016-05-01 00:00:00.0000000 | 84224 | 20056275 |
2016-05-02 00:00:00.0000000 | 83486 | 64135183 |
2016-05-03 00:00:00.0000000 | 72247 | 13685621 |
This query should be more performant, as it runs over a smaller table. In this example, the first query runs over ~215M records, while the second one runs over just 32 records.
Example: Intermediate results
The retention query: assume you have a table that summarizes when each Wikipedia page was viewed (sample size is 10M), and you want to find, for each pair of dates (date1, date2) with date1 < date2, the percentage of pages viewed on both date1 and date2 relative to the pages viewed on date1.
The trivial way uses join and summarize operators:
// Get the total pages viewed each day
let totalPagesPerDay = PageViewsSample
| summarize by Page, Day = startofday(Timestamp)
| summarize count() by Day;
// Join the table to itself to get a grid where
// each row shows foreach page1, in which two dates
// it was viewed.
// Then count the pages between each two dates to
// get how many pages were viewed between date1 and date2.
PageViewsSample
| summarize by Page, Day1 = startofday(Timestamp)
| join kind = inner
(
PageViewsSample
| summarize by Page, Day2 = startofday(Timestamp)
)
on Page
| where Day2 > Day1
| summarize count() by Day1, Day2
| join kind = inner
totalPagesPerDay
on $left.Day1 == $right.Day
| project Day1, Day2, Percentage = count_*100.0/count_1
Output
Day1 | Day2 | Percentage |
---|---|---|
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 34.0645725975255 |
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.618368960101 |
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.6291376489636 |
The above query took ~18 seconds.
When you use the hll(), hll_merge(), and dcount_hll() functions, the equivalent query ends after ~1.3 seconds, showing that the hll functions speed up the query above by a factor of ~14:
let Stats=PageViewsSample | summarize pagehll=hll(Page, 2) by day=startofday(Timestamp); // saving the hll values (intermediate results of the dcount values)
let day0=toscalar(Stats | summarize min(day)); // finding the min date over all dates.
let dayn=toscalar(Stats | summarize max(day)); // finding the max date over all dates.
let daycount=tolong((dayn-day0)/1d); // finding the range between max and min
Stats
| project idx=tolong((day-day0)/1d), day, pagehll
| mv-expand pidx=range(0, daycount) to typeof(long)
// Extend the column to get the dcount value from hll'ed values for each date (same as totalPagesPerDay from the above query)
| extend key1=iff(idx < pidx, idx, pidx), key2=iff(idx < pidx, pidx, idx), pages=dcount_hll(pagehll)
// For each two dates, merge the hll'ed values to get the total dcount over each two dates,
// This helps to get the pages viewed in both date1 and date2 (see the description below about the intersection_size)
| summarize (day1, pages1)=arg_min(day, pages), (day2, pages2)=arg_max(day, pages), union_size=dcount_hll(hll_merge(pagehll)) by key1, key2
| where day2 > day1
// To get pages viewed in date1 and also date2, look at the merged dcount of date1 and date2, subtract it from pages of date1 + pages on date2.
| project pages1, day1,day2, intersection_size=(pages1 + pages2 - union_size)
| project day1, day2, Percentage = intersection_size*100.0 / pages1
Output
day1 | day2 | Percentage |
---|---|---|
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 33.2298494510578 |
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.9773830213667 |
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.5160020350006 |
15.3.2 - summarize operator
Produces a table that aggregates the content of the input table.
Syntax
T | summarize [ SummarizeParameters ] [[Column =] Aggregation [, …]] [by [Column =] GroupExpression [, …]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Column | string | The name for the result column. Defaults to a name derived from the expression. | |
Aggregation | string | ✔️ | A call to an aggregation function such as count() or avg() , with column names as arguments. |
GroupExpression | scalar | ✔️ | A scalar expression that can reference the input data. The output will have as many records as there are distinct values of all the group expressions. |
SummarizeParameters | string | Zero or more space-separated parameters in the form of Name = Value that control the behavior. See supported parameters. |
Supported parameters
Name | Description |
---|---|
hint.num_partitions | Specifies the number of partitions used to share the query load on cluster nodes. See shuffle query |
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. See shuffle query |
hint.strategy=shuffle | The shuffle strategy query shares the query load on cluster nodes, where each node will process one partition of the data. See shuffle query |
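For example, a minimal sketch of the shuffle strategy over the StormEvents sample table, where EpisodeId stands in for a high-cardinality group-by key:
StormEvents
| summarize hint.strategy=shuffle EventCount = count() by EpisodeId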
Returns
The input rows are arranged into groups having the same values of the by
expressions. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by
columns and also at least one column for each computed aggregate. (Some aggregation functions return multiple columns.)
The result has as many rows as there are distinct combinations of by
values
(which may be zero). If there are no group keys provided, the result has a single
record.
To summarize over ranges of numeric values, use bin()
to reduce ranges to discrete values.
Default values of aggregations
The following table summarizes the default values of aggregations:
Operator | Default value |
---|---|
count() , countif() , dcount() , dcountif() , count_distinct() , sum() , sumif() , variance() , varianceif() , stdev() , stdevif() | 0 |
make_bag() , make_bag_if() , make_list() , make_list_if() , make_set() , make_set_if() | empty dynamic array ([]) |
All others | null |
Examples
The examples in this section show how to use the syntax to help you get started.
Unique combination
The following query determines what unique combinations of State
and EventType
there are for storms that resulted in direct injury. There are no aggregation functions, just group-by keys, so the output contains just those columns.
StormEvents
| where InjuriesDirect > 0
| summarize by State, EventType
Output
The following table shows only the first 5 rows. To see the full output, run the query.
State | EventType |
---|---|
TEXAS | Thunderstorm Wind |
TEXAS | Flash Flood |
TEXAS | Winter Weather |
TEXAS | High Wind |
TEXAS | Flood |
… | … |
Minimum and maximum timestamp
Finds the minimum and maximum duration of heavy rain storms in Hawaii. There's no group-by clause, so there's just one row in the output.
StormEvents
| where State == "HAWAII" and EventType == "Heavy Rain"
| project Duration = EndTime - StartTime
| summarize Min = min(Duration), Max = max(Duration)
Output
Min | Max |
---|---|
01:08:00 | 11:55:00 |
Distinct count
The following query calculates the number of unique storm event types for each state and sorts the results by the number of unique storm types:
StormEvents
| summarize TypesOfStorms=dcount(EventType) by State
| sort by TypesOfStorms
Output
The following table shows only the first 5 rows. To see the full output, run the query.
State | TypesOfStorms |
---|---|
TEXAS | 27 |
CALIFORNIA | 26 |
PENNSYLVANIA | 25 |
GEORGIA | 24 |
ILLINOIS | 23 |
… | … |
Histogram
The following example calculates a histogram of storm event types that had storms lasting longer than 1 day. Because Duration
has many values, use bin()
to group its values into 1-day intervals.
StormEvents
| project EventType, Duration = EndTime - StartTime
| where Duration > 1d
| summarize EventCount=count() by EventType, Length=bin(Duration, 1d)
| sort by Length
Output
EventType | Length | EventCount |
---|---|---|
Drought | 30.00:00:00 | 1646 |
Wildfire | 30.00:00:00 | 11 |
Heat | 30.00:00:00 | 14 |
Flood | 30.00:00:00 | 20 |
Heavy Rain | 29.00:00:00 | 42 |
… | … | … |
Aggregates default values
When the input of the summarize operator is empty and at least one group-by key is specified, the result is empty, too.
When the input of the summarize operator is empty and no group-by keys are specified, the result is a single row with the default values of the aggregates used in the summarize. For more information, see Default values of aggregations.
datatable(x:long)[]
| summarize any_x=take_any(x), arg_max_x=arg_max(x, *), arg_min_x=arg_min(x, *), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)
Output
any_x | arg_max_x | arg_min_x | avg_x | schema_x | max_x | min_x | percentile_x_55 | hll_x | stdev_x | sum_x | sumif_x | tdigest_x | variance_x |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | NaN | | | | | | 0 | 0 | 0 | | 0 |
The result of avg(x) (the avg_x column) is NaN due to dividing by 0.
datatable(x:long)[]
| summarize count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)
Output
count_x | countif_ | dcount_x | dcountif_x |
---|---|---|---|
0 | 0 | 0 | 0 |
datatable(x:long)[]
| summarize make_set(x), make_list(x)
Output
set_x | list_x |
---|---|
[] | [] |
The avg aggregate sums all the non-null values and divides by the number of values that participated in the calculation; nulls aren't taken into account.
range x from 1 to 4 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)
Output
sum_y | avg_y |
---|---|
15 | 5 |
The regular count will count nulls:
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)
Output
count_y |
---|
2 |
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)
Output
set_y | set_y1 |
---|---|
[5.0] | [5.0] |
15.4 - as operator
Binds a name to the operator's input tabular expression. This operator allows the query to reference the value of the tabular expression multiple times without breaking the query flow to bind a name through a let statement.
To optimize multiple uses of the as
operator within a single query, see Named expressions.
Syntax
T | as [hint.materialized = Materialized] Name
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular expression to rename. |
Name | string | ✔️ | The temporary name for the tabular expression. |
hint.materialized | bool | If Materialized is set to true , the value of the tabular expression output is wrapped by a materialize() function call. Otherwise, the value is recalculated on every reference. |
Examples
In the following two examples, the generated TableName column consists of ‘T1’ and ‘T2’.
range x from 1 to 5 step 1
| as T1
| union withsource=TableName (range x from 1 to 5 step 1 | as T2)
Alternatively, you can write the same example as follows:
union withsource=TableName (range x from 1 to 5 step 1 | as T1), (range x from 1 to 5 step 1 | as T2)
Output
TableName | x |
---|---|
T1 | 1 |
T1 | 2 |
T1 | 3 |
T1 | 4 |
T1 | 5 |
T2 | 1 |
T2 | 2 |
T2 | 3 |
T2 | 4 |
T2 | 5 |
In the following example, the ’left side’ of the join is:
MyLogTable
filtered by type == "Event"
and Name == "Start"
and the ‘right side’ of the join is:
MyLogTable
filtered by type == "Event"
and Name == "Stop"
MyLogTable
| where type == "Event"
| as T
| where Name == "Start"
| join (
T
| where Name == "Stop"
) on ActivityId
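A minimal sketch of the hint.materialized parameter: the bound name T is computed once and then referenced again inside the union.
range x from 1 to 5 step 1
| as hint.materialized=true T
| where x <= 3
| union (T | where x > 3)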
15.5 - consume operator
Consumes the tabular data stream handed to the operator.
The consume
operator is mostly used for triggering the query side-effect without actually returning
the results back to the caller.
The consume
operator can be used for estimating the
cost of a query without actually delivering the results back to the client.
(The estimation isn’t exact for various reasons; for example, consume
is calculated distributively, so T | consume
won’t transmit the table’s
data between the nodes of the cluster.)
Syntax
consume [decodeblocks = DecodeBlocks]
Parameters
Name | Type | Required | Description |
---|---|---|---|
DecodeBlocks | bool | If set to true, or if the request property perftrace is set to true, the consume operator doesn't just enumerate the records at its input, but actually forces each value in those records to be decompressed and decoded. |
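A minimal sketch: the following query runs the filter but returns no rows to the caller, which is useful when you only want the execution statistics.
StormEvents
| where DamageProperty > 0
| consume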
15.6 - count operator
Returns the number of records in the input record set.
Syntax
T | count
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be counted. |
Returns
This function returns a table with a single record and column of type
long
. The value of the only cell is the number of records in T.
Example
When you use the count operator with a table name, like StormEvents, it will return the total number of records in that table.
StormEvents | count
Output
Count |
---|
59066 |
Related content
For information about the count() aggregation function, see count() (aggregation function).
15.7 - datatable operator
Returns a table whose schema and values are defined in the query itself.
Syntax
datatable ( ColumnName : ColumnType [, …] ) [ ScalarValue [, …] ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | ✔️ | The name for a column. |
ColumnType | string | ✔️ | The type of data in the column. |
ScalarValue | scalar | ✔️ | The value to insert into the table. The total number of values must be a multiple of the number of columns in the table. Each value is assigned to a column based on its position: the value at zero-based position n is assigned to the column at position n % NumColumns, where NumColumns is the total number of columns. |
Returns
This operator returns a data table of the given schema and data.
Example
This example creates a table with Date, Event, and MoreData columns, filters rows with Event descriptions longer than 4 characters, and adds a new column key2 to each row from the MoreData dynamic object.
datatable(Date:datetime, Event:string, MoreData:dynamic) [
datetime(1910-06-11), "Born", dynamic({"key1":"value1", "key2":"value2"}),
datetime(1930-01-01), "Enters Ecole Navale", dynamic({"key1":"value3", "key2":"value4"}),
datetime(1953-01-01), "Published first book", dynamic({"key1":"value5", "key2":"value6"}),
datetime(1997-06-25), "Died", dynamic({"key1":"value7", "key2":"value8"}),
]
| where strlen(Event) > 4
| extend key2 = MoreData.key2
Output
Date | Event | MoreData | key2 |
---|---|---|---|
1930-01-01 00:00:00.0000000 | Enters Ecole Navale | { “key1”: “value3”, “key2”: “value4” } | value4 |
1953-01-01 00:00:00.0000000 | Published first book | { “key1”: “value5”, “key2”: “value6” } | value6 |
15.8 - distinct operator
Produces a table with the distinct combination of the provided columns of the input table.
Syntax
T | distinct ColumnName [, ColumnName2, ...]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | ✔️ | The column name to search for distinct values. |
Example
Shows the distinct combinations of states and event types that led to over 45 direct injuries.
StormEvents
| where InjuriesDirect > 45
| distinct State, EventType
Output
State | EventType |
---|---|
TEXAS | Winter Weather |
KANSAS | Tornado |
MISSOURI | Excessive Heat |
OKLAHOMA | Thunderstorm Wind |
OKLAHOMA | Excessive Heat |
ALABAMA | Tornado |
ALABAMA | Heat |
TENNESSEE | Heat |
CALIFORNIA | Wildfire |
Related content
If the group by keys are of high cardinalities, try summarize by ...
with the shuffle strategy.
15.9 - evaluate plugin operator
Invokes a service-side query extension (plugin).
The evaluate
operator is a tabular operator that allows you to invoke query language extensions known as plugins. Unlike other language constructs, plugins can be enabled or disabled. Plugins aren’t “bound” by the relational nature of the language. In other words, they may not have a predefined, statically determined, output schema.
Syntax
[T |] evaluate [ evaluateParameters ] PluginName ( [ PluginArgs ] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | A tabular input to the plugin. Some plugins don’t take any input and act as a tabular data source. | |
evaluateParameters | string | Zero or more space-separated evaluate parameters in the form of Name = Value that control the behavior of the evaluate operation and execution plan. Each plugin may decide differently how to handle each parameter. Refer to each plugin’s documentation for specific behavior. | |
PluginName | string | ✔️ | The mandatory name of the plugin being invoked. |
PluginArgs | string | Zero or more comma-separated arguments to provide to the plugin. |
Evaluate parameters
The following parameters are supported:
Name | Values | Description |
---|---|---|
hint.distribution | single , per_node , per_shard | Distribution hints |
hint.pass_filters | true, false | Allow the evaluate operator to pass matching filters through before the plugin. A filter is considered 'matched' if it refers to a column that exists before the evaluate operator. Default: false |
hint.pass_filters_column | column_name | Allow the evaluate operator to pass filters that refer to column_name through before the plugin. The parameter can be used multiple times with different column names. |
Plugins
The following plugins are supported:
- autocluster plugin
- azure-digital-twins-query-request plugin
- bag-unpack plugin
- basket plugin
- cosmosdb-sql-request plugin
- dcount-intersect plugin
- diffpatterns plugin
- diffpatterns-text plugin
- infer-storage-schema plugin
- ipv4-lookup plugin
- ipv6-lookup plugin
- mysql-request-plugin
- narrow plugin
- pivot plugin
- preview plugin
- R plugin
- rolling-percentile plugin
- rows-near plugin
- schema-merge plugin
- sql-request plugin
- sequence-detect plugin
Distribution hints
Distribution hints specify how the plugin execution will be distributed across multiple cluster nodes. Each plugin may implement different support for distribution. The plugin's documentation specifies the distribution options supported by the plugin.
Possible values:
- single: A single instance of the plugin will run over the entire query data.
- per_node: If the query before the plugin call is distributed across nodes, then an instance of the plugin will run on each node over the data that it contains.
- per_shard: If the data before the plugin call is distributed across shards, then an instance of the plugin will run over each shard of the data.
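For example, a minimal sketch that invokes the autocluster plugin from the list above over the StormEvents sample table:
StormEvents
| where EventType == "Tornado"
| evaluate autocluster()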
15.10 - extend operator
Creates calculated columns and appends them to the result set.
Syntax
T | extend [ColumnName | (ColumnName [, …]) =] Expression [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Tabular input to extend. |
ColumnName | string | Name of the column to add or update. | |
Expression | string | ✔️ | Calculation to perform over the input. |
- If ColumnName is omitted, the output column name of Expression is automatically generated.
- If Expression returns more than one column, a list of column names can be specified in parentheses. Then, Expression's output columns are given the specified names. If a list of column names isn't specified, all of Expression's output columns are added to the output with generated names.
Returns
A copy of the input tabular result set, such that:
- Column names noted by
extend
that already exist in the input are removed and appended as their new calculated values. - Column names noted by
extend
that don’t exist in the input are appended as their new calculated values.
Columns added by extend don't have an index. In most cases, if the new column is set to be exactly
the same as an existing table column that has an index, Kusto can automatically
use the existing index. However, in some complex scenarios this propagation is
not done. In such cases, if the goal is to rename a column, use the project-rename
operator instead.
Example
StormEvents
| project EndTime, StartTime
| extend Duration = EndTime - StartTime
The following table shows only the first 10 results. To see the full output, run the query.
EndTime | StartTime | Duration |
---|---|---|
2007-01-01T00:00:00Z | 2007-01-01T00:00:00Z | 00:00:00 |
2007-01-01T00:25:00Z | 2007-01-01T00:25:00Z | 00:00:00 |
2007-01-01T02:24:00Z | 2007-01-01T02:24:00Z | 00:00:00 |
2007-01-01T03:45:00Z | 2007-01-01T03:45:00Z | 00:00:00 |
2007-01-01T04:35:00Z | 2007-01-01T04:35:00Z | 00:00:00 |
2007-01-01T04:37:00Z | 2007-01-01T03:37:00Z | 01:00:00 |
2007-01-01T05:00:00Z | 2007-01-01T00:00:00Z | 05:00:00 |
2007-01-01T05:00:00Z | 2007-01-01T00:00:00Z | 05:00:00 |
2007-01-01T06:00:00Z | 2007-01-01T00:00:00Z | 06:00:00 |
2007-01-01T06:00:00Z | 2007-01-01T00:00:00Z | 06:00:00 |
Related content
- Use series_stats to return multiple columns
15.11 - externaldata operator
The externaldata
operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.
Syntax
externaldata ( columnName : columnType [, …] ) [ storageConnectionString [, …] ] [with ( propertyName = propertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
columnName, columnType | string | ✔️ | A list of column names and their types. This list defines the schema of the table. |
storageConnectionString | string | ✔️ | A storage connection string of the storage artifact to query. |
propertyName, propertyValue | string | A list of optional supported properties that determines how to interpret the data retrieved from storage. |
Supported properties
Property | Type | Description |
---|---|---|
format | string | The data format. If unspecified, an attempt is made to detect the data format from file extension. The default is CSV . All ingestion data formats are supported. |
ignoreFirstRecord | bool | If set to true , the first record in every file is ignored. This property is useful when querying CSV files with headers. |
ingestionMapping | string | Indicates how to map data from the source file to the actual columns in the operator result set. See data mappings. |
Returns
The externaldata
operator returns a data table of the given schema whose data was parsed from the specified storage artifact, indicated by the storage connection string.
Examples
The examples query data in an external storage file.
Fetch a list of user IDs stored in Azure Blob Storage
The following example shows how to find all records in a table whose UserID
column falls into a known set of IDs, held (one per line) in an external storage file. Since the data format isn’t specified, the detected data format is TXT
.
Users
| where UserID in ((externaldata (UserID:string) [
@"https://storageaccount.blob.core.windows.net/storagecontainer/users.txt"
h@"?...SAS..." // Secret token needed to access the blob
]))
| ...
Query multiple data files
The following example queries multiple data files stored in external storage.
externaldata(Timestamp:datetime, ProductId:string, ProductDescription:string)
[
h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/01/part-00000-7e967c99-cf2b-4dbb-8c53-ce388389470d.csv.gz?...SAS...",
h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/02/part-00000-ba356fa4-f85f-430a-8b5a-afd64f128ca4.csv.gz?...SAS...",
h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/03/part-00000-acb644dc-2fc6-467c-ab80-d1590b23fc31.csv.gz?...SAS..."
]
with(format="csv")
| summarize count() by ProductId
The above example can be thought of as a quick way to query multiple data files without defining an external table.
Query hierarchical data formats
To query hierarchical data format, such as JSON
, Parquet
, Avro
, or ORC
, ingestionMapping
must be specified in the operator properties.
In this example, there’s a JSON file stored in Azure Blob Storage with the following contents:
{
"timestamp": "2019-01-01 10:00:00.238521",
"data": {
"tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",
"method": "RefreshTableMetadata"
}
}
{
"timestamp": "2019-01-01 10:00:01.845423",
"data": {
"tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",
"method": "GetFileList"
}
}
...
To query this file using the externaldata
operator, a data mapping must be specified. The mapping dictates how to map JSON fields to the operator result set columns:
externaldata(Timestamp: datetime, TenantId: guid, MethodName: string)
[
h@'https://mycompanystorage.blob.core.windows.net/events/2020/09/01/part-0000046c049c1-86e2-4e74-8583-506bda10cca8.json?...SAS...'
]
with(format='multijson', ingestionMapping='[{"Column":"Timestamp","Properties":{"Path":"$.timestamp"}},{"Column":"TenantId","Properties":{"Path":"$.data.tenant"}},{"Column":"MethodName","Properties":{"Path":"$.data.method"}}]')
The MultiJSON
format is used here because single JSON records span multiple lines.
Related content
For more info on mapping syntax, see data mappings.
15.12 - facet operator
Returns a set of tables, one for each column specified in the facet clause.
Each table contains the list of values taken by its column.
An additional table can be created by using the with
clause. Facet result tables can’t be renamed or referenced by any additional operators.
Syntax
T | facet by ColumnName [, ColumnName2, …] [with ( filterPipe )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | ✔️ | The column name, or list of column names, to be summarized. |
filterPipe | string | A query expression applied to the input table. |
Returns
Multiple tables: one for the with
clause, and one for each column.
Example
StormEvents
| where State startswith "A" and EventType has "Heavy"
| facet by State, EventType
with
(
where StartTime between(datetime(2007-01-04) .. 7d)
| project State, StartTime, Source, EpisodeId, EventType
| take 5
)
The following is the table generated by the with
clause.
State | StartTime | Source | EpisodeId | EventType |
---|---|---|---|---|
ALASKA | 2007-01-04 12:00:00.0000000 | COOP Observer | 2192 | Heavy Snow |
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow |
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow |
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow |
ALASKA | 2007-01-06 18:00:00.0000000 | COOP Observer | 2193 | Heavy Snow |
The following table is the State
facet output table.
State | count_State |
---|---|
ALABAMA | 19 |
ARIZONA | 33 |
ARKANSAS | 1 |
AMERICAN SAMOA | 1 |
ALASKA | 58 |
The following table is the EventType
facet output table.
EventType | count_EventType |
---|---|
Heavy Rain | 34 |
Heavy Snow | 78 |
15.13 - find operator
Finds rows that match a predicate across a set of tables.
The scope of the find
operator can also be cross-database or cross-cluster.
find in (Table1, Table2, Table3) where Fruit=="apple"
find in (database('*').*) where Fruit == "apple"
find in (cluster('cluster_name').database('MyDB*').*) where Fruit == "apple"
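The operator also accepts a bare term, which is searched across all columns of all tables in scope; a minimal sketch (the term is illustrative):
find "Tornado"
find in (StormEvents) where * has "Tornado"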
Syntax
find [withsource = ColumnName] [in ( Tables )] where Predicate [project-smart | project ColumnName [: ColumnType] [, …] [, pack_all()]]
find Predicate [project-smart | project ColumnName [: ColumnType] [, …] [, pack_all()]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | By default, the output includes a column called source_ whose values indicate which source table contributed to each row. If specified, ColumnName is used instead of source_. After wildcard matching, if the query references tables from more than one database including the default database, the value of this column has a table name qualified with the database. Similarly cluster and database qualifications are present in the value if more than one cluster is referenced. | |
Predicate | bool | ✔️ | This boolean expression is evaluated for each row in each input table. For more information, see predicate-syntax details. |
Tables | string | Zero or more comma-separated table references. By default, find looks in all the tables in the current database. You can use: 1. The name of a table, such as Events. 2. A query expression, such as (Events | where id==42). 3. A set of tables specified with a wildcard, such as E*, which forms the union of all tables whose names begin with E. |
project-smart or project | string | If not specified, project-smart is used by default. For more information, see output-schema details. |
Returns
Transformation of rows in Table [,
Table, …] for which Predicate is true
. The rows are transformed according to the output schema.
Output schema
source_ column
The find
operator output always includes a source_ column with the source table name. The column can be renamed using the withsource
parameter.
results columns
Source tables that don’t contain any column used by the predicate evaluation are filtered out.
When you use project-smart
, the columns that appear in the output are:
- Columns that appear explicitly in the predicate.
- Columns that are common to all the filtered tables.
The rest of the columns are packed into a property bag and appear in an extra pack_ column.
A column that is referenced explicitly by the predicate and appears in multiple tables with multiple types, has a different column in the result schema for each such type. Each of the column names is constructed from the original column name and type, separated by an underscore.
When using project
ColumnName[:
ColumnType ,
… ] [,
pack_all()
]:
- The result table includes the columns specified in the list. If a source table doesn’t contain a certain column, the values in the corresponding rows are null.
- When you specify a ColumnType with a ColumnName, this column in the “result” has the given type, and the values are cast to that type if needed. The casting doesn’t have an effect on the column type when evaluating the Predicate.
- When
pack_all()
is used, all the columns, including the projected columns, are packed into a property bag and appear in an extra column, by default ‘column1’. In the property bag, the source column name serves as the property name and the column’s value serves as the property value.
Predicate syntax
The find
operator supports an alternative syntax for the * has term: using just the term searches for it across all input columns.
For a summary of some filtering functions, see where operator.
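For example, the following two queries are equivalent ways to search for a term in every column (a small illustration using the same term as the examples later in this section):
find * has "Hernandez"
find "Hernandez"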
Considerations
- If the project clause references a column that appears in multiple tables and has multiple types, a type must follow this column reference in the project clause.
- If a column appears in multiple tables and has multiple types and project-smart is in use, there's a corresponding column for each type in the find's result, as described in union.
- When you use project-smart, changes in the predicate, in the source tables set, or in the tables schema, might result in a change to the output schema. If a constant result schema is needed, use project instead.
- The find scope can't include functions. To include a function in the find scope, define a let statement with the view keyword (see the sketch after this list).
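The following is a minimal sketch of that workaround, assuming the StormEvents sample table used elsewhere in this document:
let FloridaStorms = view() { StormEvents | where State == "FLORIDA" };
find in (FloridaStorms, StormEvents) where EventType has "Tornado"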
Performance tips
- Use tables as opposed to tabular expressions. If the input is a tabular expression, the find operator falls back to a union query, which can result in degraded performance.
- If a column that appears in multiple tables and has multiple types is part of the project clause, prefer adding a ColumnType to the project clause over modifying the table before passing it to find.
- Add time-based filters to the predicate. Use a datetime column value or ingestion_time().
- Search in specific columns rather than doing a full-text search.
- It's better not to reference columns that appear in multiple tables and have multiple types. If the predicate is valid when resolving such columns' type for more than one type, the query falls back to union. For example, see examples of cases where find acts as a union.
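The following hedged sketch applies the time-filter and specific-column tips together; the table pattern E* and the columns Timestamp and EventText are illustrative assumptions, not a real schema:
find in (E*) where Timestamp > ago(1h) and EventText has "error"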
Examples
Term lookup across all tables
The query finds all rows from all tables in the current database in which any column includes the word Hernandez
. The resulting records are transformed according to the output schema. The output includes rows from the Customers
table and the SalesTable
table of the ContosoSales
database.
find "Hernandez"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
Customers | {“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:5023,“Education”:“Partial High School”,“FirstName”:“Devin”,“Gender”:“M”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Clerical”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”} |
Customers | {“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:7814,“Education”:“Partial College”,“FirstName”:“Kristy”,“Gender”:“F”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Professional”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”} |
Customers | {“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:7888,“Education”:“Partial High School”,“FirstName”:“Kari”,“Gender”:“F”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Clerical”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”} |
… | … |
Term lookup across all tables matching a name pattern
The query finds all rows from all tables in the current database whose name starts with C
, and in which any column includes the word Hernandez
. The resulting records are transformed according to the output schema. Now, the output only contains records from the Customers
table.
find in (C*) where * has "Hernandez"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
ConferenceSessions | {“conference”:“Ignite 2018”,“sessionid”:“THR3115”,“session_title”:“Azure Log Analytics: Deep dive into the Azure Kusto query language. “,“session_type”:“Theater”,“owner”:“Jean Francois Berenguer”,“participants”:””,“URL”:“https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"} |
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
… | … |
Term lookup across the cluster
The query finds all rows from all tables in all databases in the cluster in which any column includes the word Kusto
.
This query is a cross-database query.
The resulting records are transformed according to the output schema.
find in (database('*').*) where * has "Kusto"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
database(“Samples”).ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
database(“Samples”).ConferenceSessions | {“conference”:“Ignite 2018”,“sessionid”:“THR3115”,“session_title”:“Azure Log Analytics: Deep dive into the Azure Kusto query language. “,“session_type”:“Theater”,“owner”:“Jean Francois Berenguer”,“participants”:””,“URL”:“https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"} |
database(“Samples”).ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
… | … |
Term lookup matching a name pattern in the cluster
The query finds all rows from all tables whose name starts with C, in all databases whose name starts with S, and in which any column includes the word Kusto.
The resulting records are transformed according to the output schema.
find in (database("S*").C*) where * has "Kusto"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
… | … |
Term lookup in several clusters
The query finds all rows from all tables whose name starts with K, in all databases whose name starts with B, and in which any column includes the word Kusto.
The resulting records are transformed according to the output schema.
find in (cluster("cluster1").database("B*").K*, cluster("cluster2").database("C*".*))
where * has "Kusto"
Term lookup across all tables
The query finds all rows from all tables in which any column includes the word Kusto
.
The resulting records are transformed according to the output schema.
find "Kusto"
Examples of find
output results
The following examples show how find
can be used over two tables: EventsTable1 and EventsTable2.
Assume these two tables contain the following content:
EventsTable1
Session_Id | Level | EventText | Version |
---|---|---|---|
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Text1 | v1.0.0 |
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Text2 | v1.0.0 |
28b8e46e-3c31-43cf-83cb-48921c3986fc | Error | Some Text3 | v1.0.1 |
8f057b11-3281-45c3-a856-05ebb18a3c59 | Information | Some Text4 | v1.1.0 |
EventsTable2
Session_Id | Level | EventText | EventName |
---|---|---|---|
f7d5f95f-f580-4ea6-830b-5776c8d64fdd | Information | Some Other Text1 | Event1 |
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Other Text2 | Event2 |
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Other Text3 | Event3 |
15eaeab5-8576-4b58-8fc6-478f75d8fee4 | Error | Some Other Text4 | Event4 |
Search in common columns, project common and uncommon columns, and pack the rest
The query searches for specific records in EventsTable1 and EventsTable2 based on a given Session_Id and an Error Level. It then projects three specific columns: EventText, Version, and EventName, and packs all other remaining columns into a dynamic object.
find in (EventsTable1, EventsTable2)
where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' and Level == 'Error'
project EventText, Version, EventName, pack_all()
Output
source_ | EventText | Version | EventName | pack_ |
---|---|---|---|---|
EventsTable1 | Some Text2 | v1.0.0 | | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”} |
EventsTable2 | Some Other Text3 | | Event3 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”} |
Search in common and uncommon columns
The query searches for records that either have Version as ‘v1.0.0’ or EventName as ‘Event1’, and then it projects (selects) four specific columns: Session_Id, EventText, Version, and EventName from those filtered results.
find Version == 'v1.0.0' or EventName == 'Event1' project Session_Id, EventText, Version, EventName
Output
source_ | Session_Id | EventText | Version | EventName |
---|---|---|---|---|
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Some Text1 | v1.0.0 | |
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Some Text2 | v1.0.0 | |
EventsTable2 | f7d5f95f-f580-4ea6-830b-5776c8d64fdd | Some Other Text1 | | Event1 |
Use abbreviated notation to search across all tables in the current database
This query searches the database for any records with a Session_Id that matches ‘acbd207d-51aa-4df7-bfa7-be70eb68f04e’. It retrieves records from all tables and columns that contain this specific Session_Id.
find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'
Output
source_ | Session_Id | Level | EventText | pack_ |
---|---|---|---|---|
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Text1 | {“Version”:“v1.0.0”} |
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Text2 | {“Version”:“v1.0.0”} |
EventsTable2 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Other Text2 | {“EventName”:“Event2”} |
EventsTable2 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Other Text3 | {“EventName”:“Event3”} |
Return the results from each row as a property bag
This query searches the database for records with the specified Session_Id and returns all columns of those records as a single dynamic object.
find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' project pack_all()
Output
source_ | pack_ |
---|---|
EventsTable1 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Information”, “EventText”:“Some Text1”, “Version”:“v1.0.0”} |
EventsTable1 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”, “EventText”:“Some Text2”, “Version”:“v1.0.0”} |
EventsTable2 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Information”, “EventText”:“Some Other Text2”, “EventName”:“Event2”} |
EventsTable2 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”, “EventText”:“Some Other Text3”, “EventName”:“Event3”} |
Examples of cases where find
acts as union
The find
operator in Kusto can sometimes act like a union
operator, mainly when it’s used to search across multiple tables.
Using a nontabular expression as find operand
The query first creates a view that filters EventsTable1 to only include error-level records. Then, it searches within this filtered view and the EventsTable2 table for records with a specific Session_Id.
let PartialEventsTable1 = view() { EventsTable1 | where Level == 'Error' };
find in (PartialEventsTable1, EventsTable2)
where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'
Referencing a column that appears in multiple tables and has multiple types
For this example, create two tables by running:
.create tables
Table1 (Level:string, Timestamp:datetime, ProcessId:string),
Table2 (Level:string, Timestamp:datetime, ProcessId:int64)
- The following query is executed as union.
find in (Table1, Table2) where ProcessId == 1001
The output result schema is (Level:string, Timestamp, ProcessId_string, ProcessId_int).
- The following query is executed as union, but produces a different result schema.
find in (Table1, Table2) where ProcessId == 1001 project Level, Timestamp, ProcessId:string
The output result schema is (Level:string, Timestamp, ProcessId_string)
15.14 - fork operator
Runs multiple consumer operators in parallel.
Syntax
T | fork [name=](subquery) [name=](subquery) …
Parameters
Name | Type | Required | Description |
---|---|---|---|
subquery | string | ✔️ | A downstream pipeline of supported query operators. |
name | string | A temporary name for the subquery result table. |
Supported query operators
- as
- count
- extend
- parse
- where
- take
- project
- project-away
- project-keep
- project-rename
- project-reorder
- summarize
- top
- top-nested
- sort
- mv-expand
- reduce
Returns
Multiple result tables, one for each of the subquery arguments.
Tips
- Use materialize as a replacement for join or union on fork legs: the input stream is cached by materialize, and the cached expression can then be used in the join/union legs (as sketched below).
- Use batch with materialize of tabular expression statements instead of the fork operator.
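The following is a minimal sketch of the materialize tip, not taken from the official examples; it caches a filtered StormEvents input once and then runs both legs as a batch of tabular expression statements:
let CachedStorms = materialize(StormEvents | where State == "FLORIDA");
CachedStorms | where DeathsDirect + DeathsIndirect > 1 | count;
CachedStorms | where InjuriesDirect + InjuriesIndirect > 1 | count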
Examples
The following examples output multiple tables, produced by unnamed and named subqueries.
Unnamed subqueries
StormEvents
| where State == "FLORIDA"
| fork
( where DeathsDirect + DeathsIndirect > 1)
( where InjuriesDirect + InjuriesIndirect > 1)
Output
This output shows the first few rows and columns of the result table.
GenericResult
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-03-13T08:20:00Z | 2007-03-13T08:20:00Z | 4094 | 22961 | FLORIDA | Dense Fog | 3 | 0 |
2007-09-11T15:26:00Z | 2007-09-11T15:26:00Z | 9578 | 53798 | FLORIDA | Rip Current | 0 | 0 |
GenericResult
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:10:00Z | 2007-02-02T03:16:00Z | 2545 | 17515 | FLORIDA | Tornado | 15 | 0 |
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-02-02T03:55:00Z | 2007-02-02T04:10:00Z | 3464 | 20318 | FLORIDA | Tornado | 42 | 0 |
Named subqueries
In the following examples, the result tables are named “StormsWithDeaths” and “StormsWithInjuries”.
StormEvents
| where State == "FLORIDA"
| fork
(where DeathsDirect + DeathsIndirect > 1 | as StormsWithDeaths)
(where InjuriesDirect + InjuriesIndirect > 1 | as StormsWithInjuries)
StormEvents
| where State == "FLORIDA"
| fork
StormsWithDeaths = (where DeathsDirect + DeathsIndirect > 1)
StormsWithInjuries = (where InjuriesDirect + InjuriesIndirect > 1)
Output
This output shows the first few rows and columns of the result table.
StormsWithDeaths
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-03-13T08:20:00Z | 2007-03-13T08:20:00Z | 4094 | 22961 | FLORIDA | Dense Fog | 3 | 0 |
2007-09-11T15:26:00Z | 2007-09-11T15:26:00Z | 9578 | 53798 | FLORIDA | Rip Current | 0 | 0 |
StormsWithInjuries
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:10:00Z | 2007-02-02T03:16:00Z | 2545 | 17515 | FLORIDA | Tornado | 15 | 0 |
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-02-02T03:55:00Z | 2007-02-02T04:10:00Z | 3464 | 20318 | FLORIDA | Tornado | 42 | 0 |
The following example uses named subqueries over a sample SamplePowerRequirementHistorizedData table (its twinId, name, and timestamp columns are those referenced in the query):
SamplePowerRequirementHistorizedData
| fork
Dataset2 = (where twinId <> "p_sol_01" | summarize count() by twinId, name)
Dataset3 = (summarize count() by WeekOfYear = week_of_year(timestamp))
Almost all KQL language features can be used inside each fork subquery. However, the join operator doesn't work inside a subquery; the engine doesn't allow it.
15.15 - getschema operator
Produces a table that represents the tabular schema of the input.
Syntax
T |
getschema
Example
StormEvents
| getschema
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
StartTime | 0 | System.DateTime | datetime |
EndTime | 1 | System.DateTime | datetime |
EpisodeId | 2 | System.Int32 | int |
EventId | 3 | System.Int32 | int |
State | 4 | System.String | string |
EventType | 5 | System.String | string |
InjuriesDirect | 6 | System.Int32 | int |
InjuriesIndirect | 7 | System.Int32 | int |
DeathsDirect | 8 | System.Int32 | int |
DeathsIndirect | 9 | System.Int32 | int |
DamageProperty | 10 | System.Int32 | int |
DamageCrops | 11 | System.Int32 | int |
Source | 12 | System.String | string |
BeginLocation | 13 | System.String | string |
EndLocation | 14 | System.String | string |
BeginLat | 15 | System.Double | real |
BeginLon | 16 | System.Double | real |
EndLat | 17 | System.Double | real |
EndLon | 18 | System.Double | real |
EpisodeNarrative | 19 | System.String | string |
EventNarrative | 20 | System.String | string |
StormSummary | 21 | System.Object | dynamic |
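Because getschema returns an ordinary table, its output can be piped into further operators. For example, the following sketch keeps only the string columns:
StormEvents
| getschema
| where ColumnType == "string"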
15.16 - invoke operator
Invokes a lambda expression that receives the source of the invoke operator as a tabular argument.
Syntax
T | invoke function([param1, param2])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular source. |
function | string | ✔️ | The name of the lambda let expression or stored function name to be evaluated. |
param1, param2 … | string | Any additional lambda arguments to pass to the function. |
Returns
Returns the result of the evaluated expression.
Example
This example shows how to use the invoke operator to call a lambda let expression:
// clipped_average(): calculates percentiles limits, and then makes another
// pass over the data to calculate average with values inside the percentiles
let clipped_average = (T:(x: long), lowPercentile:double, upPercentile:double)
{
let high = toscalar(T | summarize percentiles(x, upPercentile));
let low = toscalar(T | summarize percentiles(x, lowPercentile));
T
| where x > low and x < high
| summarize avg(x)
};
range x from 1 to 100 step 1
| invoke clipped_average(5, 99)
Output
avg_x |
---|
52 |
15.17 - lookup operator
Extends the columns of a fact table with values looked-up in a dimension table.
For example, the following query results in a table that extends the FactTable
($left
) with data from the DimensionTable
($right
) by performing a lookup. The lookup matches each pair (CommonColumn
, Col1
) from FactTable
with each pair (CommonColumn
, Col2
) in the DimensionTable
. For the differences between fact and dimension tables, see fact and dimension tables.
FactTable | lookup kind=leftouter (DimensionTable) on CommonColumn, $left.Col1 == $right.Col2
The lookup
operator performs an operation similar to the join operator
with the following differences:
- The result doesn't repeat columns from the $right table that are the basis for the join operation.
- Only two kinds of lookup are supported, leftouter and inner, with leftouter being the default.
- In terms of performance, the system by default assumes that the $left table is the larger (facts) table, and the $right table is the smaller (dimensions) table. This is exactly opposite to the assumption used by the join operator.
- The lookup operator automatically broadcasts the $right table to the $left table (essentially, behaves as if hint.broadcast was specified). This limits the size of the $right table.
Syntax
LeftTable | lookup [kind = (leftouter | inner)] (RightTable) on Attributes
Parameters
Name | Type | Required | Description |
---|---|---|---|
LeftTable | string | ✔️ | The table or tabular expression that is the basis for the lookup. Denoted as $left . |
RightTable | string | ✔️ | The table or tabular expression that is used to “populate” new columns in the fact table. Denoted as $right . |
Attributes | string | ✔️ | A comma-delimited list of one or more rules that describe how rows from LeftTable are matched to rows from RightTable. Multiple rules are evaluated using the and logical operator. See Rules. |
kind | string | Determines how to treat rows in LeftTable that have no match in RightTable. By default, leftouter is used, which means all those rows appear in the output with null values used for the missing values of RightTable columns added by the operator. If inner is used, such rows are omitted from the output. Other kinds of join aren’t supported by the lookup operator. |
Rules
Rule kind | Syntax | Predicate |
---|---|---|
Equality by name | ColumnName | where LeftTable.ColumnName == RightTable.ColumnName |
Equality by value | $left.LeftColumn == $right.RightColumn | where $left.LeftColumn == $right.RightColumn |
Returns
A table with:
- A column for every column in each of the two tables, including the matching keys. The columns of the right side are automatically renamed if there are name conflicts.
- A row for every match between the input tables. A match is a row selected from one table that has the same value for all the on fields as a row in the other table.
- The Attributes (lookup keys) appear only once in the output table.
- If kind is unspecified or kind=leftouter, then in addition to the inner matches, there's a row for every row on the left (and/or right), even if it has no match. In that case, the unmatched output cells contain nulls.
- If kind=inner, then there's a row in the output for every combination of matching rows from left and right.
Example
The following example shows how to perform a left outer join between the FactTable
and DimTable
, based on matching values in the Personal
and Family
columns.
let FactTable=datatable(Row:string,Personal:string,Family:string) [
"1", "Rowan", "Murphy",
"2", "Ellis", "Turner",
"3", "Ellis", "Turner",
"4", "Maya", "Robinson",
"5", "Quinn", "Campbell"
];
let DimTable=datatable(Personal:string,Family:string,Alias:string) [
"Rowan", "Murphy", "rowanm",
"Ellis", "Turner", "ellist",
"Maya", "Robinson", "mayar",
"Quinn", "Campbell", "quinnc"
];
FactTable
| lookup kind=leftouter DimTable on Personal, Family
Output
Row | Personal | Family | Alias |
---|---|---|---|
1 | Rowan | Murphy | rowanm |
2 | Ellis | Turner | ellist |
3 | Ellis | Turner | ellist |
4 | Maya | Robinson | mayar |
5 | Quinn | Campbell | quinnc |
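As a variation, the same lookup can be run with kind=inner, reusing the FactTable and DimTable definitions above. Rows of FactTable with no match in DimTable would then be omitted; with this sample data the output is identical, because every row has a match.
FactTable
| lookup kind=inner DimTable on Personal, Family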
15.18 - mv-apply operator
Applies a subquery to each record, and returns the union of the results of all subqueries.
For example, assume a table T
has a column Metric
of type dynamic
whose values are arrays of real
numbers. The following query locates the two biggest values in each Metric value, and returns the records corresponding to these values.
T | mv-apply Metric to typeof(real) on
(
top 2 by Metric desc
)
The mv-apply
operator has the following
processing steps:
- Uses the mv-expand operator to expand each record in the input into subtables (order is preserved).
- Applies the subquery for each of the subtables.
- Adds zero or more columns to the resulting subtable. These columns contain the values of the source columns that aren't expanded, and are repeated where needed.
- Returns the union of the results.
The mv-apply
operator gets the following inputs:
One or more expressions that evaluate into dynamic arrays to expand. The number of records in each expanded subtable is the maximum length of each of those dynamic arrays. Null values are added where multiple expressions are specified and the corresponding arrays have different lengths.
Optionally, the names to assign the values of the expressions after expansion. These names become the columns names in the subtables. If not specified, the original name of the column is used when the expression is a column reference. A random name is used otherwise.
Note: It's recommended to use the default column names.
The data types of the elements of those dynamic arrays, after expansion. These become the column types of the columns in the subtables. If not specified,
dynamic
is used.Optionally, the name of a column to add to the subtables that specifies the 0-based index of the element in the array that resulted in the subtable record.
Optionally, the maximum number of array elements to expand.
The mv-apply
operator can be thought of as a generalization of the
mv-expand
operator (in fact, the latter can be implemented
by the former, if the subquery includes only projections.)
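A minimal sketch of that equivalence, not taken from the official examples: with a projection-only subquery, mv-apply behaves like mv-expand on the same column.
datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20])
]
| mv-apply b to typeof(long) on (project b)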
Syntax
T | mv-apply [ItemIndex] ColumnsToExpand [RowLimit] on ( SubQuery )
Where ItemIndex has the syntax:
with_itemindex = IndexColumnName
ColumnsToExpand is a comma-separated list of one or more elements of the form:
[Name =] ArrayExpression [to typeof(Typename)]
RowLimit is simply:
limit RowLimit
and SubQuery has the same syntax as any query statement.
Parameters
Name | Type | Required | Description |
---|---|---|---|
ItemIndex | string | Indicates the name of a column of type long that’s appended to the input as part of the array-expansion phase and indicates the 0-based array index of the expanded value. | |
Name | string | The name to assign the array-expanded values of each array-expanded expression. If not specified, the name of the column is used if available. A random name is generated if ArrayExpression isn’t a simple column name. | |
ArrayExpression | dynamic | ✔️ | The array whose values are array-expanded. If the expression is the name of a column in the input, the input column is removed from the input and a new column of the same name, or ColumnName if specified, appears in the output. |
Typename | string | The name of the type that the individual elements of the dynamic array ArrayExpression take. Elements that don’t conform to this type are replaced by a null value. If unspecified, dynamic is used by default. | |
RowLimit | int | A limit on the number of records to generate from each record of the input. If unspecified, 2147483647 is used. | |
SubQuery | string | A tabular query expression with an implicit tabular source that gets applied to each array-expanded subtable. |
Examples
Review the examples and run them in your Data Explorer query page.
Getting the largest element from the array
The query outputs the largest odd number (7) and the largest even number (8).
let _data =
range x from 1 to 8 step 1
| summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply element=l to typeof(long) on
(
top 1 by element
)
Output
xMod2 | l | element |
---|---|---|
1 | [1, 3, 5, 7] | 7 |
0 | [2, 4, 6, 8] | 8 |
Calculating the sum of the largest two elements in an array
The query outputs the sum of the top 2 even numbers (6 + 8 = 14) and the sum of the top 2 odd numbers (5 + 7 = 12).
let _data =
range x from 1 to 8 step 1
| summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply l to typeof(long) on
(
top 2 by l
| summarize SumOfTop2=sum(l)
)
Output
xMod2 | l | SumOfTop2 |
---|---|---|
1 | [1,3,5,7] | 12 |
0 | [2,4,6,8] | 14 |
Select elements in arrays
The query identifies the top 2 elements from each dynamic array based on the Arr2 values and summarizes them into new lists.
datatable (Val:int, Arr1:dynamic, Arr2:dynamic)
[ 1, dynamic(['A1', 'A2', 'A3']), dynamic([10, 30, 7]),
7, dynamic(['B1', 'B2', 'B5']), dynamic([15, 11, 50]),
3, dynamic(['C1', 'C2', 'C3', 'C4']), dynamic([6, 40, 20, 8])
]
| mv-apply NewArr1=Arr1, NewArr2=Arr2 to typeof(long) on (
top 2 by NewArr2
| summarize NewArr1=make_list(NewArr1), NewArr2=make_list(NewArr2)
)
Output
Val | Arr1 | Arr2 | NewArr1 | NewArr2 |
---|---|---|---|---|
1 | [“A1”,“A2”,“A3”] | [10,30,7] | [“A2”,“A1”] | [30,10] |
7 | [“B1”,“B2”,“B5”] | [15,11,50] | [“B5”,“B1”] | [50,15] |
3 | [“C1”,“C2”,“C3”,“C4”] | [6,40,20,8] | [“C2”,“C3”] | [40,20] |
Using with_itemindex
for working with a subset of the array
The query results in a table with rows where the index is 3 or greater, including the index and element values from the original lists of even and odd numbers.
let _data =
range x from 1 to 10 step 1
| summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply with_itemindex=index element=l to typeof(long) on
(
// here you have 'index' column
where index >= 3
)
| project index, element
Output
index | element |
---|---|
3 | 7 |
4 | 9 |
3 | 8 |
4 | 10 |
Using multiple columns to join elements of two arrays
The query combines elements from two dynamic arrays into a new concatenated format and then summarizes them into lists.
datatable (Val: int, Arr1: dynamic, Arr2: dynamic)
[
1, dynamic(['A1', 'A2', 'A3']), dynamic(['B1', 'B2', 'B3']),
5, dynamic(['C1', 'C2']), dynamic(['D1', 'D2'])
]
| mv-apply Arr1, Arr2 on (
extend Out = strcat(Arr1, "_", Arr2)
| summarize Arr1 = make_list(Arr1), Arr2 = make_list(Arr2), Out= make_list(Out)
)
Output
Val | Arr1 | Arr2 | Out |
---|---|---|---|
1 | [“A1”,“A2”,“A3”] | [“B1”,“B2”,“B3”] | [“A1_B1”,“A2_B2”,“A3_B3”] |
5 | [“C1”,“C2”] | [“D1”,“D2”] | [“C1_D1”,“C2_D2”] |
Applying mv-apply to a property bag
This query dynamically removes properties from the packed values object based on the criteria that their values do not start with “555”. The final result contains the original columns with unwanted properties removed.
datatable(SourceNumber: string, TargetNumber: string, CharsCount: long)
[
'555-555-1234', '555-555-1212', 46,
'555-555-1212', '', int(null)
]
| extend values = pack_all()
| mv-apply removeProperties = values on
(
mv-expand kind = array values
| where values[1] !startswith "555"
| summarize propsToRemove = make_set(values[0])
)
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove
Output
SourceNumber | TargetNumber | CharsCount | values |
---|---|---|---|
555-555-1234 | 555-555-1212 | 46 | { “SourceNumber”: “555-555-1234”, “TargetNumber”: “555-555-1212” } |
555-555-1212 | { “SourceNumber”: “555-555-1212” } |
Related content
- mv-expand operator
15.19 - mv-expand operator
Expands multi-value dynamic arrays or property bags into multiple records.
mv-expand
can be described as the opposite of the aggregation operators
that pack multiple values into a single dynamic-typed
array or property bag, such as summarize
… make-list()
and make-series
.
Each element in the (scalar) array or property bag generates a new record in the
output of the operator. All columns of the input that aren’t expanded are duplicated to all the records in the output.
Syntax
T | mv-expand [kind=(bag | array)] [with_itemindex=IndexColumnName] ColumnName [to typeof(Typename)] [, ColumnName …] [limit Rowlimit]
T | mv-expand [kind=(bag | array)] [Name =] ArrayExpression [to typeof(Typename)] [, [Name =] ArrayExpression [to typeof(Typename)] …] [limit Rowlimit]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName, ArrayExpression | string | ✔️ | A column reference, or a scalar expression with a value of type dynamic that holds an array or a property bag. The individual top-level elements of the array or property bag get expanded into multiple records.When ArrayExpression is used and Name doesn’t equal any input column name, the expanded value is extended into a new column in the output. Otherwise, the existing ColumnName is replaced. |
Name | string | A name for the new column. | |
Typename | string | ✔️ | Indicates the underlying type of the array’s elements, which becomes the type of the column produced by the mv-expand operator. The operation of applying type is cast-only and doesn’t include parsing or type-conversion. Array elements that don’t conform with the declared type become null values. |
RowLimit | int | The maximum number of rows generated from each original row. The default is 2147483647. mvexpand is a legacy and obsolete form of the operator mv-expand . The legacy version has a default row limit of 128. | |
IndexColumnName | string | If with_itemindex is specified, the output includes another column named IndexColumnName that contains the index starting at 0 of the item in the original expanded collection. |
Returns
For each record in the input, the operator returns zero, one, or many records in the output, as determined in the following way:
Input columns that aren’t expanded appear in the output with their original value. If a single input record is expanded into multiple output records, the value is duplicated to all records.
For each ColumnName or ArrayExpression that is expanded, the number of output records is determined for each value as explained in modes of expansion. For each input record, the maximum number of output records is calculated. All arrays or property bags are expanded “in parallel” so that missing values (if any) are replaced by null values. Elements are expanded into rows in the order that they appear in the original array/bag.
If the dynamic value is null, then a single record is produced for that value (null). If the dynamic value is an empty array or property bag, no record is produced for that value. Otherwise, as many records are produced as there are elements in the dynamic value.
The expanded columns are of type dynamic
, unless they’re explicitly typed
by using the to typeof()
clause.
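A minimal sketch of the null and empty-value rules, not taken from the official examples: the first row expands into a single record with a null b, while the second row produces no record at all.
datatable (a: int, b: dynamic)
[
    1, dynamic(null),
    2, dynamic([])
]
| mv-expand b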
Modes of expansion
Two modes of property bag expansions are supported:
- kind=bag or bagexpansion=bag: Property bags are expanded into single-entry property bags. This mode is the default mode.
- kind=array or bagexpansion=array: Property bags are expanded into two-element [key, value] array structures, allowing uniform access to keys and values. This mode also allows, for example, running a distinct-count aggregation over property names.
Examples
The examples in this section show how to use the syntax to help you get started.
Single column - array expansion
datatable (a: int, b: dynamic)
[
1, dynamic([10, 20]),
2, dynamic(['a', 'b'])
]
| mv-expand b
Output
a | b |
---|---|
1 | 10 |
1 | 20 |
2 | a |
2 | b |
Single column - bag expansion
A simple expansion of a single column:
datatable (a: int, b: dynamic)
[
1, dynamic({"prop1": "a1", "prop2": "b1"}),
2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand b
Output
a | b |
---|---|
1 | {“prop1”: “a1”} |
1 | {“prop2”: “b1”} |
2 | {“prop1”: “a2”} |
2 | {“prop2”: “b2”} |
Single column - bag expansion to key-value pairs
A simple bag expansion to key-value pairs:
datatable (a: int, b: dynamic)
[
1, dynamic({"prop1": "a1", "prop2": "b1"}),
2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand kind=array b
| extend key = b[0], val=b[1]
Output
a | b | key | val |
---|---|---|---|
1 | [“prop1”,“a1”] | prop1 | a1 |
1 | [“prop2”,“b1”] | prop2 | b1 |
2 | [“prop1”,“a2”] | prop1 | a2 |
2 | [“prop2”,“b2”] | prop2 | b2 |
Zipped two columns
Expanding two columns will first ‘zip’ the applicable columns and then expand them:
datatable (a: int, b: dynamic, c: dynamic)[
1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 4, 3])
]
| mv-expand b, c
Output
a | b | c |
---|---|---|
1 | {“prop1”:“a”} | 5 |
1 | {“prop2”:“b”} | 4 |
1 | 3 |
Cartesian product of two columns
If you want to get a Cartesian product of expanding two columns, expand one after the other:
datatable (a: int, b: dynamic, c: dynamic)
[
1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 6])
]
| mv-expand b
| mv-expand c
Output
a | b | c |
---|---|---|
1 | { “prop1”: “a”} | 5 |
1 | { “prop1”: “a”} | 6 |
1 | { “prop2”: “b”} | 5 |
1 | { “prop2”: “b”} | 6 |
Convert output
To force the output of an mv-expand to a certain type (default is dynamic), use to typeof
:
datatable (a: string, b: dynamic, c: dynamic)[
"Constant", dynamic([1, 2, 3, 4]), dynamic([6, 7, 8, 9])
]
| mv-expand b, c to typeof(int)
| getschema
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
a | 0 | System.String | string |
b | 1 | System.Object | dynamic |
c | 2 | System.Int32 | int |
Notice column b
is returned as dynamic
while c
is returned as int
.
Using with_itemindex
Expansion of an array with with_itemindex
:
range x from 1 to 4 step 1
| summarize x = make_list(x)
| mv-expand with_itemindex=Index x
Output
x | Index |
---|---|
1 | 0 |
2 | 1 |
3 | 2 |
4 | 3 |
Related content
- mv-apply operator.
- For the opposite of the mv-expand operator, see summarize make_list().
- For expanding dynamic JSON objects into columns using property bag keys, see bag_unpack() plugin.
15.20 - parse operator
Evaluates a string expression and parses its value into one or more calculated columns. The calculated columns return null
values for unsuccessfully parsed strings. If there’s no need to use rows where parsing doesn’t succeed, prefer using the parse-where operator.
Syntax
T | parse [kind=kind [flags=regexFlags]] expression with [*] stringConstant columnName [: columnType] [*] , …
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to parse. |
kind | string | ✔️ | One of the supported kind values. The default value is simple . |
regexFlags | string | If kind is regex , then you can specify regex flags to be used like U for ungreedy, m for multi-line mode, s for match new line \n , and i for case-insensitive. More flags can be found in Flags. | |
expression | string | ✔️ | An expression that evaluates to a string. |
stringConstant | string | ✔️ | A string constant for which to search and parse. |
columnName | string | ✔️ | The name of a column to assign a value to, extracted from the string expression. |
columnType | string | The scalar value that indicates the type to convert the value to. The default is string . |
Supported kind
values
Text | Description |
---|---|
simple | This is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types. |
regex | stringConstant can be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types. |
relaxed | stringConstant is a regular string value and the match is relaxed. All string delimiters should appear in the parsed string, but extended columns might partially match the required types. Extended columns that didn’t match the required types get the value null . |
Regex mode
In regex mode, parse translates the pattern to a regex. Use regular expressions to do the matching and use numbered captured groups that are handled internally. For example:
parse kind=regex Col with * <regex1> var1:string <regex2> var2:long
In the parse statement, the regex internally generated by the parse is .*?<regex1>(.*?)<regex2>(\-\d+)
.
- * was translated to .*?
- string was translated to .*?
- long was translated to \-\d+
Returns
The input table extended according to the list of columns that are provided to the operator.
Examples
The examples in this section show how to use the syntax to help you get started.
The parse
operator provides a streamlined way to extend
a table by using multiple extract
applications on the same string
expression. This result is useful, when the table has a string
column that contains several values that you want to break into individual columns. For example, a column that’s produced by a developer trace ("printf
"/"Console.WriteLine
") statement.
Parse and extend results
In the following example, the column EventText
of table Traces
contains
strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices={1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5})
.
The operation extends the table with six columns: resourceName
, totalSlices
, sliceNumber
, lockTime
, releaseTime
, and previousLockTime
.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project resourceName, totalSlices, sliceNumber, lockTime, releaseTime, previousLockTime
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 15 | 02/17/2016 08:40:00 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 27 | 23 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 16 | 02/17/2016 08:41:00 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Extract email alias and DNS
In the following example, entries from the Contacts table are parsed to extract the alias and domain from an email address, and the domain from a website URL. The query returns the EmailAddress
, EmailAlias
, and WebsiteDomain
columns, where the fullEmail
column combines the parsed email aliases and domains.
let Leads=datatable(Contacts: string)
[
"Event: LeadContact (email=john@contosohotel.com, Website=https:contosohotel.com)",
"Event: LeadContact (email=abi@fourthcoffee.com, Website=https:www.fourthcoffee.com)",
"Event: LeadContact (email=nevena@treyresearch.com, Website=https:treyresearch.com)",
"Event: LeadContact (email=faruk@tailspintoys.com, Website=https:tailspintoys.com)",
"Event: LeadContact (email=ebere@relecloud.com, Website=https:relecloud.com)",
];
Leads
| parse Contacts with * "email=" alias:string "@" domain: string ", Website=https:" WebsiteDomain: string ")"
| project EmailAddress=strcat(alias, "@", domain), EmailAlias=alias, WebsiteDomain
Output
EmailAddress | EmailAlias | WebsiteDomain |
---|---|---|
nevena@treyresearch.com | nevena | treyresearch.com |
john@contosohotel.com | john | contosohotel.com |
faruk@tailspintoys.com | faruk | tailspintoys.com |
ebere@relecloud.com | ebere | relecloud.com |
abi@fourthcoffee.com | abi | www.fourthcoffee.com |
Regex mode
In the following example, regular expressions are used to parse and extract data from the EventText
column. The extracted data is projected into new fields.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex EventText with "(.*?)[a-zA-Z]*=" resourceName @", totalSlices=\s*\d+\s*.*?sliceNumber=" sliceNumber: long ".*?(previous)?lockTime=" lockTime ".*?releaseTime=" releaseTime ".*?previousLockTime=" previousLockTime: date "\\)"
| project resourceName, sliceNumber, lockTime, releaseTime, previousLockTime
Output
resourceName | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|
PipelineScheduler | 15 | 02/17/2016 08:40:00, | 02/17/2016 08:40:00, | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 23 | 02/17/2016 08:40:01, | 02/17/2016 08:40:01, | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 20 | 02/17/2016 08:40:01, | 02/17/2016 08:40:01, | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 16 | 02/17/2016 08:41:00, | 02/17/2016 08:41:00, | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 22 | 02/17/2016 08:41:01, | 02/17/2016 08:41:00, | 2016-02-17 08:40:01.0000000 |
Regex mode with regex flags
In the following example, resourceName is extracted.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex EventText with * "resourceName=" resourceName ',' *
| project resourceName
Output
resourceName |
---|
PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01 |
PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00 |
PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01 |
PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00 |
PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00 |
If there are records where resourceName
sometimes appears as lower-case and sometimes as upper-case, you might get nulls for some values.
The results in the previous example are unexpected, and include full event data since the default mode is greedy.
To extract only resourceName
, run the previous query with the non-greedy regex flag U and the case-insensitivity flag i.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=Ui EventText with * "RESOURCENAME=" resourceName ',' *
| project resourceName
Output
resourceName |
---|
PipelineScheduler |
PipelineScheduler |
PipelineScheduler |
PipelineScheduler |
PipelineScheduler |
If the parsed string has newlines, use the flag s
to parse the text.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=23\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=15\nlockTime=02/17/2016 08:40:00\nreleaseTime=02/17/2016 08:40:00\npreviousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=20\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=22\nlockTime=02/17/2016 08:41:01\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=16\nlockTime=02/17/2016 08:41:00\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=s EventText with * "resourceName=" resourceName: string "(.*?)totalSlices=" totalSlices: long "(.*?)lockTime=" lockTime: datetime "(.*?)releaseTime=" releaseTime: datetime "(.*?)previousLockTime=" previousLockTime: datetime "\\)"
| project-away EventText
Output
resourceName | totalSlices | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|
PipelineScheduler | 27 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:41:01.0000000 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Relaxed mode
In the following relaxed mode example, the extended column totalSlices
must be of type long
. However, in the parsed string, it has the value nonValidLongValue
.
For the extended column, releaseTime
, the value nonValidDateTime
can’t be parsed as datetime
.
These two extended columns result in null
values while the other columns, such as sliceNumber
, still result in the correct values.
If you use option kind = simple
for the following query, you get null
results for all extended columns. This option is strict on extended columns, and is the difference between relaxed and simple mode.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=nonValidDateTime, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=relaxed EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project-away EventText
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 15 | 02/17/2016 08:40:00 | | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 27 | 23 | 02/17/2016 08:40:01 | | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | | 20 | 02/17/2016 08:40:01 | | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | | 16 | 02/17/2016 08:41:00 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
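For comparison, here is a minimal sketch (not part of the original example set) of the same pattern run with kind=simple over two of the rows above. Because simple mode is strict, the row whose totalSlices value isn't a valid long fails the match, and its extended columns come back null:
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=simple EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project-away EventText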
15.21 - parse-kv operator
Extracts structured information from a string expression and represents the information in a key/value form.
The following extraction modes are supported:
- Specified delimiter: Extraction based on specified delimiters that dictate how keys/values and pairs are separated from each other.
- Non-specified delimiter: Extraction with no need to specify delimiters. Any nonalphanumeric character is considered a delimiter.
- Regex: Extraction based on regular expressions.
Syntax
Specified delimiter
T | parse-kv Expression as ( KeysList ) with ( pair_delimiter = PairDelimiter , kv_delimiter = KvDelimiter [, quote = QuoteChars … [, escape = EscapeChar …]] [, greedy = true ] )
Nonspecified delimiter
T | parse-kv Expression as ( KeysList ) with ( [ quote = QuoteChars … [, escape = EscapeChar …]] )
Regex
T | parse-kv Expression as ( KeysList ) with ( regex = RegexPattern )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expression | string | ✔️ | The expression from which to extract key values. |
KeysList | string | ✔️ | A comma-separated list of key names and their value data types. The order of the keys doesn’t have to match the order in which they appear in the text. |
PairDelimiter | string | A delimiter that separates key value pairs from each other. | |
KvDelimiter | string | A delimiter that separates keys from values. | |
QuoteChars | string | A one- or two-character string literal representing opening and closing quotes that key name or the extracted value may be wrapped with. The parameter can be repeated to specify a separate set of opening/closing quotes. | |
EscapeChar | string | A one-character string literal describing a character that may be used for escaping special characters in a quoted value. The parameter can be repeated if multiple escape characters are used. | |
RegexPattern | string | A regular expression containing two capturing groups exactly. The first group represents the key name, and the second group represents the key value. |
Returns
The original input tabular expression T, extended with columns per specified keys to extract.
Examples
The examples in this section show how to use the syntax to help you get started.
Extraction with well-defined delimiters
In this query, keys and values are separated by well-defined delimiters: comma and colon characters.
print str="ThreadId:458745723, Machine:Node001, Text: The service is up, Level: Info"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (pair_delimiter=',', kv_delimiter=':')
| project-away str
Output
Text | ThreadId | Machine |
---|---|---|
The service is up | 458745723 | Node001 |
Extraction with value quoting
Sometimes key names or values are wrapped in quotes, which allow the values themselves to contain delimiter characters. The following examples show how a quote
argument is used for extracting such values.
print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="connection aborted" "event time"=2021-01-01T10:00:54'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"')
| project-away str
Output
event time | src | dst | bytes | failure |
---|---|---|---|---|
2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | connection aborted |
This query uses different opening and closing quotes:
print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure=(connection aborted) (event time)=(2021-01-01 10:00:54)'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='()')
| project-away str
Output
event time | src | dst | bytes | failure |
---|---|---|---|---|
2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | connection aborted |
The values themselves may contain properly escaped quote characters, as the following example shows:
print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="the remote host sent \\"bye!\\"" time=2021-01-01T10:00:54'
| parse-kv str as (['time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"', escape='\\')
| project-away str
Output
time | src | dst | bytes | failure |
---|---|---|---|---|
2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | the remote host sent “bye!” |
Extraction in greedy mode
There are cases when unquoted values may contain pair delimiters. In this case, use the greedy
mode to indicate to the operator to scan until the next key appearance (or end of string) when looking for the value ending.
The following examples compare how the operator works with and without the greedy
mode specified:
print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=')
| project-away str
Output
name | phone | city |
---|---|---|
John | 555 | New |
print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=', greedy=true)
| project-away str
Output
name | phone | city |
---|---|---|
John Doe | 555 5555 | New York |
Extraction with no well-defined delimiters
In the following example, any nonalphanumeric character is considered a valid delimiter:
print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: Started"
| parse-kv str as (Text: string, ThreadId:long, Machine: string)
| project-away str
Output
Text | ThreadId | Machine |
---|---|---|
Started | 458745723 | Node001 |
Values quoting and escaping is allowed in this mode as shown in the following example:
print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: 'The service \\' is up'"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (quote="'", escape='\\')
| project-away str
Output
Text | ThreadId | Machine |
---|---|---|
The service ’ is up | 458745723 | Node001 |
Extraction using regex
When the text structure isn't well defined by delimiters, regular-expression-based extraction can be useful.
print str=@'["referer url: https://hostname.com/redirect?dest=/?h=1234", "request url: https://hostname.com/?h=1234", "advertiser id: 24fefbca-cf27-4d62-a623-249c2ad30c73"]'
| parse-kv str as (['referer url']:string, ['request url']:string, ['advertiser id']: guid) with (regex=@'"([\w ]+)\s*:\s*([^"]*)"')
| project-away str
Output
referer url | request url | advertiser id |
---|---|---|
https://hostname.com/redirect?dest=/?h=1234 | https://hostname.com/?h=1234 | 24fefbca-cf27-4d62-a623-249c2ad30c73 |
15.22 - parse-where operator
Evaluates a string expression, and parses its value into one or more calculated columns. The result is only the successfully parsed strings.
parse-where
parses the strings in the same way as parse, and filters out strings that were not parsed successfully.
See parse operator, which produces nulls for unsuccessfully parsed strings.
Syntax
T | parse-where [kind=kind [flags=regexFlags]] expression with * (stringConstant columnName [: columnType]) * …
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to parse. |
kind | string | ✔️ | One of the supported kind values. The default value is simple . |
regexFlags | string | If kind is regex , then you can specify regex flags to be used like U for ungreedy, m for multi-line mode, s for match new line \n , and i for case-insensitive. More flags can be found in Flags. | |
expression | string | ✔️ | An expression that evaluates to a string. |
stringConstant | string | ✔️ | A string constant for which to search and parse. |
columnName | string | ✔️ | The name of a column to assign a value to, extracted from the string expression. |
columnType | string | The scalar value that indicates the type to convert the value to. The default is the string . |
Supported kind values
Text | Description |
---|---|
simple | This is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types. |
regex | stringConstant may be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types. |
Regex mode
In regex mode, parse translates the pattern into a regular expression and performs the match by using numbered capture groups that are handled internally. For example:
parse-where kind=regex Col with * <regex1> var1:string <regex2> var2:long
The regex that will be generated by the parse internally is .*?<regex1>(.*?)<regex2>(\-\d+)
.
- * was translated to .*?.
- string was translated to .*?.
- long was translated to \-\d+.
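As a small, self-contained sketch of this translation (the input value here is made up), the untyped resourceName column below becomes a lazy (.*?) capture group and the long column becomes an integer capture group, so the query should extract scheduler and 27:
print Line = "resourceName=scheduler, totalSlices=27, done"
| parse-where kind=regex flags=i Line with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *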
Returns
The input table, which is extended according to the list of columns that are provided to the operator.
Examples
The examples in this section show how to use the syntax to help you get started.
The parse-where operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This is most useful when the table has a string column that contains several values that you want to break into individual columns. For example, you can break up a column that was produced by a developer trace ("printf"/"Console.WriteLine") statement.
Using parse
In the example below, the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices= {1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation below extends the table with six columns: resourceName, totalSlices, sliceNumber, lockTime, releaseTime, and previousLockTime.
A few of the strings don't have a full match. Using parse, the calculated columns for those strings will have nulls.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project
resourceName,
totalSlices,
sliceNumber,
lockTime,
releaseTime,
previousLockTime
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Using parse-where
Using parse-where filters out unsuccessfully parsed strings from the result.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project
resourceName,
totalSlices,
sliceNumber,
lockTime,
releaseTime,
previousLockTime
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Regex mode using regex flags
To get the resourceName and totalSlices, use the following query:
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices
Output
resourceName | totalSlices |
---|---|
parse-where with a case-insensitive regex flag
In the query above, the default mode was case-sensitive, so the strings weren't parsed successfully and no result was obtained.
To get the required result, run parse-where
with a case-insensitive (i
) regex flag.
Only three strings are parsed successfully, so the result contains three records; the other two strings hold invalid totalSlices integers and are filtered out.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex flags=i EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices
Output
resourceName | totalSlices |
---|---|
PipelineScheduler | 27 |
PipelineScheduler | 27 |
PipelineScheduler | 27 |
15.23 - partition operator
The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.
The partition operator is useful when you need to perform a subquery only on a subset of rows that belong to the same partition key, and not a query of the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.
The partition operator supports several strategies of subquery operation:
- Native - use with an implicit data source with thousands of key partition values.
- Shuffle - use with an implicit source with millions of key partition values.
- Legacy - use with an implicit or explicit source for 64 or fewer key partition values.
Syntax
T | partition [ hint.strategy= Strategy ] [ Hints ] by Column ( TransformationSubQuery )
T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular source. |
Strategy | string | The value legacy , shuffle , or native . This hint defines the execution strategy of the partition operator.If no strategy is specified, the legacy strategy is used. For more information, see Strategies. | |
Column | string | ✔️ | The name of a column in T whose values determine how to partition the input tabular source. |
TransformationSubQuery | string | ✔️ | A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.The expression must provide only one tabular result and shouldn’t have other types of statements, such as let statements. |
SubQueryWithSource | string | ✔️ | A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar( Column) .The expression must provide only one tabular result and shouldn’t have other types of statements, such as let statements. |
Hints | string | Zero or more space-separated parameters in the form of: HintName = Value that control the behavior of the operator. See the supported hints per strategy type. |
Supported hints
Hint name | Type | Strategy | Description |
---|---|---|---|
hint.shufflekey | string | shuffle | The partition key used to run the partition operator with the shuffle strategy. |
hint.materialized | bool | legacy | If set to true , materializes the source of the partition operator. The default value is false . |
hint.concurrency | int | legacy | Determines how many partitions to run in parallel. The default value is 16 . |
hint.spread | int | legacy | Determines how to distribute the partitions among cluster nodes. The default value is 1 .For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint. |
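The following sketch (hint values chosen arbitrarily for illustration) shows how the legacy hints are combined on the operator: the four states starting with 'W' are processed in at most two concurrent partitions, and the source is materialized once:
StormEvents
| where State startswith 'W'
| partition hint.strategy=legacy hint.concurrency=2 hint.materialized=true by State
(
summarize Events=count() by EventType
| top 1 by Events
)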
Returns
The operator returns a union of the results of the individual subqueries.
Strategies
The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.
Native strategy
This strategy should be applied when the number of distinct values of the partition key isn’t large, roughly in the thousands.
The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.
To use this strategy, specify hint.strategy=native
.
Shuffle strategy
This strategy should be applied when the number of distinct values of the partition key is large, in the millions.
The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.
To use this strategy, specify hint.strategy=shuffle
. For more information about shuffle strategy and performance, see shuffle query.
Supported operators for the native and shuffle strategies
The following list of operators can be used in subqueries with the native or shuffle strategies:
- count
- distinct
- extend
- make-series
- mv-apply
- mv-expand
- parse
- parse-where
- project
- project-away
- project-keep
- project-rename
- project-reorder
- reduce
- sample
- sample-distinct
- scan
- search
- serialize
- sort
- summarize
- take
- top
- top-hitters
- top-nested
- where
Legacy strategy
For historical reasons, the legacy strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.
In some scenarios, the legacy strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).
If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.
To use this strategy, specify hint.strategy=legacy or omit any other strategy indication.
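A minimal sketch of that toscalar pattern, assuming the StormEvents sample table; each sourced subquery references the partition key x through toscalar:
range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where InjuriesIndirect == toscalar(x) | summarize Rows=count()}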
Examples
The examples in this section show how to use the syntax to help you get started.
Find top values
In some cases, it's more performant and easier to write a query using the partition operator than using the top-nested operator. The following query runs a subquery calculating summarize and top for each State starting with W: “WYOMING”, “WASHINGTON”, “WEST VIRGINIA”, and “WISCONSIN”.
StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State
(
summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
| top 3 by Events
)
Output
EventType | State | Events | Injuries |
---|---|---|---|
Hail | WYOMING | 108 | 0 |
High Wind | WYOMING | 81 | 5 |
Winter Storm | WYOMING | 72 | 0 |
Heavy Snow | WASHINGTON | 82 | 0 |
High Wind | WASHINGTON | 58 | 13 |
Wildfire | WASHINGTON | 29 | 0 |
Thunderstorm Wind | WEST VIRGINIA | 180 | 1 |
Hail | WEST VIRGINIA | 103 | 0 |
Winter Weather | WEST VIRGINIA | 88 | 0 |
Thunderstorm Wind | WISCONSIN | 416 | 1 |
Winter Storm | WISCONSIN | 310 | 0 |
Hail | WISCONSIN | 303 | 1 |
Native strategy
The following query returns the top 2 EventType values by TotalInjuries for each State that starts with ‘W’:
StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
(
summarize TotalInjuries = sum(InjuriesDirect) by EventType
| top 2 by TotalInjuries
)
Output
EventType | TotalInjuries |
---|---|
Tornado | 4 |
Hail | 1 |
Thunderstorm Wind | 1 |
Excessive Heat | 0 |
High Wind | 13 |
Lightning | 5 |
High Wind | 5 |
Avalanche | 3 |
Shuffle strategy
The following query returns the top 3 DamageProperty values for each EpisodeId, along with the EpisodeId and State columns.
StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
top 3 by DamageProperty
| project EpisodeId, State, DamageProperty
)
| count
Output
Count |
---|
22345 |
Legacy strategy with explicit source
The following query runs two subqueries:
- When x == 1, the query returns all rows from StormEvents that have InjuriesIndirect == 1.
- When x == 2, the query returns all rows from StormEvents that have InjuriesIndirect == 2.
The final result is the union of these two subqueries.
range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count
Output
Count |
---|
113 |
Partition reference
The following example shows how to use the as operator to give a “name” to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy
strategy.
T
| partition by Dim
(
as Partition
| extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)
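A runnable sketch of this pattern against the StormEvents sample table (the column choices are illustrative): each event's direct injuries are shown as a percentage of its state's total.
StormEvents
| where State startswith 'W'
| partition by State
(
as StatePartition
| extend InjuriesPct = InjuriesDirect * 100.0 / toscalar(StatePartition | summarize sum(InjuriesDirect))
)
| project State, EventType, InjuriesDirect, InjuriesPct
| take 5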
15.24 - print operator
Outputs a single row with one or more scalar expression results as columns.
Syntax
print [ColumnName =] ScalarExpression [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | The name to assign to the output column. | |
ScalarExpression | string | ✔️ | The expression to evaluate. |
Returns
A table with one or more columns and a single row. Each column returns the corresponding value of the evaluated ScalarExpression.
Examples
The examples in this section show how to use the syntax to help you get started.
Print sum and variable value
The following example outputs a row with two columns. One column contains the sum of a series of numbers and the other column contains the value of the variable, x
.
print 0 + 1 + 2 + 3 + 4 + 5, x = "Wow!"
Output
print_0 | x |
---|---|
15 | Wow! |
Print concatenated string
The following example outputs the results of the strcat()
function as a concatenated string.
print banner=strcat("Hello", ", ", "World!")
Output
banner |
---|
Hello, World! |
15.25 - project operator
Select the columns to include, rename or drop, and insert new computed columns.
The order of the columns in the result is specified by the order of the arguments. Only the columns specified in the arguments are included in the result. Any other columns in the input are dropped.
Syntax
T | project [ColumnName | ( ColumnName[, …]) =] Expression [, …]
or
T | project ColumnName [= Expression] [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input for which to project certain columns. |
ColumnName | string | A column name or comma-separated list of column names to appear in the output. | |
Expression | string | The scalar expression to perform over the input. |
- Either ColumnName or Expression must be specified.
- If there’s no Expression, then a column of ColumnName must appear in the input.
- If ColumnName is omitted, the output column name of Expression will be automatically generated, as shown in the sketch after this list.
- If Expression returns more than one column, a list of column names can be specified in parentheses. If a list of the column names isn’t specified, all Expression’s output columns with generated names will be added to the output.
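For instance, a minimal sketch (not from the original examples) of the auto-generated name case: the second projected expression below has no explicit name, so it receives an automatically generated column name (such as Column1).
print a='alpha', b='bravo'
| project a, strcat(a, '-', b)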
Returns
A table with columns that were named as arguments. Contains same number of rows as the input table.
Examples
The examples in this section show how to use the syntax to help you get started.
Only show specific columns
Only show the EventId, State, and EventType columns of the StormEvents table.
StormEvents
| project EventId, State, EventType
Output
The table shows the first 10 results.
EventId | State | EventType |
---|---|---|
61032 | ATLANTIC SOUTH | Waterspout |
60904 | FLORIDA | Heavy Rain |
60913 | FLORIDA | Tornado |
64588 | GEORGIA | Thunderstorm Wind |
68796 | MISSISSIPPI | Thunderstorm Wind |
68814 | MISSISSIPPI | Tornado |
68834 | MISSISSIPPI | Thunderstorm Wind |
68846 | MISSISSIPPI | Hail |
73241 | AMERICAN SAMOA | Flash Flood |
64725 | KENTUCKY | Flood |
… | … | … |
Potential manipulations using project
The following query renames the BeginLocation
column and creates a new column called TotalInjuries
from a calculation over two existing columns.
StormEvents
| project StartLocation = BeginLocation, TotalInjuries = InjuriesDirect + InjuriesIndirect
| where TotalInjuries > 5
Output
The table shows the first 10 results.
StartLocation | TotalInjuries |
---|---|
LYDIA | 15 |
ROYAL | 15 |
GOTHENBURG | 9 |
PLAINS | 8 |
KNOXVILLE | 9 |
CAROL STREAM | 11 |
HOLLY | 9 |
RUFFIN | 9 |
ENTERPRISE MUNI ARPT | 50 |
COLLIERVILLE | 6 |
… | … |
15.26 - project-away operator
Select what columns from the input table to exclude from the output table.
Syntax
T | project-away ColumnNameOrPattern [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input from which to remove columns. |
ColumnNameOrPattern | string | ✔️ | One or more column names or column wildcard-patterns to be removed from the output. |
Returns
A table with columns that weren’t named as arguments. Contains same number of rows as the input table.
Examples
The input table PopulationData
has 2 columns: State
and Population
. Project-away the Population
column and you’re left with a list of state names.
PopulationData
| project-away Population
Output
The following table shows only the first 10 results.
State |
---|
ALABAMA |
ALASKA |
ARIZONA |
ARKANSAS |
CALIFORNIA |
COLORADO |
CONNECTICUT |
DELAWARE |
DISTRICT OF COLUMBIA |
FLORIDA |
… |
Project-away using a column name pattern
This query removes columns starting with the word “session”.
ConferenceSessions
| project-away session*
Output
The table shows only the first 10 results.
conference | owner | participants | URL | level | starttime | duration | time_and_duration | kusto_affinity |
---|---|---|---|---|---|---|---|---|
PASS Summit 2019 | Avner Aharoni | https://www.eventbrite.com/e/near-real-time-interact-analytics-on-big-data-using-azure-data-explorer-fg-tickets-77532775619 | 2019-11-07T19:15:00Z | Thu, Nov 7, 11:15 AM-12:15 PM PST | Focused | |||
PASS Summit | Rohan Kumar | Ariel Pisetzky | https://www.pass.org/summit/2018/Learn/Keynotes.aspx | 2018-11-07T08:15:00Z | 90 | Wed, Nov 7, 8:15-9:45 am | Mention | |
Intelligent Cloud 2019 | Rohan Kumar | Henning Rauch | 2019-04-09T09:00:00Z | 90 | Tue, Apr 9, 9:00-10:30 AM | Mention | ||
Ignite 2019 | Jie Feng | https://myignite.techcommunity.microsoft.com/sessions/83940 | 100 | 2019-11-06T14:35:00Z | 20 | Wed, Nov 6, 9:35 AM - 9:55 AM | Mention | |
Ignite 2019 | Bernhard Rode | Le Hai Dang, Ricardo Niepel | https://myignite.techcommunity.microsoft.com/sessions/81596 | 200 | 2019-11-06T16:45:00Z | 45 | Wed, Nov 6, 11:45 AM-12:30 PM | Mention |
Ignite 2019 | Tzvia Gitlin | Troyna | https://myignite.techcommunity.microsoft.com/sessions/83933 | 400 | 2019-11-06T17:30:00Z | 75 | Wed, Nov 6, 12:30 PM-1:30 PM | Focused |
Ignite 2019 | Jie Feng | https://myignite.techcommunity.microsoft.com/sessions/81057 | 300 | 2019-11-06T20:30:00Z | 45 | Wed, Nov 6, 3:30 PM-4:15 PM | Mention | |
Ignite 2019 | Manoj Raheja | https://myignite.techcommunity.microsoft.com/sessions/83939 | 300 | 2019-11-07T18:15:00Z | 20 | Thu, Nov 7, 1:15 PM-1:35 PM | Focused | |
Ignite 2019 | Uri Barash | https://myignite.techcommunity.microsoft.com/sessions/81060 | 300 | 2019-11-08T17:30:00Z | 45 | Fri, Nov8, 10:30 AM-11:15 AM | Focused | |
Ignite 2018 | Manoj Raheja | https://azure.microsoft.com/resources/videos/ignite-2018-azure-data-explorer-%E2%80%93-query-billions-of-records-in-seconds/ | 200 | 20 | Focused | |||
… | … | … | … | … | … | … | … | … |
Related content
- To choose what columns from the input to keep in the output, use project-keep.
- To rename columns, use project-rename.
- To reorder columns, use project-reorder.
15.27 - project-keep operator
Select what columns from the input to keep in the output. Only the columns that are specified as arguments will be shown in the result. The other columns are excluded.
Syntax
T | project-keep ColumnNameOrPattern [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input from which to keep columns. |
ColumnNameOrPattern | string | ✔️ | One or more column names or column wildcard-patterns to be kept in the output. |
Returns
A table with columns that were named as arguments. Contains same number of rows as the input table.
Example
This query returns columns from the ConferenceSessions
table that contain the word “session”.
ConferenceSessions
| project-keep session*
Output
The output table shows only the first 10 results.
sessionid | session_title | session_type | session_location |
---|---|---|---|
COM64 | Focus Group: Azure Data Explorer | Focus Group | Online |
COM65 | Focus Group: Azure Data Explorer | Focus Group | Online |
COM08 | Ask the Team: Azure Data Explorer | Ask the Team | Online |
COM137 | Focus Group: Built-In Dashboard and Smart Auto Scaling Capabilities in Azure Data Explorer | Focus Group | Online |
CON-PRT157 | Roundtable: Monitoring and managing your Azure Data Explorer deployments | Roundtable | Online |
CON-PRT103 | Roundtable: Advanced Kusto query language topics | Roundtable | Online |
CON-PRT157 | Roundtable: Monitoring and managing your Azure Data Explorer deployments | Roundtable | Online |
CON-PRT103 | Roundtable: Advanced Kusto query language topics | Roundtable | Online |
CON-PRT130 | Roundtable: Data exploration and visualization with Azure Data Explorer | Roundtable | Online |
CON-PRT130 | Roundtable: Data exploration and visualization with Azure Data Explorer | Roundtable | Online |
… | … | … | … |
Related content
- To choose what columns from the input to exclude from the output, use project-away.
- To rename columns, use project-rename.
- To reorder columns, use project-reorder.
15.28 - project-rename operator
Renames columns in the output table.
Syntax
T | project-rename NewColumnName = ExistingColumnName [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular data. |
NewColumnName | string | ✔️ | The new column name. |
ExistingColumnName | string | ✔️ | The name of the existing column to rename. |
Returns
A table that has the columns in the same order as in an existing table, with columns renamed.
Example
If you have a table with columns a, b, and c, and you want to rename a to new_a, b to new_b, and c to new_c while keeping the same order, the query would look like this:
print a='alpha', b='bravo', c='charlie'
| project-rename new_a=a, new_b=b, new_c=c
Output
new_a | new_b | new_c |
---|---|---|
alpha | bravo | charlie |
15.29 - project-reorder operator
Reorders columns in the output table.
Syntax
T | project-reorder ColumnNameOrPattern [asc | desc | granny-asc | granny-desc] [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular data. |
ColumnNameOrPattern | string | ✔️ | The name of the column or column wildcard pattern by which to order the columns. |
asc , desc , granny-asc , granny-desc | string | Indicates how to order the columns when a wildcard pattern is used. asc or desc orders columns by column name in ascending or descending manner, respectively. granny-asc or granny-desc orders by ascending or descending, respectively, while secondarily sorting by the next numeric value. For example, a20 comes before a100 when granny-asc is specified. |
Returns
A table that contains columns in the order specified by the operator arguments. project-reorder doesn't rename or remove columns from the table; therefore, all columns that existed in the source table appear in the result table.
Examples
The examples in this section show how to use the syntax to help you get started.
Reorder with b first
Reorder a table with three columns (a, b, c) so the second column (b) will appear first.
print a='a', b='b', c='c'
| project-reorder b
Output
b | a | c |
---|---|---|
b | a | c |
Reorder with a first
Reorder columns of a table so that columns starting with a
will appear before other columns.
print b = 'b', a2='a2', a3='a3', a1='a1'
| project-reorder a* asc
Output
a1 | a2 | a3 | b |
---|---|---|---|
a1 | a2 | a3 | b |
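As an extra sketch (not among the original examples), granny-asc takes the trailing number into account, so the column a3 comes before a20, which comes before a100:
print b='b', a100='a100', a20='a20', a3='a3'
| project-reorder a* granny-asc
With plain asc, the ordering would be purely lexicographic, placing a100 before a20 and a3.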
15.30 - Queries
A query is a read-only operation against data ingested into your cluster. Queries always run in the context of a particular database in the cluster. They may also refer to data in another database, or even in another cluster.
Because ad-hoc querying of data is the top-priority scenario for Kusto, the Kusto Query Language syntax is optimized for non-expert users to author and run queries over their data and to understand unambiguously what each query does (logically).
The language syntax is that of a data flow, where “data” means “tabular data” (data arranged in a rectangular shape of rows and columns). At a minimum, a query consists of source data references (references to Kusto tables) and one or more query operators applied in sequence, indicated visually by the use of a pipe character (|) to delimit operators.
For example:
StormEvents
| where State == 'FLORIDA' and StartTime > datetime(2000-01-01)
| count
Each filter prefixed by the pipe character |
is an instance of an operator, with some parameters. The input to the operator is the table that is the result of the preceding pipeline. In most cases, any parameters are scalar expressions over the columns of the input.
In a few cases, the parameters are the names of input columns, and in a few cases, the parameter is a second table. The result of a query is always a table, even if it only has one column and one row.
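For instance, a sketch (using the sample StormEvents and PopulationData tables) in which the parameter of an operator is a second table:
StormEvents
| join kind=inner (PopulationData) on State
| take 5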
In this documentation, T is used to denote the preceding pipeline or source table.
15.31 - range operator
Generates a single-column table of values.
Syntax
range columnName from start to stop step step
Parameters
Name | Type | Required | Description |
---|---|---|---|
columnName | string | ✔️ | The name of the single column in the output table. |
start | int, long, real, datetime, or timespan | ✔️ | The smallest value in the output. |
stop | int, long, real, datetime, or timespan | ✔️ | The highest value generated in the output, or a bound on the highest value if step steps over this value. |
step | int, long, real, datetime, or timespan | ✔️ | The difference between two consecutive values. |
Returns
A table with a single column called columnName, whose values are start, start + step, … up to and including stop.
Examples
The examples in this section show how to use the syntax to help you get started.
Range over the past seven days
The following example creates a table with entries for the current time stamp extended over the past seven days, once a day.
range LastWeek from ago(7d) to now() step 1d
Output
LastWeek |
---|
2015-12-05 09:10:04.627 |
2015-12-06 09:10:04.627 |
… |
2015-12-12 09:10:04.627 |
Combine different stop times
The following example shows how to extend ranges to use multiple stop times by using the union
operator.
let Range1 = range Time from datetime(2024-01-01) to datetime(2024-01-05) step 1d;
let Range2 = range Time from datetime(2024-01-06) to datetime(2024-01-10) step 1d;
union Range1, Range2
| order by Time asc
Output
Time |
---|
2024-01-01 00:00:00.0000000 |
2024-01-02 00:00:00.0000000 |
2024-01-03 00:00:00.0000000 |
2024-01-04 00:00:00.0000000 |
2024-01-05 00:00:00.0000000 |
2024-01-06 00:00:00.0000000 |
2024-01-07 00:00:00.0000000 |
2024-01-08 00:00:00.0000000 |
2024-01-09 00:00:00.0000000 |
2024-01-10 00:00:00.0000000 |
Range using parameters
The following example shows how to use the range
operator with parameters, which are then extended and consumed as a table.
let toUnixTime = (dt:datetime)
{
(dt - datetime(1970-01-01)) / 1s
};
let MyMonthStart = startofmonth(now()); //Start of month
let StepBy = 4.534h; //Supported timespans
let nn = 64000; // Row Count parametrized
let MyTimeline = range MyMonthHour from MyMonthStart to now() step StepBy
| extend MyMonthHourinUnixTime = toUnixTime(MyMonthHour), DateOnly = bin(MyMonthHour,1d), TimeOnly = MyMonthHour - bin(MyMonthHour,1d)
; MyTimeline | order by MyMonthHour asc | take nn
Output
MyMonthHour | MyMonthHourinUnixTime | DateOnly | TimeOnly |
---|---|---|---|
2023-02-01 00:00:00.0000000 | 1675209600 | 2023-02-01 00:00:00.0000000 | 00:00:00 |
2023-02-01 04:32:02.4000000 | 1675225922.4 | 2023-02-01 00:00:00.0000000 | 04:32:02.4000000 |
2023-02-01 09:04:04.8000000 | 1675242244.8 | 2023-02-01 00:00:00.0000000 | 09:04:04.8000000 |
2023-02-01 13:36:07.2000000 | 1675258567.2 | 2023-02-01 00:00:00.0000000 | 13:36:07.2000000 |
… | … | … | … |
Incremented steps
The following example creates a table with a single column called Steps
whose type is long
and results in values from one to eight incremented by three.
range Steps from 1 to 8 step 3
Output
Steps |
---|
1 |
4 |
7 |
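A similar sketch (not part of the original example set) works over timespan values, producing five values from 00:00:00 to 02:00:00 in 30-minute increments:
range Offset from 0h to 2h step 30m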
Traces over a time range
The following example shows how the range
operator can be used to create a dimension table that is used to introduce zeros where the source data has no values. It takes timestamps from the last four hours and counts traces for each one-minute interval. When there are no traces for a specific interval, the count is zero.
range TIMESTAMP from ago(4h) to now() step 1m
| join kind=fullouter
(Traces
| where TIMESTAMP > ago(4h)
| summarize Count=count() by bin(TIMESTAMP, 1m)
) on TIMESTAMP
| project Count=iff(isnull(Count), 0, Count), TIMESTAMP
| render timechart
15.32 - reduce operator
Groups a set of strings together based on value similarity.
For each such group, the operator returns a pattern
, count
, and representative
. The pattern
best describes the group, in which the *
character represents a wildcard. The count
is the number of values in the group, and the representative
is one of the original values in the group.
Syntax
T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr | string | ✔️ | The value by which to reduce. |
Threshold | real | A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples. | |
ReduceKind | string | The only valid value is source . If source is specified, the operator appends the Pattern column to the existing rows in the table instead of aggregating by Pattern . |
Returns
A table with as many rows as there are groups and columns titled pattern
, count
, and representative
. The pattern
best describes the group, in which the *
character represents a wildcard, or placeholder for an arbitrary insertion string. The count
is the number of values in the group, and the representative
is one of the original values in the group.
For example, the result of reduce by city
might include:
Pattern | Count | Representative |
---|---|---|
San * | 5182 | San Bernard |
Saint * | 2846 | Saint Lucy |
Moscow | 3726 | Moscow |
* -on- * | 2730 | One -on- One |
Paris | 2716 | Paris |
Examples
The examples in this section show how to use the syntax to help you get started.
Small threshold value
This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.
range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.001 , characters = "X"
Output
Pattern | Count | Representative |
---|---|---|
MachineLearning* | 1000 | MachineLearningX4 |
Large threshold value
This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.
range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.9 , characters = "X"
Output
With the larger threshold value, less similarity is required between values to form a group, so the input is split into more, smaller groups.
Pattern | Count | Representative |
---|---|---|
MachineLearning* | 177 | MachineLearningX9 |
MachineLearning* | 102 | MachineLearningX0 |
MachineLearning* | 106 | MachineLearningX1 |
MachineLearning* | 96 | MachineLearningX6 |
MachineLearning* | 110 | MachineLearningX4 |
MachineLearning* | 100 | MachineLearningX3 |
MachineLearning* | 99 | MachineLearningX8 |
MachineLearning* | 104 | MachineLearningX7 |
MachineLearning* | 106 | MachineLearningX2 |
Behavior of the Characters parameter
If the Characters parameter is unspecified, every character that isn't an ASCII letter or digit becomes a term separator.
range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str
Output
Pattern | Count | Representative |
---|---|---|
others | 10 |
However, if you specify that “Z” is a separator, then it’s as if each value in str is two terms: foo and tostring(x):
range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"
Output
Pattern | Count | Representative |
---|---|---|
foo* | 10 | fooZ1 |
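Similarly, a sketch of the kind=source option described in the parameters table, using the same synthetic data: instead of aggregating by pattern, the operator should append a Pattern column to each of the ten input rows.
range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce kind=source by str with characters="Z"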
Apply reduce to sanitized input
The following example shows how one might apply the reduce operator to a “sanitized” input, in which GUIDs in the column being reduced are replaced before reducing:
Start with a few records from the Trace table. Then reduce the Text column which includes random GUIDs. As random GUIDs interfere with the reduce operation, replace them all by the string “GUID”. Now perform the reduce operation. In case there are other “quasi-random” identifiers with embedded ‘-’ or ‘_’ characters in them, treat characters as non-term-breakers.
Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"
15.33 - sample operator
Returns up to the specified number of random rows from the input table.
Syntax
T | sample NumberOfRows
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
NumberOfRows | int, long, or real | ✔️ | The number of rows to return. You can specify any numeric expression. |
Examples
The examples in this section show how to use the syntax to help you get started.
Generate a sample
This query creates a range of numbers, samples one value, and then duplicates that sample.
let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)
Output
x |
---|
74 |
63 |
To ensure that _sample in the example above is calculated only once, you can use the materialize() function:
let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)
Output
x |
---|
24 |
24 |
Generate a sample of a certain percentage of data
To sample a certain percentage of your data (rather than a specified number of rows), you can use the rand() function:
Output
The table contains the first few rows of the output. Run the query to view the full result.
StartTime | EndTime | EpisodeId | EventId | State | EventType |
---|---|---|---|---|---|
2007-01-01T00:00:00Z | 2007-01-20T10:24:00Z | 2403 | 11914 | INDIANA | Flood |
2007-01-01T00:00:00Z | 2007-01-24T18:47:00Z | 2408 | 11930 | INDIANA | Flood |
2007-01-01T00:00:00Z | 2007-01-01T12:00:00Z | 1979 | 12631 | DELAWARE | Heavy Rain |
2007-01-01T00:00:00Z | 2007-01-01T00:00:00Z | 2592 | 13208 | NORTH CAROLINA | Thunderstorm Wind |
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7069 | MINNESOTA | Drought |
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 2240 | 10858 | TEXAS | Drought |
… | … | … | … | … | … |
Generate a sample of keys
To sample keys rather than rows (for example - sample 10 Ids and get all rows for these Ids), you can use sample-distinct
in combination with the in
operator.
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)
Output
The table contains the first few rows of the output. Run the query to view the full result.
StartTime | EndTime | EpisodeId | EventId | State | EventType |
---|---|---|---|---|---|
2007-09-18T20:00:00Z | 2007-09-19T18:00:00Z | 11074 | 60904 | FLORIDA | Heavy Rain |
2007-09-20T21:57:00Z | 2007-09-20T22:05:00Z | 11078 | 60913 | FLORIDA | Tornado |
2007-09-29T08:11:00Z | 2007-09-29T08:11:00Z | 11091 | 61032 | ATLANTIC SOUTH | Waterspout |
2007-12-07T14:00:00Z | 2007-12-08T04:00:00Z | 13183 | 73241 | AMERICAN SAMOA | Flash Flood |
2007-12-11T21:45:00Z | 2007-12-12T16:45:00Z | 12826 | 70787 | KANSAS | Flood |
2007-12-13T09:02:00Z | 2007-12-13T10:30:00Z | 11780 | 64725 | KENTUCKY | Flood |
… | … | … | … | … | … |
15.34 - sample-distinct operator
Returns a single column that contains up to the specified number of distinct values of the requested column.
The operator tries to return an answer as quickly as possible rather than trying to make a fair sample.
Syntax
T | sample-distinct NumberOfValues of ColumnName
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
NumberOfValues | int, long, or real | ✔️ | The number of distinct values of T to return. You can specify any numeric expression. |
ColumnName | string | ✔️ | The name of the column from which to sample. |
Examples
The examples in this section show how to use the syntax to help you get started.
Get 10 distinct values from a population
StormEvents | sample-distinct 10 of EpisodeId
Output
EpisodeId |
---|
11074 |
11078 |
11749 |
12554 |
12561 |
13183 |
11780 |
11781 |
12826 |
Further compute the sample values
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)
| summarize totalInjuries=sum(InjuriesDirect) by EpisodeId
Output
EpisodeId | totalInjuries |
---|---|
11091 | 0 |
11074 | 0 |
11078 | 0 |
11749 | 0 |
12554 | 3 |
12561 | 0 |
13183 | 0 |
11780 | 0 |
11781 | 0 |
12826 | 0 |
15.35 - scan operator
Scans data, matches, and builds sequences based on the predicates.
Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator’s steps.
Syntax
T | scan [ with_match_id = MatchIdColumnName ] [ declare ( ColumnDeclarations ) ] with ( StepDefinitions )
ColumnDeclarations syntax
ColumnName : ColumnType [= DefaultValue ] [, … ]
StepDefinition syntax
step StepName [ output = all | last | none ] : Condition [ => Column = Assignment [, … ] ] ;
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular source. |
MatchIdColumnName | string | The name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the record. | |
ColumnDeclarations | string | Declares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null . | |
StepName | string | ✔️ | Used to reference values in the state of scan for conditions and assignments. The step name must be unique. |
Condition | string | ✔️ | An expression that evaluates to true or false that defines which records from the input match the step. A record matches the step when the condition is true with the step’s state or with the previous step’s state. |
Assignment | string | A scalar expression that is assigned to the corresponding column when a record matches a step. | |
output | string | Controls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none doesn’t output records matching the step. The default is all . |
Returns
A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the column in the declare
clause.
Scan logic
scan
goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.
State
The underlying state of the scan
operator can be thought of as a table with a row for each step
. Each step maintains its own state with the latest values of the columns and declared variables from all of the previous steps and the current step. If relevant, it also holds the match ID for the ongoing sequence.
If a scan operator has n steps named s_1, s_2, …, s_n then step s_k would have k records in its state corresponding to s_1, s_2, …, s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1
would reference column col1
that belongs to step s_2 in the state of s_k. For a detailed example, see the scan logic walkthrough.
The state starts empty and updates whenever a scanned input record matches a step. When the state of the current step is nonempty, the step is referred to as having an active sequence.
Matching logic
Each input record is evaluated against all of the steps in reverse order, from the last step to the first. When a record r is evaluated against some step s_k, the following logic is applied:
Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:
- The state of s_k is cleared.
- The state of s_k-1 is promoted to become the state of s_k.
- The assignments of s_k are calculated and extend r.
- The extended r is added to the output and to the state of s_k.
Note: If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.
Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:
- The assignments of s_k are calculated and extend r.
- The values that represent s_k in the state of s_k are replaced with the values of the extended r.
- If s_k is defined as
output=all
, the extended r is added to the output. - If s_k is the first step, a new sequence begins and the match ID increases by
1
. This only affects the output whenwith_match_id
is used.
Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.
For a detailed example of this logic, see the scan logic walkthrough.
Examples
The examples in this section show how to use the syntax to help you get started.
Cumulative sum
Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().
range x from 1 to 5 step 1
| scan declare (cumulative_x:long=0) with
(
step s1: true => cumulative_x = x + s1.cumulative_x;
)
Output
x | cumulative_x |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
Cumulative sum on multiple columns with a reset condition
Calculate the cumulative sum for two input columns, reset the sum value to the current record value whenever the cumulative sum reached 10 or more.
range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with
(
step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x),
cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)
Output
x | y | cumulative_x | cumulative_y |
---|---|---|---|
1 | 2 | 1 | 2 |
2 | 4 | 3 | 6 |
3 | 6 | 6 | 12 |
4 | 8 | 10 | 8 |
5 | 10 | 5 | 18 |
Fill forward a column
Fill forward a string column. Each empty value is assigned the last seen nonempty value.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "",
2m, "B",
3m, "",
4m, "",
6m, "C",
8m, "",
11m, "D",
12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with
(
step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)
Output
Ts | Event | Event_filled |
---|---|---|
00:00:00 | A | A |
00:01:00 | | A |
00:02:00 | B | B |
00:03:00 | | B |
00:04:00 | | B |
00:06:00 | C | C |
00:08:00 | | C |
00:11:00 | D | D |
00:12:00 | | D |
Sessions tagging
Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of the with_match_id flag, which assigns a unique value to each distinct match (session) of scan. Also note the special use of two steps in this example: inSession has true as its condition, so it captures and outputs all the records from the input, while endSession captures records that happen more than 30m after the sessionStart value for the current match. The endSession step has output=none, meaning it doesn’t produce output records. The endSession step is used to advance the state of the current match from inSession to endSession, allowing a new match (session) to begin, starting from the current record.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "A",
2m, "B",
3m, "D",
32m, "B",
36m, "C",
38m, "D",
41m, "E",
75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with
(
step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
step endSession output=none: Ts - inSession.sessionStart > 30m;
)
Output
Ts | Event | sessionStart | session_id |
---|---|---|---|
00:00:00 | A | 00:00:00 | 0 |
00:01:00 | A | 00:00:00 | 0 |
00:02:00 | B | 00:00:00 | 0 |
00:03:00 | D | 00:00:00 | 0 |
00:32:00 | B | 00:32:00 | 1 |
00:36:00 | C | 00:32:00 | 1 |
00:38:00 | D | 00:32:00 | 1 |
00:41:00 | E | 00:32:00 | 1 |
01:15:00 | A | 01:15:00 | 2 |
Events between Start and Stop
Find all sequences of events between the event Start and the event Stop that occur within 5 minutes. Assign a match ID for each sequence.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
Output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
Calculate a custom funnel of events
Calculate a funnel completion of the sequence Hail -> Tornado -> Thunderstorm Wind by State, with custom thresholds on the times between the events (Tornado within 1h and Thunderstorm Wind within 2h). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.
StormEvents
| partition hint.strategy=native by State
(
sort by StartTime asc
| scan with
(
step hail: EventType == "Hail";
step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
)
)
| summarize dcount(State) by EventType
Output
EventType | dcount_State |
---|---|
Hail | 50 |
Tornado | 34 |
Thunderstorm Wind | 32 |
Scan logic walkthrough
This section demonstrates the scan logic using a step-by-step walkthrough of the Events between start and stop example:
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
Output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
The state
Think of the state of the scan operator as a table with a row for each step, in which each step has its own state. This state contains the latest values of the columns and declared variables from all of the previous steps and the current step. To learn more, see State.
For this example, the state can be represented with the following table:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 |
The “X” indicates that a specific field is irrelevant for that step.
The matching logic
This section follows the matching logic through each record of the Events
table, explaining the transformation of the state and output at each step.
Record 1
Ts | Event |
---|---|
0m | “A” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
- s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there’s no previous step. Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start". Record 1 is discarded without affecting the state or output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 |
Record 2
Ts | Event |
---|---|
1m | “Start” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
- s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there’s no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence, and the m_id is assigned. Record 2 and its m_id (0) are added to the state and the output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | 0 | 00:01:00 | “Start” | X | X | X | X |
s2 | X | X | |||||
s3 |
Record 3
Ts | Event |
---|---|
2m | “B” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
- s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 3 and its m_id (0) are added to the state and the output.
- s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | 0 | 00:01:00 | “Start” | 00:02:00 | “B” | X | X |
s3 |
Record 4
Ts | Event |
---|---|
3m | “D” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the record doesn’t meet the condition of Event == "Stop", and Check 2 isn’t passed because s3 lacks an active sequence.
- s2: Check 1 isn’t passed because the state of s1 is empty. Check 2 is passed because the record meets the condition of Ts - s1.Ts <= 5m. Record 4 and its m_id (0) are added to the state and the output. The values from this record overwrite the previous state values for s2.Ts and s2.Event.
- s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | X | X |
s3 |
Record 5
Ts | Event |
---|---|
4m | “Stop” |
Record evaluation at each step:
- s3: Check 1 is passed because the state of s2 is nonempty and the record meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 5 and its m_id (0) are added to the state and the output.
- s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there’s no previous step. Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 6
Ts | Event |
---|---|
6m | “C” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because the record doesn’t meet the s3 condition of Event == "Stop".
- s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
- s1: Check 1 isn’t passed because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start". Record 6 is discarded without affecting the state or output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 7
Ts | Event |
---|---|
8m | “Start” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Stop".
- s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
- s1: Check 1 isn’t passed because there’s no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence in s1 with a new m_id. Record 7 and its m_id (1) are added to the state and the output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | 1 | 00:08:00 | “Start” | X | X | X | X |
s2 | X | X | |||||
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 8
Ts | Event |
---|---|
11m | “E” |
Record evaluation at each step:
- s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because the record doesn’t meet the s3 condition of Event == "Stop".
- s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 8 and its m_id (1) are added to the state and the output.
- s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | 1 | 00:08:00 | “Start” | 00:11:00 | “E” | X | X |
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 9
Ts | Event |
---|---|
12m | “Stop” |
Record evaluation at each step:
- s3: Check 1 is passed because the state of s2 is nonempty and the record meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 9 and its m_id (1) are added to the state and the output.
- s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
- s1: Check 1 isn’t passed because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 | 1 | 00:08:00 | “Start” | 00:11:00 | “E” | 00:12:00 | “Stop” |
Final output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
15.36 - search operator
Searches a text pattern in multiple tables and columns.
Syntax
[T |] search [kind=CaseSensitivity] [in (TableSources)] SearchPredicate
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | The tabular data source to be searched over, such as a table name, a union operator, or the results of a tabular query. Can’t be specified together with TableSources. | |
CaseSensitivity | string | A flag that controls the behavior of all string scalar operators, such as has , with respect to case sensitivity. Valid values are default , case_insensitive , case_sensitive . The options default and case_insensitive are synonymous, since the default behavior is case insensitive. | |
TableSources | string | A comma-separated list of “wildcarded” table names to take part in the search. The list has the same syntax as the list of the union operator. Can’t be specified together with tabular data source (T). | |
SearchPredicate | string | ✔️ | A boolean expression to be evaluated for every record in the input. If it returns true , the record is outputted. See Search predicate syntax. |
Search predicate syntax
The SearchPredicate allows you to search for specific terms in all columns of a table. The operator that is applied to a search term depends on the presence and placement of a wildcard asterisk (*) in the term, as shown in the following table.
Literal | Operator |
---|---|
billg | has |
*billg | hassuffix |
billg* | hasprefix |
*billg* | contains |
bi*lg | matches regex |
You can also restrict the search to a specific column, look for an exact match instead of a term match, or search by regular expression. The syntax for each of these cases is shown in the following table.
Syntax | Explanation |
---|---|
ColumnName: StringLiteral | This syntax can be used to restrict the search to a specific column. The default behavior is to search all columns. |
ColumnName== StringLiteral | This syntax can be used to search for exact matches of a column against a string value. The default behavior is to look for a term-match. |
Column matches regex StringLiteral | This syntax indicates regular expression matching, in which StringLiteral is the regex pattern. |
Use boolean expressions to combine conditions and create more complex searches. For example, "error" and x==123 would result in a search for records that have the term error in any columns and the value 123 in the x column.
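For instance, the following hedged sketch (using the StormEvents sample table that appears in later examples, not part of the original snippets here) combines a term search with a column comparison. The term is matched against every column, while the numeric comparison behaves like a regular filter:
StormEvents
| search "thunderstorm" and InjuriesDirect > 10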
Search predicate syntax examples
# | Syntax | Meaning (equivalent where ) | Comments |
---|---|---|---|
1 | search "err" | where * has "err" | |
2 | search in (T1,T2,A*) "err" | union T1,T2,A* | where * has “err” | |
3 | search col:"err" | where col has "err" | |
4 | search col=="err" | where col=="err" | |
5 | search "err*" | where * hasprefix "err" | |
6 | search "*err" | where * hassuffix "err" | |
7 | search "*err*" | where * contains "err" | |
8 | search "Lab*PC" | where * matches regex @"\bLab.*PC\b" | |
9 | search * | where 0==0 | |
10 | search col matches regex "..." | where col matches regex "..." | |
11 | search kind=case_sensitive | All string comparisons are case-sensitive | |
12 | search "abc" and ("def" or "hij") | where * has "abc" and (* has "def" or * has "hij") | |
13 | search "err" or (A>a and A<b) | where * has "err" or (A>a and A<b) |
Remarks
Unlike the find operator, the search operator doesn’t support the following syntax:
- withsource=: The output always includes a column called $table of type string whose value is the table name from which each record was retrieved (or some system-generated name if the source isn’t a table but a composite expression).
- project=, project-smart: The output schema is equivalent to the project-smart output schema.
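As an illustration of the $table column, the following hedged sketch (reusing the Green term from the examples below, not part of the original article) counts matches per source table:
search "Green"
| summarize MatchCount = count() by $table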
Examples
The examples in this section show how to use the syntax to help you get started.
Global term search
Search for the term Green in all the tables of the ContosoSales database.
The output finds records with the term Green as a last name or a color in the Customers, Products, and SalesTable tables.
search "Green"
Output
$table | CityName | ContinentName | CustomerKey | Education | FirstName | Gender | LastName |
---|---|---|---|---|---|---|---|
Customers | Ballard | North America | 16549 | Partial College | Mason | M | Green |
Customers | Bellingham | North America | 2070 | High School | Adam | M | Green |
Customers | Bellingham | North America | 10658 | Bachelors | Sara | F | Green |
Customers | Beverly Hills | North America | 806 | Graduate Degree | Richard | M | Green |
Customers | Beverly Hills | North America | 7674 | Graduate Degree | James | M | Green |
Customers | Burbank | North America | 5241 | Graduate Degree | Madeline | F | Green |
Conditional global term search
Search for records that contain the term Green and one of either terms Deluxe or Proseware in the ContosoSales database.
search "Green" and ("Deluxe" or "Proseware")
Output
$table | ProductName | Manufacturer | ColorName | ClassName | ProductCategoryName |
---|---|---|---|---|---|
Products | Contoso 8GB Clock & Radio MP3 Player X850 Green | Contoso, Ltd | Green | Deluxe | Audio |
Products | Proseware Scan Jet Digital Flat Bed Scanner M300 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware All-In-One Photo Printer M200 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Ink Jet Wireless All-In-One Printer M400 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Ink Jet Instant PDF Sheet-Fed Scanner M300 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Desk Jet All-in-One Printer, Scanner, Copier M350 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Duplex Scanner M200 Green | Proseware, Inc. | Green | Regular | Computers |
Search a specific table
Search for the term Green only in the Products table.
search in (Products) "Green"
Output
$table | ProductName | Manufacturer | ColorName |
---|---|---|---|
Products | Contoso 4G MP3 Player E400 Green | Contoso, Ltd | Green |
Products | Contoso 8GB Super-Slim MP3/Video Player M800 Green | Contoso, Ltd | Green |
Products | Contoso 16GB Mp5 Player M1600 Green | Contoso, Ltd | Green |
Products | Contoso 8GB Clock & Radio MP3 Player X850 Green | Contoso, Ltd | Green |
Products | NT Wireless Bluetooth Stereo Headphones M402 Green | Northwind Traders | Green |
Products | NT Wireless Transmitter and Bluetooth Headphones M150 Green | Northwind Traders | Green |
Case-sensitive search
Search for records that match the case-sensitive term in the ContosoSales database.
search kind=case_sensitive "blue"
Output
$table | ProductName | Manufacturer | ColorName | ClassName |
---|---|---|---|---|
Products | Contoso 16GB New Generation MP5 Player M1650 blue | Contoso, Ltd | blue | Regular |
Products | Contoso Bright Light battery E20 blue | Contoso, Ltd | blue | Economy |
Products | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
Search specific columns
Search for the terms Aaron and Hughes, in the “FirstName” and “LastName” columns respectively, in the ContosoSales database.
search FirstName:"Aaron" or LastName:"Hughes"
Output
$table | CustomerKey | Education | FirstName | Gender | LastName |
---|---|---|---|---|---|
Customers | 18285 | High School | Riley | F | Hughes |
Customers | 802 | Graduate Degree | Aaron | M | Sharma |
Customers | 986 | Bachelors | Melanie | F | Hughes |
Customers | 12669 | High School | Jessica | F | Hughes |
Customers | 13436 | Graduate Degree | Mariah | F | Hughes |
Customers | 10152 | Graduate Degree | Aaron | M | Campbell |
Limit search by timestamp
Search for the term Hughes in the ContosoSales database, in records whose DateKey is later than the given datetime.
search "Hughes" and DateKey > datetime('2009-01-01')
Output
$table | DateKey | SalesAmount_real |
---|---|---|
SalesTable | 2021-12-13T00:00:00Z | 446.4715 |
SalesTable | 2021-12-13T00:00:00Z | 120.555 |
SalesTable | 2021-12-13T00:00:00Z | 48.4405 |
SalesTable | 2021-12-13T00:00:00Z | 39.6435 |
SalesTable | 2021-12-13T00:00:00Z | 56.9905 |
Performance Tips
# | Tip | Prefer | Over |
---|---|---|---|
1 | Prefer to use a single search operator over several consecutive search operators | search "billg" and ("steveb" or "satyan") | search “billg” | search “steveb” or “satyan” |
2 | Prefer to filter inside the search operator | search "billg" and "steveb" | search * | where * has “billg” and * has “steveb” |
15.37 - serialize operator
Marks that the order of the input row set is safe to use for window functions.
The operator has a declarative meaning. It marks the input row set as serialized (ordered), so that window functions can be applied to it.
Syntax
serialize [Name1 = Expr1 [, Name2 = Expr2]…]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Name | string | The name of the column to add or update. If omitted, the output column name is automatically generated. | |
Expr | string | ✔️ | The calculation to perform over the input. |
Examples
The examples in this section show how to use the syntax to help you get started.
Serialize subset of rows by condition
This query retrieves all log entries from the TraceLogs table that have a specific ClientRequestId and preserves the order of these entries during processing.
TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize
Output
This table only shows the top 5 query results.
Timestamp | Node | Component | ClientRequestId | Message |
---|---|---|---|---|
2014-03-08T12:24:55.5464757Z | Engine000000000757 | INGESTOR_GATEWAY | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | $$IngestionCommand table=fogEvents format=json |
2014-03-08T12:24:56.0929514Z | Engine000000000757 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"" |
2014-03-08T12:25:40.3574831Z | Engine000000000341 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"" |
2014-03-08T12:25:40.9039588Z | Engine000000000341 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"" |
2014-03-08T12:26:25.1684905Z | Engine000000000057 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"" |
… | … | … | … | … |
Add row number to the serialized table
To add a row number to the serialized table, use the row_number() function.
TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize rn = row_number()
Output
This table only shows the top 5 query results.
Timestamp | rn | Node | Component | ClientRequestId | Message |
---|---|---|---|---|---|
2014-03-08T13:00:01.6638235Z | 1 | Engine000000000899 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_46.json.gz"" |
2014-03-08T13:00:02.2102992Z | 2 | Engine000000000899 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"" |
2014-03-08T13:00:46.4748309Z | 3 | Engine000000000584 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"" |
2014-03-08T13:00:47.0213066Z | 4 | Engine000000000584 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"" |
2014-03-08T13:01:31.2858383Z | 5 | Engine000000000380 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"" |
… | … | … | … | … |
Serialization behavior of operators
The output row set of the following operators is marked as serialized.
The output row set of the following operators is marked as nonserialized.
All other operators preserve the serialization property. If the input row set is serialized, then the output row set is also serialized.
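For example, because the sort operator marks its output as serialized, a window function such as row_number() can follow it directly. The following is a minimal sketch using the TraceLogs table from the earlier examples; it's an illustration added here, not one of the original examples:
TraceLogs
| sort by Timestamp asc
| extend rn = row_number() // sort already marks the row set as serialized, so no explicit serialize is needed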
15.38 - Shuffle query
The shuffle query is a semantic-preserving transformation used with a set of operators that support the shuffle strategy. Depending on the data involved, querying with the shuffle strategy can yield better performance. It’s better to use the shuffle query strategy when the shuffle key (a join key, summarize key, make-series key, or partition key) has a high cardinality and the regular operator query hits query limits.
You can use the following operators with the shuffle command: join, summarize, make-series, and partition.
To use the shuffle query strategy, add the expression hint.strategy = shuffle or hint.shufflekey = <key>. When you use hint.strategy=shuffle, the operator data will be shuffled by all the keys. Use this expression when the compound key is unique but each key isn’t unique enough, so you’ll shuffle the data using all the keys of the shuffled operator.
When partitioning data with the shuffle strategy, the data load is shared on all cluster nodes. Each node processes one partition of the data. The default number of partitions is equal to the number of cluster nodes.
You can override the number of partitions by using the syntax hint.num_partitions = total_partitions. This is useful when the cluster has a small number of nodes, where the default number of partitions will also be small and the query fails or takes a long time to run.
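For illustration, the following hedged sketch (using the StormEvents sample table) sets the partition count explicitly; the value 10 is an arbitrary choice for this example:
StormEvents
| summarize hint.strategy = shuffle hint.num_partitions = 10 count() by State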
In some cases, the hint.strategy = shuffle is ignored, and the query won’t run in shuffle strategy. This can happen when:
- The join operator has another shuffle-compatible operator (join, summarize, make-series, or partition) on the left side or the right side.
- The summarize operator appears after another shuffle-compatible operator (join, summarize, make-series, or partition) in the query.
Syntax
With hint.strategy = shuffle:
T | DataExpression | join hint.strategy = shuffle ( DataExpression )
T | summarize hint.strategy = shuffle DataExpression
T | Query | partition hint.strategy = shuffle ( SubQuery )
With hint.shufflekey = key:
T | DataExpression | join hint.shufflekey = key ( DataExpression )
T | summarize hint.shufflekey = key DataExpression
T | make-series hint.shufflekey = key DataExpression
T | Query | partition hint.shufflekey = key ( SubQuery )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular source whose data is to be processed by the operator. |
DataExpression | string | An implicit or explicit tabular transformation expression. | |
Query | string | A transformation expression run on the records of T. | |
key | string | Use a join key, summarize key, make-series key or partition key. | |
SubQuery | string | A transformation expression. |
Examples
The examples in this section show how to use the syntax to help you get started.
Use summarize with shuffle
The shuffle strategy query with the summarize operator shares the load on all cluster nodes, where each node processes one partition of the data.
StormEvents
| summarize hint.strategy = shuffle count(), avg(InjuriesIndirect) by State
| count
Output
Count |
---|
67 |
Use join with shuffle
StormEvents
| where State has "West"
| where EventType has "Flood"
| join hint.strategy=shuffle
(
StormEvents
| where EventType has "Hail"
| project EpisodeId, State, DamageProperty
)
on State
| count
Output
Count |
---|
103 |
Use make-series with shuffle
StormEvents
| where State has "North"
| make-series hint.shufflekey = State sum(DamageProperty) default = 0 on StartTime in range(datetime(2007-01-01 00:00:00.0000000), datetime(2007-01-31 23:59:00.0000000), 15d) by State
Output
State | sum_DamageProperty | StartTime | |
---|---|---|---|
NORTH DAKOTA | [60000,0,0] | [“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”] | |
NORTH CAROLINA | [20000,0,1000] | [“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”] | |
ATLANTIC NORTH | [0,0,0] | [“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”] |
Use partition with shuffle
StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
top 3 by DamageProperty
| project EpisodeId, State, DamageProperty
)
| count
Output
Count |
---|
22345 |
Compare hint.strategy=shuffle and hint.shufflekey=key
When you use hint.strategy=shuffle, the shuffled operator will be shuffled by all the keys. In the following example, the query shuffles the data using both EpisodeId and EventId as keys:
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
| count
Output
Count |
---|
14 |
The following query uses hint.shufflekey = key and is equivalent to the preceding query.
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.shufflekey = EpisodeId hint.shufflekey = EventId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
Output
Count |
---|
14 |
Shuffle the data with multiple keys
In some cases, hint.strategy=shuffle is ignored, and the query won’t run in shuffle strategy. In the following example, the join has a summarize on its left side, so using hint.strategy=shuffle won’t apply the shuffle strategy to the query:
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
Output
EpisodeId | EventId | … | EpisodeId1 | EventId1 | … |
---|---|---|---|---|---|
1030 | 4407 | … | 1030 | 4407 | … |
1030 | 13721 | … | 1030 | 13721 | … |
2477 | 12530 | … | 2477 | 12530 | … |
2103 | 10237 | … | 2103 | 10237 | … |
2103 | 10239 | … | 2103 | 10239 | … |
… | … | … | … | … | … |
To overcome this issue and run in shuffle strategy, choose the key that is common to the summarize and join operations. In this case, this key is EpisodeId. Use the hint.shufflekey hint to specify the shuffle key on the join, as in hint.shufflekey = EpisodeId:
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.shufflekey=EpisodeId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
Output
EpisodeId | EventId | … | EpisodeId1 | EventId1 | … |
---|---|---|---|---|---|
1030 | 4407 | … | 1030 | 4407 | … |
1030 | 13721 | … | 1030 | 13721 | … |
2477 | 12530 | … | 2477 | 12530 | … |
2103 | 10237 | … | 2103 | 10237 | … |
2103 | 10239 | … | 2103 | 10239 | … |
… | … | … | … | … | … |
Use summarize with shuffle to improve performance
In this example, using the summarize operator with shuffle strategy improves performance. The source table has 150M records and the cardinality of the group by key is 10M, which is spread over 10 cluster nodes.
Using the summarize operator without shuffle strategy, the query ends after 1:08 and the memory usage peak is ~3 GB:
orders
| summarize arg_max(o_orderdate, o_totalprice) by o_custkey
| where o_totalprice < 1000
| count
Output
Count |
---|
1086 |
While using shuffle strategy with summarize, the query ends after ~7 seconds and the memory usage peak is 0.43 GB:
orders
| summarize hint.strategy = shuffle arg_max(o_orderdate, o_totalprice) by o_custkey
| where o_totalprice < 1000
| count
Output
Count |
---|
1086 |
The following example demonstrates performance on a cluster that has two cluster nodes, with a table that has 60M records, where the cardinality of the group by key is 2M.
Running the query without hint.num_partitions uses only two partitions (the number of cluster nodes), and the following query takes ~1:10 mins:
lineitem
| summarize hint.strategy = shuffle dcount(l_comment), dcount(l_shipdate) by l_partkey
| consume
If you set the number of partitions to 10, the query ends after 23 seconds:
lineitem
| summarize hint.strategy = shuffle hint.num_partitions = 10 dcount(l_comment), dcount(l_shipdate) by l_partkey
| consume
Use join with shuffle to improve performance
The following example shows how using the shuffle strategy with the join operator improves performance.
The examples were sampled on a cluster with 10 nodes where the data is spread over all these nodes.
The query’s left-side source table has 15M records where the cardinality of the join key is ~14M. The query’s right-side source has 150M records and the cardinality of the join key is 10M. The query ends after ~28 seconds and the memory usage peak is 1.43 GB:
customer
| join
orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey
When using shuffle strategy with a join operator, the query ends after ~4 seconds and the memory usage peak is 0.3 GB:
customer
| join
hint.strategy = shuffle orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey
In another example, we try the same queries on a larger dataset with the following conditions:
- Left-side source of the join is 150M and the cardinality of the key is 148M.
- Right-side source of the join is 1.5B, and the cardinality of the key is ~100M.
The query with just the join operator hits limits and times out after 4 mins. However, when using the shuffle strategy with the join operator, the query ends after ~34 seconds and the memory usage peak is 1.23 GB.
The following example shows the improvement on a cluster that has two cluster nodes, with a table of 60M records, where the cardinality of the join key is 2M.
Running the query without hint.num_partitions uses only two partitions (the number of cluster nodes), and the following query takes ~1:10 mins:
lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
hint.shufflekey = l_partkey part
on $left.l_partkey == $right.p_partkey
| consume
When setting the number of partitions to 10, the query ends after 23 seconds:
lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
hint.shufflekey = l_partkey hint.num_partitions = 10 part
on $left.l_partkey == $right.p_partkey
| consume
15.39 - sort operator
Sorts the rows of the input table into order by one or more columns.
Syntax
T | sort by column [asc | desc] [nulls first | nulls last] [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to sort. |
column | scalar | ✔️ | The column of T by which to sort. The type of the column values must be numeric, date, time or string. |
asc or desc | string | asc sorts into ascending order, low to high. Default is desc , high to low. | |
nulls first or nulls last | string | nulls first will place the null values at the beginning and nulls last will place the null values at the end. Default for asc is nulls first . Default for desc is nulls last . |
Returns
A copy of the input table sorted in either ascending or descending order based on the provided column.
Using special floating-point values
When the input table contains the special values null
, NaN
, -inf
and +inf
, the order will be as follows:
Value | Ascending | Descending |
---|---|---|
Nulls first | null, NaN, -inf, -5, 0, 5, +inf | null, NaN, +inf, 5, 0, -5, -inf |
Nulls last | -inf, -5, 0, 5, +inf, NaN, null | +inf, 5, 0, -5, -inf, NaN, null |
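The following sketch reproduces this ordering with an inline datatable of real values; it's an illustration added here, not one of the original examples. The result matches the Ascending, Nulls first row of the table above:
datatable(x: real) [
    real(null), real(nan), real(-inf), -5.0, 0.0, 5.0, real(+inf)
]
| sort by x asc nulls first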
Example
The following example shows storm events by state in alphabetical order with the most recent storms in each state appearing first.
StormEvents
| sort by State asc, StartTime desc
Output
This table only shows the top 10 query results.
StartTime | State | EventType | … |
---|---|---|---|
2007-12-28T12:10:00Z | ALABAMA | Hail | … |
2007-12-28T04:30:00Z | ALABAMA | Hail | … |
2007-12-28T04:16:00Z | ALABAMA | Hail | … |
2007-12-28T04:15:00Z | ALABAMA | Hail | … |
2007-12-28T04:13:00Z | ALABAMA | Hail | … |
2007-12-21T14:30:00Z | ALABAMA | Strong Wind | … |
2007-12-20T18:15:00Z | ALABAMA | Strong Wind | … |
2007-12-20T18:00:00Z | ALABAMA | Strong Wind | … |
2007-12-20T18:00:00Z | ALABAMA | Strong Wind | … |
2007-12-20T17:45:00Z | ALABAMA | Strong Wind | … |
2007-12-20T17:45:00Z | ALABAMA | Strong Wind | … |
15.40 - take operator
Return up to the specified number of rows.
There is no guarantee which records are returned, unless the source data is sorted. If the data is sorted, then the top values will be returned.
Syntax
take
NumberOfRows
Parameters
Name | Type | Required | Description |
---|---|---|---|
NumberOfRows | int | ✔️ | The number of rows to return. |
Paging of query results
Methods for implementing paging include:
- Export the result of a query to an external storage and paging through the generated data.
- Write a middle-tier application that provides a stateful paging API by caching the results of a Kusto query.
- Use pagination in Stored query results
Example
StormEvents | take 5
15.41 - top operator
Returns the first N records sorted by the specified column.
Syntax
T | top NumberOfRows by Expression [asc | desc] [nulls first | nulls last]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to sort. |
NumberOfRows | int | ✔️ | The number of rows of T to return. |
Expression | string | ✔️ | The scalar expression by which to sort. |
asc or desc | string | Controls whether the selection is from the “bottom” or “top” of the range. Default desc . | |
nulls first or nulls last | string | Controls whether null values appear at the “bottom” or “top” of the range. Default for asc is nulls first . Default for desc is nulls last . |
Example
Show top three storms with most direct injuries.
StormEvents
| top 3 by InjuriesDirect
The following table shows only the relevant column. Run the query to see more storm details for these events.
InjuriesDirect | … |
---|---|
519 | … |
422 | … |
200 | … |
Related content
- Use top-nested operator to produce hierarchical (nested) top results.
15.42 - top-hitters operator
Returns an approximation for the most popular distinct values, or the values with the largest sum, in the input.
Syntax
T | top-hitters NumberOfValues of ValueExpression [ by SummingExpression ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
NumberOfValues | int, long, or real | ✔️ | The number of distinct values of ValueExpression. |
ValueExpression | string | ✔️ | An expression over the input table T whose distinct values are returned. |
SummingExpression | string | If specified, a numeric expression over the input table T whose sum per distinct value of ValueExpression establishes which values to emit. If not specified, the count of each distinct value of ValueExpression is used instead. |
Remarks
The first syntax (no SummingExpression) is conceptually equivalent to:
T
| summarize C = count() by ValueExpression
| top NumberOfValues by C desc
The second syntax (with SummingExpression) is conceptually equivalent to:
T
| summarize S = sum(SummingExpression) by ValueExpression
| top NumberOfValues by S desc
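To make the equivalence concrete, the following sketch spells out the first form against the StormEvents sample table; it should return the same leading event types as the top-hitters example below, with exact rather than approximate counts:
StormEvents
| summarize C = count() by EventType
| top 5 by C desc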
Examples
Get most frequent items
StormEvents
| top-hitters 5 of EventType
Output
EventType | approximate_count_EventType |
---|---|
Thunderstorm Wind | 13015 |
Hail | 12711 |
Flash Flood | 3688 |
Drought | 3616 |
Winter Weather | 3349 |
Get top hitters based on column value
The next example shows how to find the States with the most “Thunderstorm Wind” events.
StormEvents
| where EventType == "Thunderstorm Wind"
| top-hitters 10 of State
Output
State | approximate_sum_State |
---|---|
TEXAS | 830 |
GEORGIA | 609 |
MICHIGAN | 602 |
IOWA | 585 |
PENNSYLVANIA | 549 |
ILLINOIS | 533 |
NEW YORK | 502 |
VIRGINIA | 482 |
KANSAS | 476 |
OHIO | 455 |
15.43 - top-nested operator
The top-nested
operator performs hierarchical aggregation and value selection.
Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested
operator can help you answer complex questions, such as “What are the top five regions by sales, and who are the top three salespeople in each of those regions?”
The source data is partitioned based on the criteria set in the first top-nested
clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested
clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.
The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.
Syntax
T | top-nested [ N ] of Expr [with others = ConstExpr] by Aggregation [asc | desc] [, top-nested …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
N | int | The number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned. | |
Expr | string | ✔️ | An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name = Expr. |
ConstExpr | string | If specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn’t make it to the top. | |
Aggregation | string | The aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name = Aggregation. |
Supported aggregation functions
The following aggregation functions are supported:
Returns
A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.
Using the with others clause
Using the top-nested operator with with others adds the ability to see your top content contextualized in a wider data set. Evaluating your data in this way is valuable when rendering the data visually.
Include data from other columns
Only columns specified as a top-nested clause Expr are displayed in the output table.
To include all values of a column at a specific level:
- Don’t specify the value of N.
- Use the column name as the value of Expr.
- Use Ignore=max(1) as the value of Aggregation.
- Remove the unnecessary Ignore column with project-away.
For an example, see Most recent events per state with other column data.
Performance considerations
The number of records can grow exponentially with the number of top-nested clauses, and record growth is even faster if the N parameter is not specified. This operator can consume a considerable amount of resources.
If the aggregation distribution is irregular, limit the number of distinct values to return by specifying N. Then, use the with others = ConstExpr clause to get a sense of the weight of all other cases.
Examples
Top damaged states, event types, and end locations by property damage
The following query partitions the StormEvents table by the State column and calculates the total property damage for each state. The query selects the top two states with the largest amount of property damage. Within these top two states, the query groups the data by EventType and selects the top three event types with the most damage. Then the query groups the data by EndLocation and selects the EndLocation with the highest damage. Only one EndLocation value appears in the results, possibly due to the large nature of the storm events or not documenting the end location.
StormEvents // Data source.
| top-nested 2 of State by sum(DamageProperty), // Top 2 States by total damaged property.
top-nested 3 of EventType by sum(DamageProperty), // Top 3 EventType by total damaged property for each State.
top-nested 1 of EndLocation by sum(DamageProperty) // Top 1 EndLocation by total damaged property for each EventType and State.
| project State, EventType, EndLocation, StateTotalDamage = aggregated_State, EventTypeTotalDamage = aggregated_EventType, EndLocationDamage = aggregated_EndLocation
Output
State | EventType | EndLocation | StateTotalDamage | EventTypeTotalDamage | EndLocationDamage |
---|---|---|---|---|---|
CALIFORNIA | Wildfire | | 1445937600 | 1326315000 | 1326315000 |
CALIFORNIA | HighWind | | 1445937600 | 61320000 | 61320000 |
CALIFORNIA | DebrisFlow | | 1445937600 | 48000000 | 48000000 |
OKLAHOMA | IceStorm | | 915470300 | 826000000 | 826000000 |
OKLAHOMA | WinterStorm | | 915470300 | 40027000 | 40027000 |
OKLAHOMA | Flood | COMMERCE | 915470300 | 21485000 | 20000000 |
Top five states with property damage with others grouped
The following example uses the top-nested operator to identify the top five states with the most property damage and uses the with others clause to group damaged property for all other states. It then visualizes damaged property for the top five states and all other states as a piechart using the render command.
StormEvents
| top-nested 5 of State with others="OtherStates" by sum(DamageProperty)
| render piechart
Output
The query renders a pie chart of property damage for the top five states and the OtherStates group.
Most recent events per state with other column data
The following query retrieves the two most recent events for each US state with relevant event details. It uses max(1) within certain columns to propagate data without using the top-nested selection logic. The generated Ignore aggregation columns are removed using project-away.
StormEvents
| top-nested of State by Ignore0=max(1), // Partition the data by each unique value of state.
top-nested 2 of StartTime by Ignore1=max(StartTime), // Get the 2 most recent events in each state.
top-nested of EndTime by Ignore2=max(1), // Append the EndTime for each event.
top-nested of EpisodeId by Ignore3=max(1) // Append the EpisodeId for each event.
| project-away Ignore* // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc // Sort results alphabetically and chronologically.
Latest records per identity with other column data
The following top-nested example extracts the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested clause partitions the data by distinct values of id using Ignore0=max(1) as a placeholder. For each id, it identifies the two most recent records based on the timestamp. Other information is appended using a top-nested operator without specifying a count and using Ignore2=max(1) as a placeholder. Finally, unnecessary aggregation columns are removed using the project-away operator.
datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
"Barak", datetime(2015-01-01), "1",
"Barak", datetime(2016-01-01), "2",
"Barak", datetime(2017-01-20), "3",
"Donald", datetime(2017-01-20), "4",
"Donald", datetime(2017-01-18), "5",
"Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1), // Partition the data by each unique value of id.
top-nested 2 of timestamp by Ignore1=max(timestamp), // Get the 2 most recent records for each id.
top-nested of otherInformation by Ignore2=max(1) // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2 // Remove the unnecessary aggregation columns.
Output
id | timestamp | otherInformation |
---|---|---|
Barak | 2016-01-01T00:00:00Z | 2 |
Donald | 2017-01-19T00:00:00Z | 6 |
Barak | 2017-01-20T00:00:00Z | 3 |
Donald | 2017-01-20T00:00:00Z | 4 |
15.44 - union operator
Takes two or more tables and returns the rows of all of them.
Syntax
[ T | ] union [ UnionParameters ] [kind= inner | outer] [withsource= ColumnName] [isfuzzy= true | false] Tables
[ T | ] union [kind= inner | outer] [withsource= ColumnName] [isfuzzy= true | false] Tables
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | The input tabular expression. | |
UnionParameters | string | Zero or more space-separated parameters in the form of Name = Value that control the behavior of the row-match operation and execution plan. See supported union parameters. | |
kind | string | Either inner or outer . inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren’t defined by an input row are set to null . The default is outer .With outer , the result has all the columns that occur in any of the inputs, one column for each name and type occurrences. This means that if a column appears in multiple tables and has multiple types, it has a corresponding column for each type in the union’s result. This column name is suffixed with a ‘_’ followed by the origin column type. | |
withsource= ColumnName | string | If specified, the output includes a column called ColumnName whose value indicates which source table has contributed each row. If the query effectively references tables from more than one database including the default database, then the value of this column has a table name qualified with the database. cluster and database qualifications are present in the value if more than one cluster is referenced. | |
isfuzzy | bool | If set to true , allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible at the time while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won’t prevent the query execution. If no resolutions were successful, the query returns an error. The default is false .isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won’t be suppressed. | |
Tables | string | One or more comma-separated table references, a query expression enclosed with parenthesis, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin E . |
Supported union parameters
Name | Type | Required | Description |
---|---|---|---|
hint.concurrency | int | Hints the system how many concurrent subqueries of the union operator should be executed in parallel. The default is the number of CPU cores on the single node of the cluster (2 to 16). | |
hint.spread | int | Hints the system how many nodes should be used by the concurrent union subqueries execution. The default is 1. |
Name | Type | Required | Description |
---|---|---|---|
T | string | The input tabular expression. | |
kind | string | Either inner or outer . inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren’t defined by an input row are set to null . The default is outer .With outer , the result has all the columns that occur in any of the inputs, one column for each name and type occurrences. This means that if a column appears in multiple tables and has multiple types, it has a corresponding column for each type in the union’s result. This column name is suffixed with a ‘_’ followed by the origin column type. | |
withsource= ColumnName | string | If specified, the output includes a column called ColumnName whose value indicates which source table has contributed each row. If the query effectively references tables from more than one database including the default database, then the value of this column has a table name qualified with the database. cluster and database qualifications are present in the value if more than one cluster is referenced. | |
isfuzzy | bool | If set to true , allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible at the time while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won’t prevent the query execution. If no resolutions were successful, the query returns an error. However, in cross-workspace and cross-app queries, if any of the workspaces or apps is not found, the query will fail. The default is false .isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won’t be suppressed. | |
Tables | string | One or more comma-separated table references, a query expression enclosed with parenthesis, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin E .Whenever the list of tables is known, refrain from using wildcards. Some workspaces contains very large number of tables that would lead to inefficient execution. Tables may also be added over time leading to unpredicted results. |
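A hedged sketch of how the union parameters and hints listed above can be combined; the table names T1 and T2 are placeholders and the hint values are arbitrary:
union hint.concurrency=4 hint.spread=2 withsource=SourceTable T1, T2
| summarize count() by SourceTable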
Returns
A table with as many rows as there are in all the input tables.
Examples
Tables with string in name or column
union K* | where * has "Kusto"
Rows from all tables in the database whose name starts with K, and in which any column includes the word Kusto.
Distinct count
union withsource=SourceTable kind=outer Query, Command
| where Timestamp > ago(1d)
| summarize dcount(UserId)
The number of distinct users that have produced either a Query event or a Command event over the past day. In the result, the ‘SourceTable’ column will indicate either “Query” or “Command”.
Query
| where Timestamp > ago(1d)
| union withsource=SourceTable kind=outer
(Command | where Timestamp > ago(1d))
| summarize dcount(UserId)
This more efficient version produces the same result. It filters each table before creating the union.
Using isfuzzy=true
// Using union isfuzzy=true to access non-existing view:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true
(View_1 | where x > 0),
(View_2 | where x > 0),
(View_3 | where x > 0)
| count
Output
Count |
---|
2 |
Observing Query Status - the following warning is returned:
Failed to resolve entity 'View_3'
// Using union isfuzzy=true and wildcard access:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true View*, SomeView*, OtherView*
| count
Output
Count |
---|
3 |
Observing Query Status - the following warning is returned:
Failed to resolve entity 'SomeView*'
Source columns types mismatch
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
union withsource=TableName View_1, View_2
Output
TableName | x_long | x_int |
---|---|---|
View_1 | 1 | |
View_2 | 2 |
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
let View_3 = view () { print x_long=3 };
union withsource=TableName View_1, View_2, View_3
Output
TableName | x_long1 | x_int | x_long |
---|---|---|---|
View_1 | 1 | ||
View_2 | 2 | ||
View_3 | 3 |
Column x from View_1 received the suffix _long, and as a column named x_long already exists in the result schema, the column names were de-duplicated, producing a new column, x_long1.
15.45 - where operator
Filters a table to the subset of rows that satisfy a predicate.
Syntax
T | where
Predicate
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Tabular input whose records are to be filtered. |
Predicate | string | ✔️ | Expression that evaluates to a bool for each row in T. |
Returns
Rows in T for which Predicate is true.
Performance tips
- Use simple comparisons between column names and constants. (‘Constant’ means constant over the table - so now() and ago() are OK, and so are scalar values assigned using a let statement.) For example, prefer where Timestamp >= ago(1d) to where bin(Timestamp, 1d) == ago(1d).
- Simplest terms first: If you have multiple clauses conjoined with and, put first the clauses that involve just one column. So Timestamp > ago(1d) and OpId == EventId is better than the other way around.
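A minimal sketch of both tips combined, using a hypothetical table T with the Timestamp, OpId, and EventId columns named in the tip above (not a table from this document):
T
| where Timestamp > ago(1d)   // constant comparison on a single column first
    and OpId == EventId       // column-to-column comparison last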
For more information, see the summary of available String operators and the summary of available Numerical operators.
Examples
Order comparisons by complexity
The following query returns storm records that report damaged property, are floods, and start and end in different places.
Notice that we put the comparison between two columns last, as the where operator can’t use an index for such a comparison, which forces a scan.
StormEvents
| project DamageProperty, EventType, BeginLocation, EndLocation
| where DamageProperty > 0
and EventType == "Flood"
and BeginLocation != EndLocation
The following table only shows the top 10 results. To see the full output, run the query.
DamageProperty | EventType | BeginLocation | EndLocation |
---|---|---|---|
5000 | Flood | FAYETTE CITY LOWBER | |
5000 | Flood | MORRISVILLE WEST WAYNESBURG | |
10000 | Flood | COPELAND HARRIS GROVE | |
5000 | Flood | GLENFORD MT PERRY | |
25000 | Flood | EAST SENECA BUFFALO AIRPARK ARPT | |
20000 | Flood | EBENEZER SLOAN | |
10000 | Flood | BUEL CALHOUN | |
10000 | Flood | GOODHOPE WEST MILFORD | |
5000 | Flood | DUNKIRK FOREST | |
20000 | Flood | FARMINGTON MANNINGTON |
Check if column contains string
The following query returns the rows in which the word “cow” appears in any column.
StormEvents
| where * has "cow"
Related content
16 - Time series analysis
16.1 - Example use cases
16.1.1 - Analyze time series data
Cloud services and IoT devices generate telemetry data that can be used to gain insights such as monitoring service health, physical production processes, and usage trends. Performing time series analysis is one way to identify deviations in the pattern of these metrics compared to their typical baseline pattern.
Kusto Query Language (KQL) contains native support for creation, manipulation, and analysis of multiple time series. In this article, learn how KQL is used to create and analyze thousands of time series in seconds, enabling near real-time monitoring solutions and workflows.
Time series creation
In this section, we’ll create a large set of regular time series simply and intuitively using the make-series operator, and fill in missing values as needed.
The first step in time series analysis is to partition and transform the original telemetry table to a set of time series. The table usually contains a timestamp column, contextual dimensions, and optional metrics. The dimensions are used to partition the data. The goal is to create thousands of time series per partition at regular time intervals.
The input table demo_make_series1 contains 600K records of arbitrary web service traffic. Use the following command to sample 10 records:
demo_make_series1 | take 10
The resulting table contains a timestamp column, three contextual dimensions columns, and no metrics:
TimeStamp | BrowserVer | OsVer | Country/Region |
---|---|---|---|
2016-08-25 09:12:35.4020000 | Chrome 51.0 | Windows 7 | United Kingdom |
2016-08-25 09:12:41.1120000 | Chrome 52.0 | Windows 10 | |
2016-08-25 09:12:46.2300000 | Chrome 52.0 | Windows 7 | United Kingdom |
2016-08-25 09:12:46.5100000 | Chrome 52.0 | Windows 10 | United Kingdom |
2016-08-25 09:12:46.5570000 | Chrome 52.0 | Windows 10 | Republic of Lithuania |
2016-08-25 09:12:47.0470000 | Chrome 52.0 | Windows 8.1 | India |
2016-08-25 09:12:51.3600000 | Chrome 52.0 | Windows 10 | United Kingdom |
2016-08-25 09:12:51.6930000 | Chrome 52.0 | Windows 7 | Netherlands |
2016-08-25 09:12:56.4240000 | Chrome 52.0 | Windows 10 | United Kingdom |
2016-08-25 09:13:08.7230000 | Chrome 52.0 | Windows 10 | India |
Since there are no metrics, we can only build a set of time series representing the traffic count itself, partitioned by OS using the following query:
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| render timechart
- Use the make-series operator to create a set of three time series, where:
  - num=count(): time series of traffic
  - from min_t to max_t step 1h: time series is created in 1-hour bins in the time range (oldest and newest timestamps of table records)
  - default=0: specify the fill method for missing bins to create regular time series. Alternatively use series_fill_const(), series_fill_forward(), series_fill_backward(), and series_fill_linear() for changes
  - by OsVer: partition by OS
- The actual time series data structure is a numeric array of the aggregated value per each time bin. We use render timechart for visualization.
The query above produces three partitions. A separate time series is created for each OS version: Windows 10 (red), Windows 7 (blue), and Windows 8.1 (green), as seen in the chart:
Time series analysis functions
In this section, we’ll perform typical series processing functions. Once a set of time series is created, KQL supports a growing list of functions to process and analyze them. We’ll describe a few representative functions for processing and analyzing time series.
Filtering
Filtering is a common practice in signal processing and is useful for time series processing tasks (for example, smoothing a noisy signal or detecting changes).
- There are two generic filtering functions:
  - series_fir(): Applies an FIR filter. Used for simple calculation of moving average and for differentiation of the time series for change detection.
  - series_iir(): Applies an IIR filter. Used for exponential smoothing and cumulative sum (a sketch follows the moving average example below).
Extend the time series set by adding a new moving average series of size 5 bins (named ma_num) to the query:
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| render timechart
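The series_iir() function listed above can be applied in the same pipeline. A minimal sketch, assuming the same demo_make_series1 table and the series_iir(series, numerators, denominators) signature, that replaces the moving average with an exponentially smoothed series:
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| extend smoothed_num=series_iir(num, dynamic([0.2]), dynamic([1, -0.8])) // y[n] = 0.2*x[n] + 0.8*y[n-1], i.e. exponential smoothing
| render timechart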
Regression analysis
A segmented linear regression analysis can be used to estimate the trend of the time series.
- Use series_fit_line() to fit the best line to a time series for general trend detection.
- Use series_fit_2lines() to detect trend changes, relative to the baseline, that are useful in monitoring scenarios.
Example of series_fit_line()
and series_fit_2lines()
functions in a time series query:
demo_series2
| extend series_fit_2lines(y), series_fit_line(y)
| render linechart with(xcolumn=x)
- Blue: original time series
- Green: fitted line
- Red: two fitted lines
Seasonality detection
Many metrics follow seasonal (periodic) patterns. User traffic of cloud services usually contains daily and weekly patterns that are highest around the middle of the business day and lowest at night and over the weekend. IoT sensors measure in periodic intervals. Physical measurements such as temperature, pressure, or humidity may also show seasonal behavior.
The following example applies seasonality detection to one month of web service traffic (2-hour bins):
demo_series3
| render timechart
- Use series_periods_detect() to automatically detect the periods in the time series.
- Use series_periods_validate() if we know that a metric should have specific distinct period(s) and we want to verify that they exist.
demo_series3
| project (periods, scores) = series_periods_detect(num, 0., 14d/2h, 2) //to detect the periods in the time series
| mv-expand periods, scores
| extend days=2h*todouble(periods)/1d
periods | scores | days |
---|---|---|
84 | 0.820622786055595 | 7 |
12 | 0.764601405803502 | 1 |
The function detects daily and weekly seasonality. The daily period scores lower than the weekly one because weekend days differ from weekdays.
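If the expected periods are already known, series_periods_validate() mentioned above can confirm them instead of detecting them. A minimal sketch, assuming the same demo_series3 table in 2-hour bins (so a day is 12 bins and a week is 84 bins) and that series_periods_validate() takes the candidate periods as arguments:
demo_series3
| project (periods, scores) = series_periods_validate(num, 12.0, 84.0) // validate the daily (12 bins) and weekly (84 bins) periods
| mv-expand periods, scores
| extend days=2h*todouble(periods)/1d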
Element-wise functions
Arithmetic and logical operations can be done on a time series. Using series_subtract() we can calculate a residual time series, that is, the difference between the original raw metric and the smoothed one, and look for anomalies in the residual signal:
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| extend residual_num=series_subtract(num, ma_num) //to calculate residual time series
| where OsVer == "Windows 10" // filter on Win 10 to visualize a cleaner chart
| render timechart
- Blue: original time series
- Red: smoothed time series
- Green: residual time series
Time series workflow at scale
The example below shows how these functions can run at scale on thousands of time series in seconds for anomaly detection. To see a few sample telemetry records of a DB service’s read count metric over four days, run the following query:
demo_many_series1
| take 4
TIMESTAMP | Loc | Op | DB | DataRead |
---|---|---|---|---|
2016-09-11 21:00:00.0000000 | Loc 9 | 5117853934049630089 | 262 | 0 |
2016-09-11 21:00:00.0000000 | Loc 9 | 5117853934049630089 | 241 | 0 |
2016-09-11 21:00:00.0000000 | Loc 9 | -865998331941149874 | 262 | 279862 |
2016-09-11 21:00:00.0000000 | Loc 9 | 371921734563783410 | 255 | 0 |
And simple statistics:
demo_many_series1
| summarize num=count(), min_t=min(TIMESTAMP), max_t=max(TIMESTAMP)
num | min_t | max_t |
---|---|---|
2177472 | 2016-09-08 00:00:00.0000000 | 2016-09-11 23:00:00.0000000 |
Building a time series of the read metric in 1-hour bins (four days * 24 hours = 96 points total) results in a normal fluctuation pattern:
let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h
| render timechart with(ymin=0)
The above behavior is misleading, since the single normal time series is aggregated from thousands of different instances that may have abnormal patterns. Therefore, we create a time series per instance. An instance is defined by Loc (location), Op (operation), and DB (specific machine).
How many time series can we create?
demo_many_series1
| summarize by Loc, Op, DB
| count
Count |
---|
18339 |
Now, we’re going to create a set of 18339 time series of the read count metric. We add the by clause to the make-series statement, apply linear regression, and select the top two time series that had the most significant decreasing trend:
let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h by Loc, Op, DB
| extend (rsquare, slope) = series_fit_line(reads)
| top 2 by slope asc
| render timechart with(title='Service Traffic Outage for 2 instances (out of 18339)')
Display the instances:
let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h by Loc, Op, DB
| extend (rsquare, slope) = series_fit_line(reads)
| top 2 by slope asc
| project Loc, Op, DB, slope
Loc | Op | DB | slope |
---|---|---|---|
Loc 15 | 37 | 1151 | -102743.910227889 |
Loc 13 | 37 | 1249 | -86303.2334644601 |
In less than two minutes, close to 20,000 time series were analyzed and two abnormal time series in which the read count suddenly dropped were detected.
These advanced capabilities combined with fast performance supply a unique and powerful solution for time series analysis.
Related content
- Learn about Anomaly detection and forecasting with KQL.
- Learn about Machine learning capabilities with KQL.
16.1.2 - Anomaly diagnosis for root cause analysis
Kusto Query Language (KQL) has built-in anomaly detection and forecasting functions to check for anomalous behavior. Once such a pattern is detected, a Root Cause Analysis (RCA) can be run to mitigate or resolve the anomaly.
The diagnosis process is complex and lengthy, and done by domain experts. The process includes:
- Fetching and joining more data from different sources for the same time frame
- Looking for changes in the distribution of values on multiple dimensions
- Charting more variables
- Other techniques based on domain knowledge and intuition
Since these diagnosis scenarios are common, machine learning plugins are available to make the diagnosis phase easier, and shorten the duration of the RCA.
All three of the following Machine Learning plugins implement clustering algorithms: autocluster, basket, and diffpatterns. The autocluster and basket plugins cluster a single record set, and the diffpatterns plugin clusters the differences between two record sets.
Clustering a single record set
A common scenario includes a dataset selected by a specific criteria such as:
- Time window that shows anomalous behavior
- High temperature device readings
- Long duration commands
- Top spending users
You want a fast and easy way to find common patterns (segments) in the data. Patterns are a subset of the dataset whose records share the same values over multiple dimensions (categorical columns).
The following query builds and shows a time series of service exceptions over the period of a week, in ten-minute bins:
let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));
let max_t = toscalar(demo_clustering1 | summarize max(PreciseTimeStamp));
demo_clustering1
| make-series num=count() on PreciseTimeStamp from min_t to max_t step 10m
| render timechart with(title="Service exceptions over a week, 10 minutes resolution")
The service exception count correlates with the overall service traffic. You can clearly see the daily pattern for business days, Monday to Friday. There’s a rise in service exception counts at mid-day, and drops in counts during the night. Flat low counts are visible over the weekend. Exception spikes can be detected using time series anomaly detection.
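For example, a minimal sketch that reuses the same 10-minute series and applies series_decompose_anomalies() (described later in this document) to flag such spikes automatically:
let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));
let max_t = toscalar(demo_clustering1 | summarize max(PreciseTimeStamp));
demo_clustering1
| make-series num=count() on PreciseTimeStamp from min_t to max_t step 10m
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5) // default threshold flags mild or stronger anomalies
| render anomalychart with(anomalycolumns=anomalies, title="Service exceptions over a week, detected anomalies")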
The second spike in the data occurs on Tuesday afternoon. The following query is used to further diagnose and verify whether it’s a sharp spike. The query redraws the chart around the spike in a higher resolution of eight hours in one-minute bins. You can then study its borders.
let min_t=datetime(2016-08-23 11:00);
demo_clustering1
| make-series num=count() on PreciseTimeStamp from min_t to min_t+8h step 1m
| render timechart with(title="Zoom on the 2nd spike, 1 minute resolution")
You see a narrow two-minute spike from 15:00 to 15:02. In the following query, count the exceptions in this two-minute window:
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| count
Count |
---|
972 |
In the following query, sample 20 exceptions out of 972:
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| take 20
PreciseTimeStamp | Region | ScaleUnit | DeploymentId | Tracepoint | ServiceHost |
---|---|---|---|---|---|
2016-08-23 15:00:08.7302460 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 100005 | 00000000-0000-0000-0000-000000000000 |
2016-08-23 15:00:09.9496584 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 8d257da1-7a1c-44f5-9acd-f9e02ff507fd |
2016-08-23 15:00:10.5911748 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 100005 | 00000000-0000-0000-0000-000000000000 |
2016-08-23 15:00:12.2957912 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007007 | f855fcef-ebfe-405d-aaf8-9c5e2e43d862 |
2016-08-23 15:00:18.5955357 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 9d390e07-417d-42eb-bebd-793965189a28 |
2016-08-23 15:00:20.7444854 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 6e54c1c8-42d3-4e4e-8b79-9bb076ca71f1 |
2016-08-23 15:00:23.8694999 | eus2 | su2 | 89e2f62a73bb4efd8f545aeae40d7e51 | 36109 | 19422243-19b9-4d85-9ca6-bc961861d287 |
2016-08-23 15:00:26.4271786 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | 36109 | 3271bae4-1c5b-4f73-98ef-cc117e9be914 |
2016-08-23 15:00:27.8958124 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 904498 | 8cf38575-fca9-48ca-bd7c-21196f6d6765 |
2016-08-23 15:00:32.9884969 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 10007007 | d5c7c825-9d46-4ab7-a0c1-8e2ac1d83ddb |
2016-08-23 15:00:34.5061623 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 1002110 | 55a71811-5ec4-497a-a058-140fb0d611ad |
2016-08-23 15:00:37.4490273 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 10007006 | f2ee8254-173c-477d-a1de-4902150ea50d |
2016-08-23 15:00:41.2431223 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 103200 | 8cf38575-fca9-48ca-bd7c-21196f6d6765 |
2016-08-23 15:00:47.2983975 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | 423690590 | 00000000-0000-0000-0000-000000000000 |
2016-08-23 15:00:50.5932834 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 2a41b552-aa19-4987-8cdd-410a3af016ac |
2016-08-23 15:00:50.8259021 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 1002110 | 0d56b8e3-470d-4213-91da-97405f8d005e |
2016-08-23 15:00:53.2490731 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 36109 | 55a71811-5ec4-497a-a058-140fb0d611ad |
2016-08-23 15:00:57.0000946 | eus2 | su2 | 89e2f62a73bb4efd8f545aeae40d7e51 | 64038 | cb55739e-4afe-46a3-970f-1b49d8ee7564 |
2016-08-23 15:00:58.2222707 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007007 | 8215dcf6-2de0-42bd-9c90-181c70486c9c |
2016-08-23 15:00:59.9382620 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 10007006 | 451e3c4c-0808-4566-a64d-84d85cf30978 |
Use autocluster() for single record set clustering
Even though there are fewer than a thousand exceptions, it’s still hard to find common segments, since there are multiple values in each column. You can use the autocluster() plugin to instantly extract a short list of common segments and find the interesting clusters within the spike’s two minutes, as seen in the following query:
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate autocluster()
SegmentId | Count | Percent | Region | ScaleUnit | DeploymentId | ServiceHost |
---|---|---|---|---|---|---|
0 | 639 | 65.7407407407407 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | e7f60c5d-4944-42b3-922a-92e98a8e7dec |
1 | 94 | 9.67078189300411 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | |
2 | 82 | 8.43621399176955 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | |
3 | 68 | 6.99588477366255 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | |
4 | 55 | 5.65843621399177 | weu | su4 | be1d6d7ac9574cbc9a22cb8ee20f16fc |
You can see from the results above that the most dominant segment contains 65.74% of the total exception records and shares four dimensions. The next segment is much less common. It contains only 9.67% of the records, and shares three dimensions. The other segments are even less common.
Autocluster uses a proprietary algorithm for mining multiple dimensions and extracting interesting segments. “Interesting” means that each segment has significant coverage of both the records set and the features set. The segments are also diverged, meaning that each one is different from the others. One or more of these segments might be relevant for the RCA process. To minimize segment review and assessment, autocluster extracts only a small segment list.
Use basket() for single record set clustering
You can also use the basket()
plugin as seen in the following query:
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate basket()
SegmentId | Count | Percent | Region | ScaleUnit | DeploymentId | Tracepoint | ServiceHost |
---|---|---|---|---|---|---|---|
0 | 639 | 65.7407407407407 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | e7f60c5d-4944-42b3-922a-92e98a8e7dec | |
1 | 642 | 66.0493827160494 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | ||
2 | 324 | 33.3333333333333 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | 0 | e7f60c5d-4944-42b3-922a-92e98a8e7dec |
3 | 315 | 32.4074074074074 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | 16108 | e7f60c5d-4944-42b3-922a-92e98a8e7dec |
4 | 328 | 33.7448559670782 | 0 | ||||
5 | 94 | 9.67078189300411 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | ||
6 | 82 | 8.43621399176955 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | ||
7 | 68 | 6.99588477366255 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | ||
8 | 167 | 17.1810699588477 | scus | ||||
9 | 55 | 5.65843621399177 | weu | su4 | be1d6d7ac9574cbc9a22cb8ee20f16fc | ||
10 | 92 | 9.46502057613169 | 10007007 | ||||
11 | 90 | 9.25925925925926 | 10007006 | ||||
12 | 57 | 5.8641975308642 | 00000000-0000-0000-0000-000000000000 |
Basket implements the “Apriori” algorithm for item set mining. It extracts all segments whose coverage of the record set is above a threshold (default 5%). You can see that more segments were extracted, including similar ones such as segments 0 and 1, or 2 and 3.
Both plugins are powerful and easy to use. Their limitation is that they cluster a single record set in an unsupervised manner with no labels. It’s unclear whether the extracted patterns characterize the selected record set, anomalous records, or the global record set.
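If the default coverage threshold returns more segments than needed, it can be raised. A minimal sketch, assuming basket() accepts the coverage threshold as its first argument, keeping only segments that cover at least 20% of the records:
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate basket(0.2) // raise the coverage threshold from the default 5% to 20%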
Clustering the difference between two records sets
The diffpatterns() plugin overcomes the limitation of autocluster and basket. Diffpatterns takes two record sets and extracts the main segments that are different. One set usually contains the anomalous record set being investigated, the same set that autocluster and basket analyze. The other set contains the reference record set, the baseline.
In the following query, diffpatterns finds interesting clusters within the spike’s two minutes, which are different from the clusters within the baseline. The baseline window is defined as the eight minutes before 15:00, when the spike started. The records are extended with a binary column (AB) that specifies whether a specific record belongs to the baseline or to the anomalous set. Diffpatterns implements a supervised learning algorithm, where the two class labels are generated by the anomalous versus baseline flag (AB).
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
let min_baseline_t=datetime(2016-08-23 14:50);
let max_baseline_t=datetime(2016-08-23 14:58); // Leave a gap between the baseline and the spike to avoid the transition zone.
let splitime=(max_baseline_t+min_peak_t)/2.0;
demo_clustering1
| where (PreciseTimeStamp between(min_baseline_t..max_baseline_t)) or
(PreciseTimeStamp between(min_peak_t..max_peak_t))
| extend AB=iff(PreciseTimeStamp > splitime, 'Anomaly', 'Baseline')
| evaluate diffpatterns(AB, 'Anomaly', 'Baseline')
SegmentId | CountA | CountB | PercentA | PercentB | PercentDiffAB | Region | ScaleUnit | DeploymentId | Tracepoint |
---|---|---|---|---|---|---|---|---|---|
0 | 639 | 21 | 65.74 | 1.7 | 64.04 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | |
1 | 167 | 544 | 17.18 | 44.16 | 26.97 | scus | |||
2 | 92 | 356 | 9.47 | 28.9 | 19.43 | 10007007 | |||
3 | 90 | 336 | 9.26 | 27.27 | 18.01 | 10007006 | |||
4 | 82 | 318 | 8.44 | 25.81 | 17.38 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | |
5 | 55 | 252 | 5.66 | 20.45 | 14.8 | weu | su4 | be1d6d7ac9574cbc9a22cb8ee20f16fc | |
6 | 57 | 204 | 5.86 | 16.56 | 10.69 |
The most dominant segment is the same segment that was extracted by autocluster
. Its coverage on the two-minute anomalous window is also 65.74%. However, its coverage on the eight-minute baseline window is only 1.7%. The difference is 64.04%. This difference seems to be related to the anomalous spike. To verify this assumption, the following query splits the original chart into the records that belong to this problematic segment, and records from the other segments.
let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));
let max_t = toscalar(demo_clustering1 | summarize max(PreciseTimeStamp));
demo_clustering1
| extend seg = iff(Region == "eau" and ScaleUnit == "su7" and DeploymentId == "b5d1d4df547d4a04ac15885617edba57"
and ServiceHost == "e7f60c5d-4944-42b3-922a-92e98a8e7dec", "Problem", "Normal")
| make-series num=count() on PreciseTimeStamp from min_t to max_t step 10m by seg
| render timechart
This chart shows that the spike on Tuesday afternoon was caused by exceptions from this specific segment, discovered by using the diffpatterns plugin.
Summary
The Machine Learning plugins are helpful for many scenarios. The autocluster and basket plugins implement an unsupervised learning algorithm and are easy to use. Diffpatterns implements a supervised learning algorithm and, although more complex, it’s more powerful for extracting differentiation segments for RCA.
These plugins are used interactively in ad-hoc scenarios and in automatic near real-time monitoring services. Time series anomaly detection is followed by a diagnosis process. The process is highly optimized to meet necessary performance standards.
16.1.3 - Time series anomaly detection & forecasting
Cloud services and IoT devices generate telemetry data that can be used to gain insights such as monitoring service health, physical production processes, and usage trends. Performing time series analysis is one way to identify deviations in the pattern of these metrics compared to their typical baseline pattern.
Kusto Query Language (KQL) contains native support for creation, manipulation, and analysis of multiple time series. With KQL, you can create and analyze thousands of time series in seconds, enabling near real-time monitoring solutions and workflows.
This article details time series anomaly detection and forecasting capabilities of KQL. The applicable time series functions are based on a robust well-known decomposition model, where each original time series is decomposed into seasonal, trend, and residual components. Anomalies are detected by outliers on the residual component, while forecasting is done by extrapolating the seasonal and trend components. The KQL implementation significantly enhances the basic decomposition model by automatic seasonality detection, robust outlier analysis, and vectorized implementation to process thousands of time series in seconds.
Prerequisites
- A Microsoft account or a Microsoft Entra user identity. An Azure subscription isn’t required.
- Read Time series analysis for an overview of time series capabilities.
Time series decomposition model
The KQL native implementation for time series prediction and anomaly detection uses a well-known decomposition model. This model is applied to time series of metrics expected to manifest periodic and trend behavior, such as service traffic, component heartbeats, and IoT periodic measurements to forecast future metric values and detect anomalous ones. The assumption of this regression process is that other than the previously known seasonal and trend behavior, the time series is randomly distributed. You can then forecast future metric values from the seasonal and trend components, collectively named baseline, and ignore the residual part. You can also detect anomalous values based on outlier analysis using only the residual portion.
To create a decomposition model, use the function series_decompose(). The series_decompose() function takes a set of time series and automatically decomposes each time series to its seasonal, trend, residual, and baseline components.
For example, you can decompose traffic of an internal web service by using the following query:
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit') // decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic of a month, decomposition', ysplit=panels)
- The original time series is labeled num (in red).
- The process starts by auto-detecting the seasonality, using the function series_periods_detect(), and extracts the seasonal pattern (in purple).
- The seasonal pattern is subtracted from the original time series and a linear regression is run using the function series_fit_line() to find the trend component (in light blue).
- The function subtracts the trend and the remainder is the residual component (in green).
- Finally, the function adds the seasonal and trend components to generate the baseline (in blue).
Time series anomaly detection
The function series_decompose_anomalies()
finds anomalous points on a set of time series. This function calls series_decompose()
to build the decomposition model and then runs series_outliers()
on the residual component. series_outliers()
calculates anomaly scores for each point of the residual component using Tukey’s fence test. Anomaly scores above 1.5 or below -1.5 indicate a mild anomaly rise or decline respectively. Anomaly scores above 3.0 or below -3.0 indicate a strong anomaly.
The following query allows you to detect anomalies in internal web service traffic:
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //use "| render anomalychart with anomalycolumns=anomalies" to render the anomalies as bold points on the series charts.
- The original time series (in red).
- The baseline (seasonal + trend) component (in blue).
- The anomalous points (in purple) on top of the original time series. The anomalous points significantly deviate from the expected baseline values.
Time series forecasting
The function series_decompose_forecast()
predicts future values of a set of time series. This function calls series_decompose()
to build the decomposition model and then, for each time series, extrapolates the baseline component into the future.
The following query allows you to predict next week’s web service traffic:
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
| render timechart with(title='Web app. traffic of a month, forecasting the next week by Time Series Decomposition')
- Original metric (in red). Future values are missing and set to 0, by default.
- Extrapolate the baseline component (in blue) to predict next week’s values.
Scalability
Kusto Query Language syntax enables a single call to process multiple time series. Its unique optimized implementation allows for fast performance, which is critical for effective anomaly detection and forecasting when monitoring thousands of counters in near real-time scenarios.
The following query shows the processing of three time series simultaneously:
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| extend offset=case(sid=='TS3', 4000000, sid=='TS2', 2000000, 0) // add artificial offset for easy visualization of multiple time series
| extend num=series_add(num, offset)
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
| render timechart with(title='Web app. traffic of a month, forecasting the next week for 3 time series')
Summary
This document details native KQL functions for time series anomaly detection and forecasting. Each original time series is decomposed into seasonal, trend and residual components for detecting anomalies and/or forecasting. These functionalities can be used for near real-time monitoring scenarios, such as fault detection, predictive maintenance, and demand and load forecasting.
Related content
- Learn about Anomaly diagnosis capabilities with KQL
16.2 - make-series operator
Create series of specified aggregated values along a specified axis.
Syntax
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, …] on AxisColumn [from start] [to end] step step [by [Column =] GroupExpression [, …]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Column | string | The name for the result column. Defaults to a name derived from the expression. | |
DefaultValue | scalar | A default value to use instead of absent values. If there’s no row with specific values of AxisColumn and GroupExpression, then the corresponding element of the array will be assigned a DefaultValue. Default is 0. | |
Aggregation | string | ✔️ | A call to an aggregation function, such as count() or avg() , with column names as arguments. See the list of aggregation functions. Only aggregation functions that return numeric results can be used with the make-series operator. |
AxisColumn | string | ✔️ | The column by which the series will be ordered. Usually the column values will be of type datetime or timespan but all numeric types are accepted. |
start | scalar | ✔️ | The low bound value of the AxisColumn for each of the series to be built. If start is not specified, it will be the first bin, or step, that has data in each series. |
end | scalar | ✔️ | The high bound (non-inclusive) value of the AxisColumn. The last index of the time series is smaller than this value, and will be start plus an integer multiple of step that is smaller than end. If end is not specified, it will be the upper bound of the last bin, or step, that has data in each series. |
step | scalar | ✔️ | The difference, or bin size, between two consecutive elements of the AxisColumn array. For a list of possible time intervals, see timespan. |
GroupExpression | An expression over the columns that provides a set of distinct values. Typically it’s a column name that already provides a restricted set of values. | ||
MakeSeriesParameters | Zero or more space-separated parameters in the form of Name = Value that control the behavior. See supported make series parameters. |
Supported make series parameters
Name | Description |
---|---|
kind | Produces a default result when the input of the make-series operator is empty. Value: nonempty |
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. See shuffle query |
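As a minimal sketch of where these parameters go (right after the operator name, per the syntax above), the following adds a shuffle key to the large-scale example used earlier in this document; the demo_many_series1 table and its Loc column are assumed from that example:
let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));
demo_many_series1
| make-series hint.shufflekey=Loc reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h by Loc, Op, DB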
Alternate Syntax
T | make-series [Column =] Aggregation [default = DefaultValue] [, …] on AxisColumn in range(start, stop, step) [by [Column =] GroupExpression [, …]]
The generated series from the alternate syntax differs from the main syntax in two aspects:
- The stop value is inclusive.
- Binning the index axis is generated with bin() and not bin_at(), which means that start may not be included in the generated series.
It’s recommended to use the main syntax of make-series and not the alternate syntax.
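For comparison, a minimal sketch that rewrites the fruit/supplier example from the Examples section below using the alternate range() syntax (note that the stop value is inclusive here):
T | make-series PriceAvg=avg(Price) default=0
    on Purchase in range(datetime(2016-09-10), datetime(2016-09-13), 1d) by Supplier, Fruit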
Returns
The input rows are arranged into groups having the same values of the by expressions and the bin_at(AxisColumn, step, start) expression. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns, the AxisColumn column, and also at least one column for each computed aggregate. (Aggregations over multiple columns or non-numeric results aren’t supported.)
This intermediate result has as many rows as there are distinct combinations of by and bin_at(AxisColumn, step, start) values.
Finally, the rows from the intermediate result are arranged into groups having the same values of the by expressions, and all aggregated values are arranged into arrays (values of dynamic type). For each aggregation, there’s one column containing its array, with the same name. The last column is an array containing the values of AxisColumn binned according to the specified step.
List of aggregation functions
Function | Description |
---|---|
avg() | Returns an average value across the group |
avgif() | Returns an average with the predicate of the group |
count() | Returns a count of the group |
countif() | Returns a count with the predicate of the group |
dcount() | Returns an approximate distinct count of the group elements |
dcountif() | Returns an approximate distinct count with the predicate of the group |
max() | Returns the maximum value across the group |
maxif() | Returns the maximum value with the predicate of the group |
min() | Returns the minimum value across the group |
minif() | Returns the minimum value with the predicate of the group |
percentile() | Returns the percentile value across the group |
take_any() | Returns a random non-empty value for the group |
stdev() | Returns the standard deviation across the group |
sum() | Returns the sum of the elements within the group |
sumif() | Returns the sum of the elements with the predicate of the group |
variance() | Returns the variance across the group |
List of series analysis functions
Function | Description |
---|---|
series_fir() | Applies Finite Impulse Response filter |
series_iir() | Applies Infinite Impulse Response filter |
series_fit_line() | Finds a straight line that is the best approximation of the input |
series_fit_line_dynamic() | Finds a line that is the best approximation of the input, returning dynamic object |
series_fit_2lines() | Finds two lines that are the best approximation of the input |
series_fit_2lines_dynamic() | Finds two lines that are the best approximation of the input, returning dynamic object |
series_outliers() | Scores anomaly points in a series |
series_periods_detect() | Finds the most significant periods that exist in a time series |
series_periods_validate() | Checks whether a time series contains periodic patterns of given lengths |
series_stats_dynamic() | Returns multiple columns with the common statistics (min/max/variance/stdev/average) |
series_stats() | Generates a dynamic value with the common statistics (min/max/variance/stdev/average) |
For a complete list of series analysis functions, see: Series processing functions
List of series interpolation functions
Function | Description |
---|---|
series_fill_backward() | Performs backward fill interpolation of missing values in a series |
series_fill_const() | Replaces missing values in a series with a specified constant value |
series_fill_forward() | Performs forward fill interpolation of missing values in a series |
series_fill_linear() | Performs linear interpolation of missing values in a series |
- Note: Interpolation functions by default assume null as a missing value. Therefore, specify default=double(null) in make-series if you intend to use interpolation functions for the series.
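A minimal sketch of that combination, using a small hypothetical inline datatable: missing bins are produced as nulls and then filled by linear interpolation:
let data = datatable(t:datetime, v:real)
[
    datetime(2017-01-01), 1,
    datetime(2017-01-03), 3,
    datetime(2017-01-05), 5,
];
data
| make-series avg(v) default=double(null) on t from datetime(2017-01-01) to datetime(2017-01-06) step 1d
| extend v_filled = series_fill_linear(avg_v) // interpolates the null bins (2017-01-02 and 2017-01-04)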
Examples
The following query produces a table of the average price of each fruit from each supplier, as arrays ordered by the timestamp over the specified range. There’s a row in the output for each distinct combination of supplier and fruit. The output columns show the supplier, the fruit, and arrays of: the average price and the whole timeline (from 2016-09-10 to 2016-09-13). All arrays are sorted by the respective timestamp and all gaps are filled with default values (0 in this example). All other input columns are ignored.
T | make-series PriceAvg=avg(Price) default=0
on Purchase from datetime(2016-09-10) to datetime(2016-09-13) step 1d by Supplier, Fruit
let data=datatable(timestamp:datetime, metric: real)
[
datetime(2016-12-31T06:00), 50,
datetime(2017-01-01), 4,
datetime(2017-01-02), 3,
datetime(2017-01-03), 4,
datetime(2017-01-03T03:00), 6,
datetime(2017-01-05), 8,
datetime(2017-01-05T13:40), 13,
datetime(2017-01-06), 4,
datetime(2017-01-07), 3,
datetime(2017-01-08), 8,
datetime(2017-01-08T21:00), 8,
datetime(2017-01-09), 2,
datetime(2017-01-09T12:00), 11,
datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| make-series avg(metric) on timestamp from stime to etime step interval
avg_metric | timestamp |
---|---|
[ 4.0, 3.0, 5.0, 0.0, 10.5, 4.0, 3.0, 8.0, 6.5 ] | [ “2017-01-01T00:00:00.0000000Z”, “2017-01-02T00:00:00.0000000Z”, “2017-01-03T00:00:00.0000000Z”, “2017-01-04T00:00:00.0000000Z”, “2017-01-05T00:00:00.0000000Z”, “2017-01-06T00:00:00.0000000Z”, “2017-01-07T00:00:00.0000000Z”, “2017-01-08T00:00:00.0000000Z”, “2017-01-09T00:00:00.0000000Z” ] |
When the input to make-series is empty, the default behavior of make-series produces an empty result.
let data=datatable(timestamp:datetime, metric: real)
[
datetime(2016-12-31T06:00), 50,
datetime(2017-01-01), 4,
datetime(2017-01-02), 3,
datetime(2017-01-03), 4,
datetime(2017-01-03T03:00), 6,
datetime(2017-01-05), 8,
datetime(2017-01-05T13:40), 13,
datetime(2017-01-06), 4,
datetime(2017-01-07), 3,
datetime(2017-01-08), 8,
datetime(2017-01-08T21:00), 8,
datetime(2017-01-09), 2,
datetime(2017-01-09T12:00), 11,
datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| take 0
| make-series avg(metric) default=1.0 on timestamp from stime to etime step interval
| count
Output
Count |
---|
0 |
Using kind=nonempty in make-series will produce a non-empty result of the default values:
let data=datatable(timestamp:datetime, metric: real)
[
datetime(2016-12-31T06:00), 50,
datetime(2017-01-01), 4,
datetime(2017-01-02), 3,
datetime(2017-01-03), 4,
datetime(2017-01-03T03:00), 6,
datetime(2017-01-05), 8,
datetime(2017-01-05T13:40), 13,
datetime(2017-01-06), 4,
datetime(2017-01-07), 3,
datetime(2017-01-08), 8,
datetime(2017-01-08T21:00), 8,
datetime(2017-01-09), 2,
datetime(2017-01-09T12:00), 11,
datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| take 0
| make-series kind=nonempty avg(metric) default=1.0 on timestamp from stime to etime step interval
Output
avg_metric | timestamp |
---|---|
[ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ] | [ “2017-01-01T00:00:00.0000000Z”, “2017-01-02T00:00:00.0000000Z”, “2017-01-03T00:00:00.0000000Z”, “2017-01-04T00:00:00.0000000Z”, “2017-01-05T00:00:00.0000000Z”, “2017-01-06T00:00:00.0000000Z”, “2017-01-07T00:00:00.0000000Z”, “2017-01-08T00:00:00.0000000Z”, “2017-01-09T00:00:00.0000000Z” ] |
16.3 - series_abs()
Calculates the element-wise absolute value of the numeric series input.
Syntax
series_abs(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the absolute value function is applied. |
Returns
Dynamic array of calculated absolute value. Any non-numeric element yields a null
element value.
Example
print arr = dynamic([-6.5,0,8.2])
| extend arr_abs = series_abs(arr)
Output
arr | arr_abs |
---|---|
[-6.5,0,8.2] | [6.5,0,8.2] |
16.4 - series_acos()
Calculates the element-wise arccosine function of the numeric series input.
Syntax
series_acos(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the arccosine function is applied. |
Returns
Dynamic array of calculated arccosine function values. Any non-numeric element yields a null
element value.
Example
print arr = dynamic([-1,0,1])
| extend arr_acos = series_acos(arr)
Output
arr | arr_acos |
---|---|
[-1,0,1] | [3.1415926535897931,1.5707963267948966,0.0] |
16.5 - series_add()
Calculates the element-wise addition of two numeric series inputs.
Syntax
series_add(series1, series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The numeric arrays to be element-wise added into a dynamic array result. |
Returns
Dynamic array of calculated element-wise add operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_add_s2 = series_add(s1, s2)
Output
s1 | s2 | s1_add_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [5,4,5] |
[2,4,8] | [8,4,2] | [10,8,10] |
[3,6,12] | [12,6,3] | [15,12,15] |
16.6 - series_atan()
Calculates the element-wise arctangent function of the numeric series input.
Syntax
series_atan(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the arctangent function is applied. |
Returns
Dynamic array of calculated arctangent function values. Any non-numeric element yields a null
element value.
Example
print arr = dynamic([-1,0,1])
| extend arr_atan = series_atan(arr)
Output
arr | arr_atan |
---|---|
[-1,0,1] | [-0.78539816339744828,0.0,0.78539816339744828] |
16.7 - series_cos()
Calculates the element-wise cosine function of the numeric series input.
Syntax
series_cos(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the cosine function is applied. |
Returns
Dynamic array of calculated cosine function values. Any non-numeric element yields a null
element value.
Example
print arr = dynamic([-1,0,1])
| extend arr_cos = series_cos(arr)
Output
arr | arr_cos |
---|---|
[-1,0,1] | [0.54030230586813976,1.0,0.54030230586813976] |
16.8 - series_cosine_similarity()
Calculate the cosine similarity of two numerical vectors.
The function series_cosine_similarity()
takes two numeric series as input, and calculates their cosine similarity.
Syntax
series_cosine_similarity(series1, series2, [ magnitude1, [ magnitude2 ]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | Input arrays with numeric data. |
magnitude1, magnitude2 | real | Optional magnitude of the first and the second vectors respectively. The magnitude is the square root of the dot product of the vector with itself. If the magnitude isn’t provided, it will be calculated. |
Returns
Returns a value of type real
whose value is the cosine similarity of series1 with series2.
If the two series aren’t of equal length, the longer series is truncated to the length of the shorter one.
Any non-numeric element of the input series will be ignored.
Example
datatable(s1:dynamic, s2:dynamic)
[
dynamic([0.1,0.2,0.1,0.2]), dynamic([0.11,0.2,0.11,0.21]),
dynamic([0.1,0.2,0.1,0.2]), dynamic([1,2,3,4]),
]
| extend cosine_similarity=series_cosine_similarity(s1, s2)
s1 | s2 | cosine_similarity |
---|---|---|
[0.1,0.2,0.1,0.2] | [0.11,0.2,0.11,0.21] | 0.99935343825504 |
[0.1,0.2,0.1,0.2] | [1,2,3,4] | 0.923760430703401 |
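The optional magnitude arguments can be passed when the magnitudes are already known, to avoid recomputing them. A minimal sketch, computing them here with sqrt() and series_dot_product() (series_dot_product() is assumed to be available; per the parameter description above, the magnitude is the square root of the dot product of a vector with itself):
print s1=dynamic([0.1,0.2,0.1,0.2]), s2=dynamic([0.11,0.2,0.11,0.21])
| extend m1=sqrt(series_dot_product(s1, s1)), m2=sqrt(series_dot_product(s2, s2)) // precomputed magnitudes
| extend cosine_similarity=series_cosine_similarity(s1, s2, m1, m2)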
Related content
16.9 - series_decompose_anomalies()
Anomaly Detection is based on series decomposition. For more information, see series_decompose().
The function takes an expression containing a series (dynamic numerical array) as input, and extracts anomalous points with scores.
Syntax
series_decompose_anomalies(Series, [ Threshold, Seasonality, Trend, Test_points, AD_method, Seasonality_threshold ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators. |
Threshold | real | The anomaly threshold. The default is 1.5 (the k value) for detecting mild or stronger anomalies. | |
Seasonality | int | Controls the seasonal analysis. The possible values are: - -1 : Autodetect seasonality using series_periods_detect. This is the default value.- Integer time period: A positive integer specifying the expected period in number of bins. For example, if the series is in 1h bins, a weekly period is 168 bins.- 0 : No seasonality, so skip extracting this component. | |
Trend | string | Controls the trend analysis. The possible values are: - avg : Define trend component as average(x) . This is the default.- linefit : Extract trend component using linear regression.- none : No trend, so skip extracting this component. | |
Test_points | int | A positive integer specifying the number of points at the end of the series to exclude from the learning, or regression, process. This parameter should be set for forecasting purposes. The default value is 0. | |
AD_method | string | Controls the anomaly detection method on the residual time series, containing one of the following values: - ctukey : Tukey’s fence test with custom 10th-90th percentile range. This is the default.- tukey : Tukey’s fence test with standard 25th-75th percentile range.For more information on residual time series, see series_outliers. | |
Seasonality_threshold | real | The threshold for seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6. For more information, see series_periods_detect. |
Returns
The function returns the following respective series:
- ad_flag: A ternary series containing (+1, -1, 0) marking up/down/no anomaly respectively
- ad_score: Anomaly score
- baseline: The predicted value of the series, according to the decomposition
The algorithm
This function follows these steps:
- Calls series_decompose() with the respective parameters, to create the baseline and residuals series.
- Calculates ad_score series by applying series_outliers() with the chosen anomaly detection method on the residuals series.
- Calculates the ad_flag series by applying the threshold on the ad_score to mark up/down/no anomaly respectively.
Examples
Detect anomalies in weekly seasonality
In the following example, generate a series with weekly seasonality, and then add some outliers to it. series_decompose_anomalies
autodetects the seasonality and generates a baseline that captures the repetitive pattern. The outliers you added can be clearly spotted in the ad_score component.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)) // generate a series with weekly seasonality
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose_anomalies(y)
| render timechart
Detect anomalies in weekly seasonality with trend
In this example, add a trend to the series from the previous example. First, run series_decompose_anomalies with the default parameters, in which the default trend value avg only takes the average and doesn’t compute the trend. The generated baseline doesn’t contain the trend and is less exact compared to the previous example. Consequently, some of the outliers you inserted in the data aren’t detected because of the higher variance.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose_anomalies(y)
| extend series_decompose_anomalies_y_ad_flag =
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart
Next, run the same example, but since you’re expecting a trend in the series, specify linefit
in the trend parameter. You can see that the baseline is much closer to the input series. All the inserted outliers are detected, and also some false positives. See the next example on tweaking the threshold.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose_anomalies(y, 1.5, -1, 'linefit')
| extend series_decompose_anomalies_y_ad_flag =
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart
Tweak the anomaly detection threshold
A few noisy points were detected as anomalies in the previous example. Now increase the anomaly detection threshold from the default of 1.5 to 2.5 times the interpercentile range, so that only stronger anomalies are detected. Now, only the outliers you inserted in the data will be detected.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose_anomalies(y, 2.5, -1, 'linefit')
| extend series_decompose_anomalies_y_ad_flag =
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart
16.10 - series_decompose_forecast()
Forecast based on series decomposition.
Takes an expression containing a series (dynamic numerical array) as input, and predicts the values of the last trailing points. For more information, see series_decompose.
Syntax
series_decompose_forecast(Series, Points, [ Seasonality, Trend, Seasonality_threshold ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators. |
Points | int | ✔️ | Specifies the number of points at the end of the series to predict, or forecast. These points are excluded from the learning, or regression, process. |
Seasonality | int | Controls the seasonal analysis. The possible values are: - -1 : Autodetect seasonality using series_periods_detect. This is the default value.- Period: A positive integer specifying the expected period in number of bins. For example, if the series is in 1h bins, a weekly period is 168 bins.- 0 : No seasonality, so skip extracting this component. | |
Trend | string | Controls the trend analysis. The possible values are: - avg : Define trend component as average(x) . This is the default.- linefit : Extract trend component using linear regression.- none : No trend, so skip extracting this component. | |
Seasonality_threshold | real | The threshold for seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6. For more information, see series_periods_detect. |
Returns
A dynamic array with the forecasted series.
Example
In the following example, we generate a series of four weeks in an hourly grain, with weekly seasonality and a small upward trend. We then use make-series
and add another empty week to the series. series_decompose_forecast
is called with a week (24*7 points), and it automatically detects the seasonality and trend, and generates a forecast of the entire five-week period.
let ts=range t from 1 to 24*7*4 step 1 // generate 4 weeks of hourly data
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| make-series y=max(y) on Timestamp from datetime(2018-03-01 05:00) to datetime(2018-03-01 05:00)+24*7*5h step 1h; // create a time series of 5 weeks (last week is empty)
ts
| extend y_forecasted = series_decompose_forecast(y, 24*7) // forecast a week forward
| render timechart
16.11 - series_decompose()
Applies a decomposition transformation on a series.
Takes an expression containing a series (dynamic numerical array) as input and decomposes it to seasonal, trend, and residual components.
Syntax
series_decompose(
Series ,
[ Seasonality,
Trend,
Test_points,
Seasonality_threshold ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators. |
Seasonality | int | | Controls the seasonal analysis. The possible values are: - -1: Autodetect seasonality using series_periods_detect. This is the default value. - Period: A positive integer specifying the expected period in number of bins. For example, if the series is in 1h bins, a weekly period is 168 bins. - 0: No seasonality, so skip extracting this component. |
Trend | string | | Controls the trend analysis. The possible values are: - avg: Define trend component as average(x). This is the default. - linefit: Extract trend component using linear regression. - none: No trend, so skip extracting this component. |
Test_points | int | | A positive integer specifying the number of points at the end of the series to exclude from the learning, or regression, process. This parameter should be set for forecasting purposes. The default value is 0. |
Seasonality_threshold | real | | The threshold for the seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6. For more information, see series_periods_detect. |
Returns
The function returns the following respective series:
baseline
: the predicted value of the series (sum of seasonal and trend components, see below).seasonal
: the series of the seasonal component:- if the period isn’t detected or is explicitly set to 0: constant 0.
- if detected or set to positive integer: median of the series points in the same phase
trend
: the series of the trend component.residual
: the series of the residual component (that is, x - baseline).
More about series decomposition
This method is usually applied to time series of metrics expected to manifest periodic and/or trend behavior. You can use the method to forecast future metric values and/or detect anomalous values. The implicit assumption of this regression process is that, apart from the seasonal and trend behavior, the time series is stochastic and randomly distributed. Forecast future metric values from the seasonal and trend components while ignoring the residual part. Detect anomalous values based on outlier detection applied to the residual component only. Further details can be found in the Time Series Decomposition chapter.
Examples
Weekly seasonality
In the following example, we generate a series with weekly seasonality and no trend, and then add some outliers to it. series_decompose automatically detects the seasonality and generates a baseline that is almost identical to the seasonal component. The outliers we added can be clearly seen in the residual component.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)) // generate a series with weekly seasonality
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose(y)
| render timechart
Weekly seasonality with trend
In this example, we add a trend to the series from the previous example. First, we run series_decompose with the default parameters. The default trend value, avg, only takes the average and doesn't compute the trend, so the generated baseline doesn't contain the trend. The trend instead shows up in the residuals, making this example less accurate than the previous one.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose(y)
| render timechart
Next, we rerun the same example. Since we’re expecting a trend in the series, we specify linefit
in the trend parameter. We can see that the positive trend is detected and the baseline is much closer to the input series. The residuals are close to zero, and only the outliers stand out. We can see all the components on the series in the chart.
let ts=range t from 1 to 24*7*5 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts
| extend series_decompose(y, -1, 'linefit')
| render timechart
Related content
- Visualize results with an anomalychart
16.12 - series_divide()
Calculates the element-wise division of two numeric series inputs.
Syntax
series_divide(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The numeric arrays over which to calculate the element-wise division. The first array is to be divided by the second. |
Returns
Dynamic array of the calculated element-wise division of the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.
Note: the result series is of double type, even if the inputs are integers. Division by zero follows standard double division-by-zero behavior (for example, 2/0 yields double(+inf)).
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_divide_s2 = series_divide(s1, s2)
Output
s1 | s2 | s1_divide_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [0.25,1.0,4.0] |
[2,4,8] | [8,4,2] | [0.25,1.0,4.0] |
[3,6,12] | [12,6,3] | [0.25,1.0,4.0] |
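As a minimal sketch of the division-by-zero note above (an illustration, not part of the original example), the following query divides by zero-valued elements; per that note, the result should be [+inf, -inf]:
print s1 = dynamic([2, -3]), s2 = dynamic([0, 0])
| extend s1_divide_s2 = series_divide(s1, s2)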
16.13 - series_dot_product()
Calculates the dot product of two numeric series.
The function series_dot_product()
takes two numeric series as input, and calculates their dot product.
Syntax
series_dot_product(
series1,
series2)
Alternate syntax
series_dot_product(
series,
numeric)
series_dot_product(
numeric,
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | Input arrays with numeric data, to be element-wise multiplied and then summed into a value of type real . |
Returns
Returns a value of type real whose value is the sum over the products of each element of series1 with the corresponding element of series2.
If the two series are of different lengths, the longer series is truncated to the length of the shorter one.
Any non-numeric element of the input series is ignored.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_dot_product_s2 = series_dot_product(s1, s2)
s1 | s2 | s1_dot_product_s2 |
---|---|---|
[1,2,4] | [4,2,1] | 12 |
[2,4,8] | [8,4,2] | 48 |
[3,6,12] | [12,6,3] | 108 |
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = x
| extend s1_dot_product_s2 = series_dot_product(s1, s2)
s1 | s2 | s1_dot_product_s2 |
---|---|---|
[1,2,4] | 1 | 7 |
[2,4,8] | 2 | 28 |
[3,6,12] | 3 | 63 |
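The following minimal sketch (not part of the original examples) illustrates the truncation rule described above: the fourth element of s1 has no counterpart in s2 and is dropped, so the expected result is 1*1 + 2*2 + 3*3 = 14.
print s1 = dynamic([1, 2, 3, 4]), s2 = dynamic([1, 2, 3])
| extend s1_dot_product_s2 = series_dot_product(s1, s2)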
16.14 - series_equals()
Calculates the element-wise equals (==) logic operation of two numeric series inputs.
Syntax
series_equals (
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The numeric arrays to be element-wise compared. |
Returns
Dynamic array of booleans containing the calculated element-wise equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_equals_s2 = series_equals(s1, s2)
Output
s1 | s2 | s1_equals_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [false,true,false] |
Related content
For entire series statistics comparisons, see:
16.15 - series_exp()
Calculates the element-wise base-e exponential function (e^x) of the numeric series input.
Syntax
series_exp(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values whose elements are applied as the exponent in the exponential function. |
Returns
Dynamic array of calculated exponential function. Any non-numeric element yields a null
element value.
Example
print s = dynamic([1,2,3])
| extend s_exp = series_exp(s)
Output
s | s_exp |
---|---|
[1,2,3] | [2.7182818284590451,7.38905609893065,20.085536923187668] |
16.16 - series_fft()
Applies the Fast Fourier Transform (FFT) on a series.
The series_fft() function takes a series of complex numbers in the time/spatial domain and transforms it to the frequency domain using the Fast Fourier Transform. The transformed complex series represents the magnitude and phase of the frequencies appearing in the original series. Use the complementary function series_ifft to transform from the frequency domain back to the time/spatial domain.
Syntax
series_fft(
x_real [,
x_imaginary])
Parameters
Name | Type | Required | Description |
---|---|---|---|
x_real | dynamic | ✔️ | A numeric array representing the real component of the series to transform. |
x_imaginary | dynamic | | A similar array representing the imaginary component of the series. This parameter should only be specified if the input series contains complex numbers. |
Returns
The function returns the complex FFT in two series: the first series for the real component and the second for the imaginary component.
Example
Generate a complex series, where the real and imaginary components are pure sine waves in different frequencies. Use FFT to transform it to the frequency domain:
let sinewave=(x: double, period: double, gain: double = 1.0, phase: double = 0.0) { gain * sin(2 * pi() / period * (x + phase)) };
let n = 128; // signal length
range x from 0 to n-1 step 1
| extend yr = sinewave(x, 8), yi = sinewave(x, 32)
| summarize x = make_list(x), y_real = make_list(yr), y_imag = make_list(yi)
| extend (fft_y_real, fft_y_imag) = series_fft(y_real, y_imag)
| render linechart with(ysplit=panels)
This query returns fft_y_real and fft_y_imag:
Transform a series to the frequency domain, and then apply the inverse transform to get back the original series:
let sinewave=(x: double, period: double, gain: double = 1.0, phase: double = 0.0) { gain * sin(2 * pi() / period * (x + phase)) };
let n = 128; // signal length
range x from 0 to n-1 step 1
| extend yr = sinewave(x, 8), yi = sinewave(x, 32)
| summarize x = make_list(x), y_real = make_list(yr), y_imag = make_list(yi)
| extend (fft_y_real, fft_y_imag) = series_fft(y_real, y_imag)
| extend (y_real2, y_imag2) = series_ifft(fft_y_real, fft_y_imag)
| project-away fft_y_real, fft_y_imag // too many series for linechart with panels
| render linechart with(ysplit=panels)
This query returns y_real2 and y_imag2, which are the same as y_real and y_imag:
16.17 - series_fill_backward()
Performs a backward fill interpolation of missing values in a series.
An expression containing dynamic numerical array is the input. The function replaces all instances of missing_value_placeholder with the nearest value from its right side (other than missing_value_placeholder), and returns the resulting array. The rightmost instances of missing_value_placeholder are preserved.
Syntax
series_fill_backward(
series[,
missing_value_placeholder])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
missing_value_placeholder | scalar | | Specifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null), and int(null) have the same meaning. |
Returns
series with all instances of missing_value_placeholder filled backwards.
Example
let data = datatable(arr: dynamic)
[
dynamic([111, null, 36, 41, null, null, 16, 61, 33, null, null])
];
data
| project
arr,
fill_backward = series_fill_backward(arr)
Output
arr | fill_backward |
---|---|
[111,null,36,41,null,null,16,61,33,null,null] | [111,36,36,41,16,16,16,61,33,null,null] |
16.18 - series_fill_const()
Replaces missing values in a series with a specified constant value.
Takes an expression containing dynamic numerical array as input, replaces all instances of missing_value_placeholder with the specified constant_value and returns the resulting array.
Syntax
series_fill_const(
series,
constant_value,
[ missing_value_placeholder ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
constant_value | scalar | ✔️ | The value used to replace the missing values. |
missing_value_placeholder | scalar | | Specifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null), and int(null) have the same meaning. |
Returns
series with all instances of missing_value_placeholder replaced with constant_value.
Example
let data = datatable(arr: dynamic)
[
dynamic([111, null, 36, 41, 23, null, 16, 61, 33, null, null])
];
data
| project
arr,
fill_const1 = series_fill_const(arr, 0.0),
fill_const2 = series_fill_const(arr, -1)
Output
arr | fill_const1 | fill_const2 |
---|---|---|
[111,null,36,41,23,null,16,61,33,null,null] | [111,0.0,36,41,23,0.0,16,61,33,0.0,0.0] | [111,-1,36,41,23,-1,16,61,33,-1,-1] |
16.19 - series_fill_forward()
Performs a forward fill interpolation of missing values in a series.
An expression containing dynamic numerical array is the input. The function replaces all instances of missing_value_placeholder with the nearest value from its left side other than missing_value_placeholder, and returns the resulting array. The leftmost instances of missing_value_placeholder are preserved.
Syntax
series_fill_forward(
series,
[ missing_value_placeholder ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
missing_value_placeholder | scalar | | Specifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null), and int(null) have the same meaning. |
Returns
series with all instances of missing_value_placeholder filled forwards.
Example
let data = datatable(arr: dynamic)
[
dynamic([null, null, 36, 41, null, null, 16, 61, 33, null, null])
];
data
| project
arr,
fill_forward = series_fill_forward(arr)
Output
arr | fill_forward |
---|---|
[null,null,36,41,null,null,16,61,33,null,null] | [null,null,36,41,41,41,16,61,33,33,33] |
Use series_fill_backward or series_fill_const to complete the interpolation of the above array, as shown in the sketch below.
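For example, the following sketch (an illustration, not part of the original example) chains the two functions so that the leading nulls left by series_fill_forward are then filled by series_fill_backward:
let data = datatable(arr: dynamic)
[
    dynamic([null, null, 36, 41, null, null, 16, 61, 33, null, null])
];
data
| project
    arr,
    fill_both = series_fill_backward(series_fill_forward(arr))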
16.20 - series_fill_linear()
Linearly interpolates missing values in a series.
Takes an expression containing dynamic numerical array as input, does linear interpolation for all instances of missing_value_placeholder, and returns the resulting array. If the beginning and end of the array contain missing_value_placeholder, then it’s replaced with the nearest value other than missing_value_placeholder. This feature can be turned off. If the whole array consists of the missing_value_placeholder, the array is filled with constant_value, or 0 if not specified.
Syntax
series_fill_linear(
series,
[ missing_value_placeholder [,
fill_edges [,
constant_value ]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
missing_value_placeholder | scalar | | Specifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null), and int(null) have the same meaning. |
fill_edges | bool | | Indicates whether missing_value_placeholder at the start and end of the array should be replaced with the nearest value. true by default. If set to false, then missing_value_placeholder at the start and end of the array is preserved. |
constant_value | scalar | | Relevant only for arrays that entirely consist of null values. This parameter specifies a constant value with which to fill the series. Default value is 0. Setting this parameter to double(null) preserves the null values. |
Returns
A linear interpolation of series using the specified parameters. If series contains only int or long elements, then the linear interpolation returns rounded interpolated values rather than exact ones.
Example
let data = datatable(arr: dynamic)
[
dynamic([null, 111.0, null, 36.0, 41.0, null, null, 16.0, 61.0, 33.0, null, null]), // Array of double
dynamic([null, 111, null, 36, 41, null, null, 16, 61, 33, null, null]), // Similar array of int
dynamic([null, null, null, null]) // Array with missing values only
];
data
| project
arr,
without_args = series_fill_linear(arr),
with_edges = series_fill_linear(arr, double(null), true),
wo_edges = series_fill_linear(arr, double(null), false),
with_const = series_fill_linear(arr, double(null), true, 3.14159)
Output
arr | without_args | with_edges | wo_edges | with_const |
---|---|---|---|---|
[null,111.0,null,36.0,41.0,null,null,16.0,61.0,33.0,null,null] | [111.0,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,33.0,33.0] | [111.0,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,33.0,33.0] | [null,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,null,null] | [111.0,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,33.0,33.0] |
[null,111,null,36,41,null,null,16,61,33,null,null] | [111,111,73,36,41,32,24,16,61,33,33,33] | [111,111,73,36,41,32,24,16,61,33,33,33] | [null,111,73,36,41,32,24,16,61,33,null,null] | [111,111,74,38, 41,32,24,16,61,33,33,33] |
[null,null,null,null] | [0.0,0.0,0.0,0.0] | [0.0,0.0,0.0,0.0] | [0.0,0.0,0.0,0.0] | [3.14159,3.14159,3.14159,3.14159] |
16.21 - series_fir()
Applies a Finite Impulse Response (FIR) filter on a series.
The function takes an expression containing a dynamic numerical array as input and applies a Finite Impulse Response filter. By specifying the filter
coefficients, it can be used for calculating a moving average, smoothing, change-detection, and many more use cases. The function takes the column containing the dynamic array and a static dynamic array of the filter’s coefficients as input, and applies the filter on the column. It outputs a new dynamic array column, containing the filtered output.
Syntax
series_fir(
series,
filter [,
normalize[,
center]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
filter | dynamic | ✔️ | An array of numeric values containing the coefficients of the filter. |
normalize | bool | | Indicates whether the filter should be normalized, that is, divided by the sum of the coefficients. If filter contains negative values, then normalize must be specified as false, otherwise the result will be null. If not specified, the default depends on the presence of negative values in filter: if filter contains at least one negative value, normalize is assumed to be false; otherwise it's assumed to be true. |
center | bool | | Indicates whether the filter is applied symmetrically on a time window before and after the current point, or on a time window from the current point backwards. By default, center is false, which fits the scenario of streaming data, where the filter can only be applied to the current and older points. For ad-hoc processing, you can set it to true, keeping the filter output synchronized with the time series. See the examples below. This parameter controls the filter's group delay. |
Returns
A new dynamic array column containing the filtered output.
Examples
- Calculate a moving average of five points by setting filter=[1,1,1,1,1] and normalize=true (default). Note the effect of center=false (default) vs. center=true:
range t from bin(now(), 1h) - 23h to bin(now(), 1h) step 1h
| summarize t=make_list(t)
| project
id='TS',
val=dynamic([0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 40, 100, 40, 20, 10, 0, 0, 0, 0, 0, 0, 0, 0]),
t
| extend
5h_MovingAvg=series_fir(val, dynamic([1, 1, 1, 1, 1])),
5h_MovingAvg_centered=series_fir(val, dynamic([1, 1, 1, 1, 1]), true, true)
| render timechart
This query returns:
- 5h_MovingAvg: Five-point moving average filter. The spike is smoothed and its peak is shifted by (5-1)/2 = 2h.
- 5h_MovingAvg_centered: Same, but by setting center=true, the peak stays in its original location.
- To calculate the difference between a point and its preceding one, set filter=[1,-1].
range t from bin(now(), 1h) - 11h to bin(now(), 1h) step 1h
| summarize t=make_list(t)
| project id='TS', t, value=dynamic([0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 3, 3])
| extend diff=series_fir(value, dynamic([1, -1]), false, false)
| render timechart
16.22 - series_fit_2lines_dynamic()
Applies two segments linear regression on a series, returning a dynamic object.
Takes an expression containing a dynamic numerical array as input and applies two-segment linear regression in order to identify and quantify trend changes in a series. The function iterates over the series indexes. In each iteration, it splits the series into two parts, fits a separate line to each part using series_fit_line() or series_fit_line_dynamic(), and calculates the total R-squared value. The best split is the one that maximizes R-squared. The function returns its parameters in a dynamic value with the following content:
- rsquare: R-squared is a standard measure of the fit quality. It's a number in the range [0-1], where 1 is the best possible fit and 0 means the data is unordered and doesn't fit any line.
- split_idx: the index of the breaking point into two segments (zero-based).
- variance: variance of the input data.
- rvariance: residual variance, which is the variance between the input data values and the approximated ones (by the two line segments).
- line_fit: numerical array holding a series of values of the best fitted line. The series length is equal to the length of the input array. It's used for charting.
- right.rsquare: r-square of the line on the right side of the split, see series_fit_line() or series_fit_line_dynamic().
- right.slope: slope of the right approximated line (of the form y=ax+b).
- right.interception: interception of the approximated right line (b from y=ax+b).
- right.variance: variance of the input data on the right side of the split.
- right.rvariance: residual variance of the input data on the right side of the split.
- left.rsquare: r-square of the line on the left side of the split, see series_fit_line() or series_fit_line_dynamic().
- left.slope: slope of the left approximated line (of the form y=ax+b).
- left.interception: interception of the approximated left line (of the form y=ax+b).
- left.variance: variance of the input data on the left side of the split.
- left.rvariance: residual variance of the input data on the left side of the split.
This function is similar to series_fit_2lines. Unlike series_fit_2lines, it returns a dynamic bag.
Syntax
series_fit_2lines_dynamic(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
Example
print
id=' ',
x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
y=dynamic([1, 2.2, 2.5, 4.7, 5.0, 12, 10.3, 10.3, 9, 8.3, 6.2])
| extend
LineFit=series_fit_line_dynamic(y).line_fit,
LineFit2=series_fit_2lines_dynamic(y).line_fit
| project id, x, y, LineFit, LineFit2
| render timechart
16.23 - series_fit_2lines()
Applies a two segmented linear regression on a series, returning multiple columns.
Takes an expression containing a dynamic numerical array as input and applies two-segment linear regression in order to identify and quantify a trend change in a series. The function iterates over the series indexes. In each iteration, the function splits the series into two parts, fits a separate line (using series_fit_line()) to each part, and calculates the total r-square. The best split is the one that maximizes r-square; the function returns its parameters:
Parameter | Description |
---|---|
rsquare | R-square is a standard measure of the fit quality. It's a number in the range [0-1], where 1 is the best possible fit and 0 means the data is unordered and doesn't fit any line. |
split_idx | The index of breaking point to two segments (zero-based). |
variance | Variance of the input data. |
rvariance | Residual variance, which is the variance between the input data values and the approximated ones (by the two line segments). |
line_fit | Numerical array holding a series of values of the best fitted line. The series length is equal to the length of the input array. It’s mainly used for charting. |
right_rsquare | R-square of the line on the right side of the split, see series_fit_line(). |
right_slope | Slope of the right approximated line (of the form y=ax+b). |
right_interception | Interception of the approximated right line (b from y=ax+b). |
right_variance | Variance of the input data on the right side of the split. |
right_rvariance | Residual variance of the input data on the right side of the split. |
left_rsquare | R-square of the line on the left side of the split, see series_fit_line(). |
left_slope | Slope of the left approximated line (of the form y=ax+b). |
left_interception | Interception of the approximated left line (of the form y=ax+b). |
left_variance | Variance of the input data on the left side of the split. |
left_rvariance | Residual variance of the input data on the left side of the split. |
Syntax
project series_fit_2lines(
series)
- Returns all the columns listed above with the following names: series_fit_2lines_x_rsquare, series_fit_2lines_x_split_idx, and so on.
project (rs, si, v)=series_fit_2lines(
series)
- Returns the following columns: rs (r-square), si (split index), v (variance), and the rest are named series_fit_2lines_x_rvariance, series_fit_2lines_x_line_fit, and so on.
extend (rs, si, v)=series_fit_2lines(
series)
- Returns only: rs (r-square), si (split index), and v (variance).
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
Examples
print
id=' ',
x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
y=dynamic([1, 2.2, 2.5, 4.7, 5.0, 12, 10.3, 10.3, 9, 8.3, 6.2])
| extend
(RSquare, Slope, Variance, RVariance, Interception, LineFit)=series_fit_line(y),
(RSquare2, SplitIdx, Variance2, RVariance2, LineFit2)=series_fit_2lines(y)
| project id, x, y, LineFit, LineFit2
| render timechart
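As a minimal sketch of the default column naming described in the Syntax section (an illustration, not from the original examples; with an input column named y, the generated columns are prefixed series_fit_2lines_y_):
print
    id=' ',
    x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
    y=dynamic([1, 2.2, 2.5, 4.7, 5.0, 12, 10.3, 10.3, 9, 8.3, 6.2])
| extend series_fit_2lines(y)
| project id, x, y, series_fit_2lines_y_line_fit
| render timechart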
16.24 - series_fit_line_dynamic()
Applies linear regression on a series, returning a dynamic object.
Takes an expression containing dynamic numerical array as input, and does linear regression to find the line that best fits it. This function should be used on time series arrays, fitting the output of make-series operator. It generates a dynamic value with the following content:
- rsquare: r-square is a standard measure of the fit quality. It's a number in the range [0-1], where 1 is the best possible fit and 0 means the data is unordered and doesn't fit any line.
- slope: slope of the approximated line (the a-value from y=ax+b).
- variance: variance of the input data.
- rvariance: residual variance, that is, the variance between the input data values and the approximated ones.
- interception: interception of the approximated line (the b-value from y=ax+b).
- line_fit: numerical array containing a series of values of the best fit line. The series length is equal to the length of the input array. It's used mainly for charting.
This function is similar to series_fit_line, but unlike series_fit_line it returns a dynamic bag.
Syntax
series_fit_line_dynamic(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
Examples
print
id=' ',
x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
y=dynamic([2, 5, 6, 8, 11, 15, 17, 18, 25, 26, 30, 30])
| extend fit=series_fit_line_dynamic(y)
| extend
RSquare=fit.rsquare,
Slope=fit.slope,
Variance=fit.variance,
RVariance=fit.rvariance,
Interception=fit.interception,
LineFit=fit.line_fit
| render timechart
RSquare | Slope | Variance | RVariance | Interception | LineFit |
---|---|---|---|---|---|
0.982 | 2.730 | 98.628 | 1.686 | -1.666 | 1.064, 3.7945, 6.526, 9.256, 11.987, 14.718, 17.449, 20.180, 22.910, 25.641, 28.371, 31.102 |
16.25 - series_fit_line()
Applies linear regression on a series, returning multiple columns.
Takes an expression containing dynamic numerical array as input and does linear regression to find the line that best fits it. This function should be used on time series arrays, fitting the output of make-series operator. The function generates the following columns:
- rsquare: r-square is a standard measure of the fit quality. It's a number in the range [0-1], where 1 is the best possible fit and 0 means the data is unordered and doesn't fit any line.
- slope: slope of the approximated line ("a" from y=ax+b).
- variance: variance of the input data.
- rvariance: residual variance, that is, the variance between the input data values and the approximated ones.
- interception: interception of the approximated line ("b" from y=ax+b).
- line_fit: numerical array holding a series of values of the best fitted line. The series length is equal to the length of the input array. It's used for charting.
Syntax
series_fit_line(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
Examples
print
id=' ',
x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
y=dynamic([2, 5, 6, 8, 11, 15, 17, 18, 25, 26, 30, 30])
| extend (RSquare, Slope, Variance, RVariance, Interception, LineFit)=series_fit_line(y)
| render timechart
RSquare | Slope | Variance | RVariance | Interception | LineFit |
---|---|---|---|---|---|
0.982 | 2.730 | 98.628 | 1.686 | -1.666 | 1.064, 3.7945, 6.526, 9.256, 11.987, 14.718, 17.449, 20.180, 22.910, 25.641, 28.371, 31.102 |
16.26 - series_fit_poly()
Applies a polynomial regression from an independent variable (x_series) to a dependent variable (y_series). This function takes a table containing multiple series (dynamic numerical arrays) and generates the best fit high-order polynomial for each series using polynomial regression.
Syntax
T | extend series_fit_poly(
y_series [,
x_series,
degree ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
y_series | dynamic | ✔️ | An array of numeric values containing the dependent variable. |
x_series | dynamic | | An array of numeric values containing the independent variable. Required only for unevenly spaced series. If not specified, it's set to a default value of [1, 2, ..., length(y_series)]. |
degree | int | | The required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on. Defaults to 1, which indicates linear regression. |
Returns
The series_fit_poly()
function returns the following columns:
- rsquare: r-square is a standard measure of the fit quality. It's a number in the range [0-1], where 1 is the best possible fit and 0 means the data is unordered and doesn't fit any line.
- coefficients: numerical array holding the coefficients of the best fitted polynomial with the given degree, ordered from the highest power coefficient to the lowest.
- variance: variance of the dependent variable (y_series).
- rvariance: residual variance, that is, the variance between the input data values and the approximated ones.
- poly_fit: numerical array holding a series of values of the best fitted polynomial. The series length is equal to the length of the dependent variable (y_series). It's used for charting.
Examples
Example 1
A fifth order polynomial with noise on x & y axes:
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend series_fit_poly(y, x, 5)
| project-rename fy=series_fit_poly_y_poly_fit, coeff=series_fit_poly_y_coefficients
| fork (project x, y, fy) (project-away x, y, fy)
| render linechart
Example 2
Verify that series_fit_poly
with degree=1 matches series_fit_line
:
demo_series1
| extend series_fit_line(y)
| extend series_fit_poly(y)
| project-rename y_line = series_fit_line_y_line_fit, y_poly = series_fit_poly_y_poly_fit
| fork (project x, y, y_line, y_poly) (project-away id, x, y, y_line, y_poly)
| render linechart with(xcolumn=x, ycolumns=y, y_line, y_poly)
Example 3
Irregular (unevenly spaced) time series:
//
// x-axis must be normalized to the range [0-1] if either degree is relatively big (>= 5) or original x range is big.
// So if x is a time axis, it must be normalized, since converting a timestamp to long generates huge numbers (the number of 100-nanosecond ticks since 1/1/1970).
//
// Normalization: x_norm = (x - min(x))/(max(x) - min(x))
//
irregular_ts
| extend series_stats(series_add(TimeStamp, 0)) // extract min/max of time axis as doubles
| extend x = series_divide(series_subtract(TimeStamp, series_stats__min), series_stats__max-series_stats__min) // normalize time axis to [0-1] range
| extend series_fit_poly(num, x, 8)
| project-rename fnum=series_fit_poly_num_poly_fit
| render timechart with(ycolumns=num, fnum)
16.27 - series_floor()
Calculates the element-wise floor function of the numeric series input.
Syntax
series_floor(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values on which the floor function is applied. |
Returns
Dynamic array of the calculated floor function. Any non-numeric element yields a null
element value.
Example
print s = dynamic([-1.5,1,2.5])
| extend s_floor = series_floor(s)
Output
s | s_floor |
---|---|
[-1.5,1,2.5] | [-2.0,1.0,2.0] |
16.28 - series_greater_equals()
Calculates the element-wise greater or equals (>=) logic operation of two numeric series inputs.
Syntax
series_greater_equals(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values to be element-wise compared. |
Returns
Dynamic array of booleans containing the calculated element-wise greater or equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_greater_equals_s2 = series_greater_equals(s1, s2)
Output
s1 | s2 | s1_greater_equals_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [false,true,true] |
Related content
For entire series statistics comparisons, see:
16.29 - series_greater()
Calculates the element-wise greater (>) logic operation of two numeric series inputs.
Syntax
series_greater(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values to be element-wise compared. |
Returns
Dynamic array of booleans containing the calculated element-wise greater logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_greater_s2 = series_greater(s1, s2)
Output
s1 | s2 | s1_greater_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [false,false,true] |
Related content
For entire series statistics comparisons, see:
16.30 - series_ifft()
Applies the Inverse Fast Fourier Transform (IFFT) on a series.
The series_ifft() function takes a series of complex numbers in the frequency domain and transforms it back to the time/spatial domain using the Fast Fourier Transform. This function is the complementary function of series_fft. Commonly the original series is transformed to the frequency domain for spectral processing and then back to the time/spatial domain.
Syntax
series_ifft(
fft_real [,
fft_imaginary])
Parameters
Name | Type | Required | Description |
---|---|---|---|
fft_real | dynamic | ✔️ | An array of numeric values representing the real component of the series to transform. |
fft_imaginary | dynamic | | An array of numeric values representing the imaginary component of the series. This parameter should be specified only if the input series contains complex numbers. |
Returns
The function returns the complex inverse FFT in two series: the first series for the real component and the second for the imaginary component.
Example
See series_fft
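As a minimal round-trip sketch (a simplified variant of the series_fft example, not from the original), the following transforms a real-valued series to the frequency domain and back; y_restored should match y, and y_restored_imag should be approximately zero:
print y = dynamic([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])
| extend (fft_real, fft_imag) = series_fft(y)
| extend (y_restored, y_restored_imag) = series_ifft(fft_real, fft_imag)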
16.31 - series_iir()
Applies an Infinite Impulse Response filter on a series.
The function takes an expression containing dynamic numerical array as input, and applies an Infinite Impulse Response filter. By specifying the filter coefficients, you can use the function to:
- calculate the cumulative sum of the series
- apply smoothing operations
- apply various high-pass, band-pass, and low-pass filters
The function takes as input the column containing the dynamic array and two static dynamic arrays containing the filter's numerator and denominator coefficients, and applies the filter on the column. It outputs a new dynamic array column containing the filtered output.
Syntax
series_iir(
series,
numerators ,
denominators)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators. |
numerators | dynamic | ✔️ | An array of numeric values, containing the numerator coefficients of the filter. |
denominators | dynamic | ✔️ | An array of numeric values, containing the denominator coefficients of the filter. |
The filter's recursive formula
- Consider an input array X, and coefficient arrays a (denominators) and b (numerators) of lengths n_a and n_b respectively. The transfer function of the filter that generates the output array Y is defined by the recursion:
a[0]*y[i] = b[0]*x[i] + b[1]*x[i-1] + ... + b[n_b-1]*x[i-n_b+1] - a[1]*y[i-1] - a[2]*y[i-2] - ... - a[n_a-1]*y[i-n_a+1]
Example
Calculate a cumulative sum. Use the iir filter with coefficients denominators=[1,-1] and numerators=[1]:
let x = range(1.0, 10, 1);
print x=x, y = series_iir(x, dynamic([1]), dynamic([1,-1]))
| mv-expand x, y
Output
x | y |
---|---|
1.0 | 1.0 |
2.0 | 3.0 |
3.0 | 6.0 |
4.0 | 10.0 |
Here’s how to wrap it in a function:
let vector_sum=(x: dynamic) {
let y=array_length(x) - 1;
todouble(series_iir(x, dynamic([1]), dynamic([1, -1]))[y])
};
print d=dynamic([0, 1, 2, 3, 4])
| extend dd=vector_sum(d)
Output
d | dd |
---|---|
[0,1,2,3,4] | 10 |
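As a further sketch of the recursive formula above (an illustration, not from the original examples), a simple exponential smoothing filter y[i] = 0.2*x[i] + 0.8*y[i-1] can be expressed with numerators=[0.2] and denominators=[1,-0.8]:
range t from bin(now(), 1h) - 23h to bin(now(), 1h) step 1h
| summarize t=make_list(t)
| project
    id='TS',
    val=dynamic([0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 40, 100, 40, 20, 10, 0, 0, 0, 0, 0, 0, 0, 0]),
    t
| extend smoothed=series_iir(val, dynamic([0.2]), dynamic([1, -0.8])) // y[i] = 0.2*x[i] + 0.8*y[i-1]
| render timechart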
16.32 - series_less_equals()
Calculates the element-wise less or equal (<=) logic operation of two numeric series inputs.
Syntax
series_less_equals(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values to be element-wise compared. |
Returns
Dynamic array of booleans containing the calculated element-wise less or equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_less_equals_s2 = series_less_equals(s1, s2)
Output
s1 | s2 | s1_less_equals_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [true,true,false] |
Related content
For entire series statistics comparisons, see:
16.33 - series_less()
Calculates the element-wise less (<) logic operation of two numeric series inputs.
Syntax
series_less(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values to be element-wise compared. |
Returns
Dynamic array of booleans containing the calculated element-wise less logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_less_s2 = series_less(s1, s2)
Output
s1 | s2 | s1_less_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [true,false,false] |
Related content
For entire series statistics comparisons, see:
16.34 - series_log()
Calculates the element-wise natural logarithm function (base-e) of the numeric series input.
Syntax
series_log(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values on which the natural logarithm function is applied. |
Returns
Dynamic array of the calculated natural logarithm function. Any non-numeric element yields a null
element value.
Example
print s = dynamic([1,2,3])
| extend s_log = series_log(s)
Output
s | s_log |
---|---|
[1,2,3] | [0.0,0.69314718055994529,1.0986122886681098] |
16.35 - series_magnitude()
Calculates the magnitude of series elements. This is equivalent to the square root of the dot product of the series with itself.
Syntax
series_magnitude(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | Array of numeric values. |
Returns
Returns a double type value representing the magnitude of the series.
Example
print arr=dynamic([1,2,3,4])
| extend series_magnitude=series_magnitude(arr)
Output
arr | series_magnitude |
---|---|
[1,2,3,4] | 5.4772255750516612 |
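To illustrate the equivalence stated above (a sketch, not from the original example), the magnitude can also be computed as the square root of the series' dot product with itself; both columns below should return the same value:
print arr = dynamic([1, 2, 3, 4])
| extend magnitude = series_magnitude(arr), sqrt_dot = sqrt(series_dot_product(arr, arr))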
16.36 - series_multiply()
Calculates the element-wise multiplication of two numeric series inputs.
Syntax
series_multiply(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values to be element-wise multiplied. |
Returns
Dynamic array of calculated element-wise multiplication operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_multiply_s2 = series_multiply(s1, s2)
Output
s1 | s2 | s1_multiply_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [4,4,4] |
[2,4,8] | [8,4,2] | [16,16,16] |
[3,6,12] | [12,6,3] | [36,36,36] |
16.37 - series_not_equals()
Calculates the element-wise not equals (!=) logic operation of two numeric series inputs.
Syntax
series_not_equals(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values to be element-wise compared. |
Returns
Dynamic array of booleans containing the calculated element-wise not equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null
element value.
Example
print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_not_equals_s2 = series_not_equals(s1, s2)
Output
s1 | s2 | s1_not_equals_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [true,false,true] |
Related content
For entire series statistics comparisons, see:
16.38 - series_outliers()
Scores anomaly points in a series.
The function takes an expression with a dynamic numerical array as input and generates a dynamic numeric array of the same length. Each value of the output array is an anomaly score for the corresponding element of the input, based on "Tukey's test". A score greater than 1.5 indicates a rise anomaly; a score less than -1.5 indicates a decline anomaly.
Syntax
series_outliers(
series [,
kind ] [,
ignore_val ] [,
min_percentile ] [,
max_percentile ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
kind | string | | The algorithm to use for outlier detection. The supported options are "tukey", which is traditional "Tukey", and "ctukey", which is custom "Tukey". The default is "ctukey". |
ignore_val | int, long, or real | | A numeric value indicating the missing values in the series. The default is double(null). The score of nulls and ignored values is set to 0. |
min_percentile | int, long, or real | | The minimum percentile to use to calculate the normal inter-quantile range. The default is 10. The value must be in the range [2.0, 98.0]. This parameter is only relevant for the "ctukey" kind. |
max_percentile | int, long, or real | | The maximum percentile to use to calculate the normal inter-quantile range. The default is 90. The value must be in the range [2.0, 98.0]. This parameter is only relevant for the "ctukey" kind. |
The following table describes differences between "tukey"
and "ctukey"
:
Algorithm | Default quantile range | Supports custom quantile range |
---|---|---|
"tukey" | 25% / 75% | No |
"ctukey" | 10% / 90% | Yes |
Example
range x from 0 to 364 step 1
| extend t = datetime(2023-01-01) + 1d*x
| extend y = rand() * 10
| extend y = iff(monthofyear(t) != monthofyear(prev(t)), y+20, y) // generate a sample series with outliers at first day of each month
| summarize t = make_list(t), series = make_list(y)
| extend outliers=series_outliers(series)
| extend pos_anomalies = array_iff(series_greater_equals(outliers, 1.5), 1, 0)
| render anomalychart with(xcolumn=t, ycolumns=series, anomalycolumns=pos_anomalies)
16.39 - series_pearson_correlation()
Calculates the Pearson correlation coefficient of two numeric series inputs.
See: Pearson correlation coefficient.
Syntax
series_pearson_correlation(
series1,
series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | The arrays of numeric values for calculating the correlation coefficient. |
Returns
The calculated Pearson correlation coefficient between the two inputs. Any non-numeric element or nonexisting element (arrays of different sizes) yields a null
result.
Example
range s1 from 1 to 5 step 1
| extend s2 = 2 * s1 // Perfect correlation
| summarize s1 = make_list(s1), s2 = make_list(s2)
| extend correlation_coefficient = series_pearson_correlation(s1, s2)
Output
s1 | s2 | correlation_coefficient |
---|---|---|
[1,2,3,4,5] | [2,4,6,8,10] | 1 |
16.40 - series_periods_detect()
Finds the most significant periods within a time series.
The series_periods_detect() function is useful for detecting periodic patterns in data, such as daily, weekly, or monthly cycles.
Syntax
series_periods_detect(
series,
min_period,
max_period,
num_periods)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values, typically the resulting output of the make-series or make_list operators. |
min_period | real | ✔️ | The minimal period length for which to search. |
max_period | real | ✔️ | The maximal period length for which to search. |
num_periods | long | ✔️ | The maximum number of periods to return. This number is the length of the output dynamic arrays. |
Returns
The function returns a table with two columns:
- periods: A dynamic array containing the periods found, in units of the bin size, ordered by their scores.
- scores: A dynamic array containing values between 0 and 1. Each value measures the significance of a period in its respective position in the periods array.
Example
The following query embeds a snapshot of application traffic for one month. The amount of traffic is aggregated twice a day, meaning the bin size is 12 hours. The query produces a line chart clearly showing a pattern in the data.
print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y
| render linechart
You can run the series_periods_detect()
function on the same series to identify the recurring patterns. The function searches for patterns in the specified period range and returns two values. The first value indicates a detected pattern that is 14 points long, with a score of approximately 0.84. The second value is zero, indicating that no additional pattern was found.
print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y
| project series_periods_detect(y, 0.0, 50.0, 2)
Output
series_periods_detect_y_periods | series_periods_detect_y_periods_scores |
---|---|
[14, 0] | [0.84, 0] |
The value in series_periods_detect_y_periods_scores is truncated.
Related content
16.41 - series_periods_validate()
Checks whether a time series contains periodic patterns of given lengths.
Often, a metric measuring the traffic of an application is characterized by a weekly or daily period. Such a period can be confirmed by running series_periods_validate() to check for weekly and daily periods.
Syntax
series_periods_validate(
series,
period1 [ ,
period2 ,
. . . ] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators. |
period1, period2, etc. | real | ✔️ | The periods to validate in units of the bin size. For example, if the series is in 1h bins, a weekly period is 168 bins. At least one period is required. |
Returns
The function outputs a table with two columns:
- periods: A dynamic array that contains the periods to validate as supplied in the input.
- scores: A dynamic array that contains a score between 0 and 1. The score shows the significance of a period in its respective position in the periods array.
Example
The following query embeds a snapshot of a month of an application’s traffic, aggregated twice a day (the bin size is 12 hours).
print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y
| render linechart
If you run series_periods_validate() on this series to validate a weekly period (14 points long), it returns a high score, but a score of 0 when you validate a five-day period (10 points long).
print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y
| project series_periods_validate(y, 14.0, 10.0)
Output
series_periods_validate_y_periods | series_periods_validate_y_scores |
---|---|
[14.0, 10.0] | [0.84, 0.0] |
16.42 - series_seasonal()
Calculates the seasonal component of a series, according to the detected or given seasonal period.
Syntax
series_seasonal(
series [,
period ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
period | int | | The number of bins for each seasonal period. This value can be any positive integer. By default, the value is set to -1, which automatically detects the period using series_periods_detect() with a threshold of 0.7; if seasonality isn't detected, the function returns zeros. Any other value is ignored for seasonality detection, and a series of zeros is returned. |
Returns
A dynamic array of the same length as the series input that contains the calculated seasonal component of the series. The seasonal component is calculated as the median of all the values that correspond to the location of the bin, across the periods.
Examples
Auto detect the period
In the following example, the series’ period is automatically detected. The first series’ period is detected to be six bins and the second five bins. The third series’ period is too short to be detected and returns a series of zeroes. See the next example on how to force the period.
print s=dynamic([2, 5, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1])
| union (print s=dynamic([8, 12, 14, 12, 10, 10, 12, 14, 12, 10, 10, 12, 14, 12, 10, 10, 12, 14, 12, 10]))
| union (print s=dynamic([1, 3, 5, 2, 4, 6, 1, 3, 5, 2, 4, 6]))
| extend s_seasonal = series_seasonal(s)
Output
s | s_seasonal |
---|---|
[2,5,3,4,3,2,1,2,3,4,3,2,1,2,3,4,3,2,1,2,3,4,3,2,1] | [1.0,2.0,3.0,4.0,3.0,2.0,1.0,2.0,3.0,4.0,3.0,2.0,1.0,2.0,3.0,4.0,3.0,2.0,1.0,2.0,3.0,4.0,3.0,2.0,1.0] |
[8,12,14,12,10,10,12,14,12,10,10,12,14,12,10,10,12,14,12,10] | [10.0,12.0,14.0,12.0,10.0,10.0,12.0,14.0,12.0,10.0,10.0,12.0,14.0,12.0,10.0,10.0,12.0,14.0,12.0,10.0] |
[1,3,5,2,4,6,1,3,5,2,4,6] | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0] |
Force a period
In this example, the series’ period is too short to be detected by series_periods_detect(), so we explicitly force the period to get the seasonal pattern.
print s=dynamic([1, 3, 5, 1, 3, 5, 2, 4, 6])
| union (print s=dynamic([1, 3, 5, 2, 4, 6, 1, 3, 5, 2, 4, 6]))
| extend s_seasonal = series_seasonal(s, 3)
Output
s | s_seasonal |
---|---|
[1,3,5,1,3,5,2,4,6] | [1.0,3.0,5.0,1.0,3.0,5.0,1.0,3.0,5.0] |
[1,3,5,2,4,6,1,3,5,2,4,6] | [1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5] |
Related content
16.43 - series_sign()
Calculates the element-wise sign of the numeric series input.
Syntax
series_sign(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the sign function is applied. |
Returns
A dynamic array of calculated sign function values. -1 for negative, 0 for 0, and 1 for positive. Any non-numeric element yields a null
element value.
Example
print arr = dynamic([-6, 0, 8])
| extend arr_sign = series_sign(arr)
Output
arr | arr_sign |
---|---|
[-6,0,8] | [-1,0,1] |
16.44 - series_sin()
Calculates the element-wise sine of the numeric series input.
Syntax
series_sin(
series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the sine function is applied. |
Returns
A dynamic array of calculated sine function values. Any non-numeric element yields a null
element value.
Example
print arr = dynamic([-1, 0, 1])
| extend arr_sin = series_sin(arr)
Output
arr | arr_sin |
---|---|
[-1,0,1] | [-0.8414709848078965,0.0,0.8414709848078965] |
16.45 - series_stats_dynamic()
Returns statistics for a series in a dynamic object.
Syntax
series_stats_dynamic(
series [,
ignore_nonfinite ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values. |
ignore_nonfinite | bool | | Indicates whether to calculate the statistics while ignoring non-finite values, such as null, NaN, inf, and so on. The default is false, which returns null if non-finite values are present in the array. |
Returns
A dynamic property bag object with the following content:
- min: The minimum value in the input array.
- min_idx: The first position of the minimum value in the input array.
- max: The maximum value in the input array.
- max_idx: The first position of the maximum value in the input array.
- avg: The average value of the input array.
- variance: The sample variance of the input array.
- stdev: The sample standard deviation of the input array.
- sum: The sum of the values in the input array.
- len: The length of the input array.
Example
print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29])
| project stats=series_stats_dynamic(x)
Output
stats |
---|
{"min": 2.0, "min_idx": 8, "max": 87.0, "max_idx": 3, "avg": 32.8, "stdev": 28.503633853548269, "variance": 812.45714285714291, "sum": 492.0, "len": 15} |
The following query creates a series of the average taxi fare per minute, and then calculates statistics on these average fares:
nyc_taxi
| make-series Series=avg(fare_amount) on pickup_datetime step 1min
| project Stats=series_stats_dynamic(Series)
Output
Stats |
---|
{"min":0,"min_idx":96600,"max":"31.779069767441861","max_idx":481260,"avg":"13.062685479531414","stdev":"1.7730590207741219","variance":"3.1437382911484884","sum":"6865747.488041711","len":525600} |
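The ignore_nonfinite parameter can be helpful when the series contains gaps. The following is a minimal sketch, assuming a null element in the array represents a missing sample; with ignore_nonfinite set to true, the statistics are calculated over the finite values only:
print x=dynamic([23, 46, null, 87])
| project stats=series_stats_dynamic(x, true)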
16.46 - series_stats()
Returns statistics for a numerical series in a table with a column for each statistic.
Syntax
... | extend (Name, …) = series_stats(series [, ignore_nonfinite ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Name | string | | The column labels for the output table. If not provided, the system generates them. If you provide fewer names than there are statistics, only those columns are returned. |
series | dynamic | ✔️ | An array of numeric values. |
ignore_nonfinite | bool | | Determines if the calculation includes non-finite values like null, NaN, inf, and so on. The default is false, which will result in null if non-finite values are present. |
Returns
A table with a column for each of the statistics displayed in the following table.
Statistic | Description |
---|---|
min | The minimum value in the input array. |
min_idx | The first position of the minimum value in the input array. |
max | The maximum value in the input array. |
max_idx | The first position of the maximum value in the input array. |
avg | The average value of the input array. |
variance | The sample variance of input array. |
stdev | The sample standard deviation of the input array. |
Example
print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29])
| project series_stats(x)
Output
series_stats_x_min | series_stats_x_min_idx | series_stats_x_max | series_stats_x_max_idx | series_stats_x_avg | series_stats_x_stdev | series_stats_x_variance |
---|---|---|---|---|---|---|
2 | 8 | 87 | 3 | 32.8 | 28.5036338535483 | 812.457142857143 |
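Because series_stats() returns multiple columns, you can also assign your own column names with extend, per the syntax above. The following is a sketch only (the names mn, mn_idx, and mx are arbitrary); providing three names returns only the first three statistics as named columns:
print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29])
| extend (mn, mn_idx, mx) = series_stats(x)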
16.47 - series_subtract()
Calculates the element-wise subtraction of two numeric series inputs.
Syntax
series_subtract(series1, series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | Arrays of numeric values, the second array to be element-wise subtracted from the first array. |
Returns
A dynamic array of the calculated element-wise subtraction of the two inputs. Any non-numeric or nonexistent element, such as when the arrays have different sizes, yields a null element value.
Example
range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_subtract_s2 = series_subtract(s1, s2)
Output
s1 | s2 | s1_subtract_s2 |
---|---|---|
[1,2,4] | [4,2,1] | [-3,0,3] |
[2,4,8] | [8,4,2] | [-6,0,6] |
[3,6,12] | [12,6,3] | [-9,0,9] |
16.48 - series_sum()
Calculates the sum of series elements.
Syntax
series_sum(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | Array of numeric values. |
Returns
Returns a double type value with the sum of the elements of the array.
Example
print arr=dynamic([1,2,3,4])
| extend series_sum=series_sum(arr)
Output
arr | series_sum |
---|---|
[1,2,3,4] | 10 |
16.49 - series_tan()
Calculates the element-wise tangent of the numeric series input.
Syntax
series_tan(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values on which the tangent function is applied. |
Returns
A dynamic array of calculated tangent function values. Any non-numeric element yields a null element value.
Example
print arr = dynamic([-1, 0, 1])
| extend arr_tan = series_tan(arr)
Output
arr | arr_tan |
---|---|
[-1,0,1] | [-1.5574077246549023,0.0,1.5574077246549023] |
16.50 - series_asin()
Calculates the element-wise arcsine function of the numeric series input.
Syntax
series_asin(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the arcsine function is applied. |
Returns
Dynamic array of calculated arcsine function values. Any non-numeric element yields a null element value.
Example
The following example creates a dynamic array, arr, with the value [-1,0,1]. It then extends the results with the column arr_asin, containing the results of the series_asin() function applied to the arr array.
print arr = dynamic([-1,0,1])
| extend arr_asin = series_asin(arr)
Output
arr | arr_asin |
---|---|
[-1,0,1] | ["-1.5707963267948966",0,"1.5707963267948966"] |
16.51 - series_ceiling()
Calculates the element-wise ceiling function of the numeric series input.
Syntax
series_ceiling(series)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series | dynamic | ✔️ | An array of numeric values over which the ceiling function is applied. |
Returns
Dynamic array of the calculated ceiling function. Any non-numeric element yields a null element value.
Example
print s = dynamic([-1.5,1,2.5])
| extend s_ceiling = series_ceiling(s)
Output
s | s_ceiling |
---|---|
[-1.5,1,2.5] | [-1.0,1.0,3.0] |
16.52 - series_pow()
Calculates the element-wise power of two numeric series inputs.
Syntax
series_pow(series1, series2)
Parameters
Name | Type | Required | Description |
---|---|---|---|
series1, series2 | dynamic | ✔️ | Arrays of numeric values. The first array, or base, is element-wise raised to the power of the second array, or power, into a dynamic array result. |
Returns
A dynamic array of the calculated element-wise power operation between the two inputs. Any non-numeric or nonexistent element, such as when the arrays have different sizes, yields a null element value.
Example
print x = dynamic([1, 2, 3, 4]), y=dynamic([1, 2, 3, 0.5])
| extend x_pow_y = series_pow(x, y)
Output
x | y | x_pow_y |
---|---|---|
[1,2,3,4] | [1,2,3,0.5] | [1.0,4.0,27.0,2.0] |
17 - Window functions
17.1 - next()
Returns the value of a column in a row that is at some offset following the current row in a serialized row set.
Syntax
next(column, [ offset, default_value ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
column | string | ✔️ | The column from which to get the values. |
offset | int | | The number of rows to move forward from the current row. The default is 1. |
default_value | scalar | | The default value when there's no value in the next row. If no default value is specified, null is used. |
Examples
Filter data based on comparison between adjacent rows
The following query returns rows that show breaks longer than a quarter of a second between calls to sensor-9.
TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
| where timeDiffInMilliseconds > 250
Output
Timestamp | SensorName | Value | PublisherId | MachineId | timeDiffInMilliseconds |
---|---|---|---|---|---|
2022-04-13T00:58:53.048506Z | sensor-9 | 0.39217481975439894 | fdbd39ab-82ac-4ca0-99ed-2f83daf3f9bb | M100 | 251 |
2022-04-13T01:07:09.63713Z | sensor-9 | 0.46645392778288297 | e3ed081e-501b-4d59-8e60-8524633d9131 | M100 | 313 |
2022-04-13T01:07:10.858267Z | sensor-9 | 0.693091598493419 | 278ca033-2b5e-4f2c-b493-00319b275aea | M100 | 254 |
2022-04-13T01:07:11.203834Z | sensor-9 | 0.52415808840249778 | 4ea27181-392d-4947-b811-ad5af02a54bb | M100 | 331 |
2022-04-13T01:07:14.431908Z | sensor-9 | 0.35430645405452 | 0af415c2-59dc-4a50-89c3-9a18ae5d621f | M100 | 268 |
… | … | … | … | … | … |
Perform aggregation based on comparison between adjacent rows
The following query calculates the average time difference in milliseconds between calls to sensor-9.
TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
| summarize avg(timeDiffInMilliseconds)
Output
avg_timeDiffInMilliseconds |
---|
30.726900061254298 |
Extend row with data from the next row
In the following query, as part of the serialization done with the serialize operator, a new column next_session_type is added with data from the next row.
ConferenceSessions
| where conference == 'Build 2019'
| serialize next_session_type = next(session_type)
| project time_and_duration, session_title, session_type, next_session_type
Output
time_and_duration | session_title | session_type | next_session_type |
---|---|---|---|
Mon, May 6, 8:30-10:00 am | Vision Keynote - Satya Nadella | Keynote | Expo Session |
Mon, May 6, 1:20-1:40 pm | Azure Data Explorer: Advanced Time Series analysis | Expo Session | Breakout |
Mon, May 6, 2:00-3:00 pm | Azure’s Data Platform - Powering Modern Applications and Cloud Scale Analytics at Petabyte Scale | Breakout | Expo Session |
Mon, May 6, 4:00-4:20 pm | How BASF is using Azure Data Services | Expo Session | Expo Session |
Mon, May 6, 6:50 - 7:10 pm | Azure Data Explorer: Operationalize your ML models | Expo Session | Expo Session |
… | … | … | … |
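The optional default_value parameter controls what's returned for the last row, which has no next row. The following is a minimal sketch (the placeholder text 'N/A' is arbitrary and used only for illustration); it returns 'N/A' instead of an empty value for the final session:
ConferenceSessions
| where conference == 'Build 2019'
| serialize next_session_type = next(session_type, 1, 'N/A')
| project session_title, session_type, next_session_type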
17.2 - prev()
Returns the value of a specific column in a specified row. The specified row is at a specified offset from the current row in a serialized row set.
Syntax
prev(column, [ offset ], [ default_value ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
column | string | ✔️ | The column from which to get the values. |
offset | int | | The offset to go back in rows. The default is 1. |
default_value | scalar | | The default value to be used when there are no previous rows from which to take the value. The default is null. |
Examples
Filter data based on comparison between adjacent rows
The following query returns rows that show breaks longer than a quarter of a second between calls to sensor-9.
TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', Timestamp, prev(Timestamp, 1))
| where timeDiffInMilliseconds > 250
Output
Timestamp | SensorName | Value | PublisherId | MachineId | timeDiffInMilliseconds |
---|---|---|---|---|---|
2022-04-13T00:58:53.048506Z | sensor-9 | 0.39217481975439894 | fdbd39ab-82ac-4ca0-99ed-2f83daf3f9bb | M100 | 251 |
2022-04-13T01:07:09.63713Z | sensor-9 | 0.46645392778288297 | e3ed081e-501b-4d59-8e60-8524633d9131 | M100 | 313 |
2022-04-13T01:07:10.858267Z | sensor-9 | 0.693091598493419 | 278ca033-2b5e-4f2c-b493-00319b275aea | M100 | 254 |
2022-04-13T01:07:11.203834Z | sensor-9 | 0.52415808840249778 | 4ea27181-392d-4947-b811-ad5af02a54bb | M100 | 331 |
2022-04-13T01:07:14.431908Z | sensor-9 | 0.35430645405452 | 0af415c2-59dc-4a50-89c3-9a18ae5d621f | M100 | 268 |
… | … | … | … | … | … |
Perform aggregation based on comparison between adjacent rows
The following query calculates the average time difference in milliseconds between calls to sensor-9.
TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', Timestamp, prev(Timestamp, 1))
| summarize avg(timeDiffInMilliseconds)
Output
avg_timeDiffInMilliseconds |
---|
30.726900061254298 |
Extend row with data from the previous row
In the following query, as part of the serialization done with the serialize operator, a new column previous_session_type is added with data from the previous row. Since there was no session prior to the first session, the column is empty in the first row.
ConferenceSessions
| where conference == 'Build 2019'
| serialize previous_session_type = prev(session_type)
| project time_and_duration, session_title, session_type, previous_session_type
Output
time_and_duration | session_title | session_type | previous_session_type |
---|---|---|---|
Mon, May 6, 8:30-10:00 am | Vision Keynote - Satya Nadella | Keynote | |
Mon, May 6, 1:20-1:40 pm | Azure Data Explorer: Advanced Time Series analysis | Expo Session | Keynote |
Mon, May 6, 2:00-3:00 pm | Azure’s Data Platform - Powering Modern Applications and Cloud Scale Analytics at Petabyte Scale | Breakout | Expo Session |
Mon, May 6, 4:00-4:20 pm | How BASF is using Azure Data Services | Expo Session | Breakout |
Mon, May 6, 6:50 - 7:10 pm | Azure Data Explorer: Operationalize your ML models | Expo Session | Expo Session |
… | … | … | … |
17.3 - row_cumsum()
Calculates the cumulative sum of a column in a serialized row set.
Syntax
row_cumsum(term [, restart] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
term | int, long, or real | ✔️ | The expression indicating the value to be summed. |
restart | bool | | Indicates when the accumulation operation should be restarted, or set back to 0. It can be used to indicate partitions in the data. |
Returns
The function returns the cumulative sum of its argument.
Examples
The following example shows how to calculate the cumulative sum of the first few even integers.
datatable (a:long) [
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
]
| where a%2==0
| serialize cs=row_cumsum(a)
a | cs |
---|---|
2 | 2 |
4 | 6 |
6 | 12 |
8 | 20 |
10 | 30 |
This example shows how to calculate the cumulative sum (here, of salary) when the data is partitioned (here, by name):
datatable (name:string, month:int, salary:long)
[
"Alice", 1, 1000,
"Bob", 1, 1000,
"Alice", 2, 2000,
"Bob", 2, 1950,
"Alice", 3, 1400,
"Bob", 3, 1450,
]
| order by name asc, month asc
| extend total=row_cumsum(salary, name != prev(name))
name | month | salary | total |
---|---|---|---|
Alice | 1 | 1000 | 1000 |
Alice | 2 | 2000 | 3000 |
Alice | 3 | 1400 | 4400 |
Bob | 1 | 1000 | 1000 |
Bob | 2 | 1950 | 2950 |
Bob | 3 | 1450 | 4400 |
17.4 - row_number()
Returns the current row’s index in a serialized row set.
The row index starts by default at 1 for the first row, and is incremented by 1 for each additional row. Optionally, the row index can start at a different value than 1. Additionally, the row index may be reset according to some provided predicate.
Syntax
row_number([ StartingIndex [, Restart ]] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
StartingIndex | long | | The value of the row index to start at or restart to. The default value is 1. |
Restart | bool | | Indicates when the numbering is to be restarted to the StartingIndex value. The default is false. |
Returns
The function returns the row index of the current row as a value of type long.
Examples
The following example returns a table with two columns, the first column (a) with numbers from 10 down to 1, and the second column (rn) with numbers from 1 up to 10:
range a from 1 to 10 step 1
| sort by a desc
| extend rn=row_number()
The following example is similar to the above, only the second column (rn) starts at 7:
range a from 1 to 10 step 1
| sort by a desc
| extend rn=row_number(7)
The last example shows how one can partition the data and number the rows within each partition. Here, we partition the data by Airport:
datatable (Airport:string, Airline:string, Departures:long)
[
"TLV", "LH", 1,
"TLV", "LY", 100,
"SEA", "LH", 1,
"SEA", "BA", 2,
"SEA", "LY", 0
]
| sort by Airport asc, Departures desc
| extend Rank=row_number(1, prev(Airport) != Airport)
Running this query produces the following result:
Airport | Airline | Departures | Rank |
---|---|---|---|
SEA | BA | 2 | 1 |
SEA | LH | 1 | 2 |
SEA | LY | 0 | 3 |
TLV | LY | 100 | 1 |
TLV | LH | 1 | 2 |
17.5 - row_rank_dense()
Returns the current row’s dense rank in a serialized row set.
The row rank starts by default at 1 for the first row, and is incremented by 1 whenever the provided Term is different than the previous row’s Term.
Syntax
row_rank_dense(Term [, restart] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Term | string | ✔️ | An expression indicating the value to consider for the rank. The rank is increased whenever the Term changes. |
restart | bool | | Indicates when the ranking is to be restarted to 1. The default is false. |
Returns
Returns the row rank of the current row as a value of type long.
Example
The following query shows how to rank the Airline by the number of departures from the SEA Airport using dense rank.
datatable (Airport:string, Airline:string, Departures:long)
[
"SEA", "LH", 3,
"SEA", "LY", 100,
"SEA", "UA", 3,
"SEA", "BA", 2,
"SEA", "EL", 3
]
| sort by Departures asc
| extend Rank=row_rank_dense(Departures)
Output
Airport | Airline | Departures | Rank |
---|---|---|---|
SEA | BA | 2 | 1 |
SEA | LH | 3 | 2 |
SEA | UA | 3 | 2 |
SEA | EL | 3 | 2 |
SEA | LY | 100 | 3 |
The following example shows how to rank the Airline by the number of departures within each partition. Here, we partition the data by Airport:
datatable (Airport:string, Airline:string, Departures:long)
[
"SEA", "LH", 3,
"SEA", "LY", 100,
"SEA", "UA", 3,
"SEA", "BA", 2,
"SEA", "EL", 3,
"AMS", "EL", 1,
"AMS", "BA", 1
]
| sort by Airport desc, Departures asc
| extend Rank=row_rank_dense(Departures, prev(Airport) != Airport)
Output
Airport | Airline | Departures | Rank |
---|---|---|---|
SEA | BA | 2 | 1 |
SEA | LH | 3 | 2 |
SEA | UA | 3 | 2 |
SEA | EL | 3 | 2 |
SEA | LY | 100 | 3 |
AMS | EL | 1 | 1 |
AMS | BA | 1 | 1 |
17.6 - row_rank_min()
Returns the current row’s minimal rank in a serialized row set.
The rank is the minimal row number that the current row’s Term appears in.
Syntax
row_rank_min(Term [, restart] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Term | string | ✔️ | An expression indicating the value to consider for the rank. The rank is the minimal row number for Term. |
restart | bool | | Indicates when the ranking is to be restarted to 1. The default is false. |
Returns
Returns the row rank of the current row as a value of type long.
Example
The following query shows how to rank the Airline by the number of departures from the SEA Airport.
datatable (Airport:string, Airline:string, Departures:long)
[
"SEA", "LH", 3,
"SEA", "LY", 100,
"SEA", "UA", 3,
"SEA", "BA", 2,
"SEA", "EL", 3
]
| sort by Departures asc
| extend Rank=row_rank_min(Departures)
Output
Airport | Airline | Departures | Rank |
---|---|---|---|
SEA | BA | 2 | 1 |
SEA | LH | 3 | 2 |
SEA | UA | 3 | 2 |
SEA | EL | 3 | 2 |
SEA | LY | 100 | 5 |
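Like row_rank_dense(), the minimal rank can be restarted per partition by using the optional restart argument. The following sketch, modeled on the row_rank_dense() partition example above, restarts the rank whenever the Airport changes:
datatable (Airport:string, Airline:string, Departures:long)
[
"SEA", "LH", 3,
"SEA", "LY", 100,
"SEA", "UA", 3,
"SEA", "BA", 2,
"SEA", "EL", 3,
"AMS", "EL", 1,
"AMS", "BA", 1
]
| sort by Airport desc, Departures asc
| extend Rank=row_rank_min(Departures, prev(Airport) != Airport)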
17.7 - row_window_session()
Calculates session start values of a column in a serialized row set.
Syntax
row_window_session(Expr, MaxDistanceFromFirst, MaxDistanceBetweenNeighbors [, Restart] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr | datetime | ✔️ | An expression whose values are grouped together in sessions. When Expr results in a null value, the next value starts a new session. |
MaxDistanceFromFirst | timespan | ✔️ | Determines when a new session starts using the maximum distance between the current Expr value and its value at the beginning of the session. |
MaxDistanceBetweenNeighbors | timespan | ✔️ | Another criterion for starting a new session using the maximum distance from one value of Expr to the next. |
Restart | boolean | | If specified, every value that evaluates to true immediately restarts the session. |
Returns
The function returns the values at the beginning of each session. It uses the following conceptual calculation model:
- It iterates over the input sequence of Expr values in order.
- For each value, it decides whether to create a new session. A new session starts when the current Expr value is more than MaxDistanceFromFirst away from the value at the beginning of the session, more than MaxDistanceBetweenNeighbors away from the previous value, or when Restart evaluates to true.
- If a new session is created, the function returns the current value of Expr. Otherwise, it returns the value at the beginning of the current session.
Examples
The following example calculates session start values for a table with an ID column and a Timestamp column that records the time of each record. The data is sorted by ID and timestamp, and the query returns the ID, the Timestamp, and a new SessionStarted column. A session can’t exceed one hour, and it continues for as long as consecutive records for the same ID are less than five minutes apart.
datatable (ID:string, Timestamp:datetime) [
"1", datetime(2024-04-11 10:00:00),
"2", datetime(2024-04-11 10:18:00),
"1", datetime(2024-04-11 11:00:00),
"3", datetime(2024-04-11 11:30:00),
"2", datetime(2024-04-11 13:30:00),
"2", datetime(2024-04-11 10:16:00)
]
| sort by ID asc, Timestamp asc
| extend SessionStarted = row_window_session(Timestamp, 1h, 5m, ID != prev(ID))
Output
ID | Timestamp | SessionStarted |
---|---|---|
1 | 2024-04-11T10:00:00Z | 2024-04-11T10:00:00Z |
1 | 2024-04-11T11:00:00Z | 2024-04-11T11:00:00Z |
2 | 2024-04-11T10:16:00Z | 2024-04-11T10:16:00Z |
2 | 2024-04-11T10:18:00Z | 2024-04-11T10:16:00Z |
2 | 2024-04-11T13:30:00Z | 2024-04-11T13:30:00Z |
3 | 2024-04-11T11:30:00Z | 2024-04-11T11:30:00Z |
17.8 - Window functions
Window functions operate on multiple rows (records) in a row set at a time. Unlike aggregation functions, window functions require that the rows in the row set be serialized (have a specific order to them). Window functions may depend on the order to determine the result.
Window functions can only be used on serialized sets. The easiest way to serialize a row set is to use the serialize operator. This operator “freezes” the order of rows in an arbitrary manner. If the order of serialized rows is semantically important, use the sort operator to force a particular order.
The serialization process has a non-trivial cost associated with it. For example, it might prevent query parallelism in many scenarios. Therefore, don’t apply serialization unnecessarily. If necessary, rearrange the query to perform serialization on the smallest row set possible.
Serialized row set
An arbitrary row set (such as a table, or the output of a tabular operator) can be serialized in one of the following ways:
- By sorting the row set. See below for a list of operators that emit sorted row sets.
- By using the serialize operator.
Many tabular operators serialize output whenever the input is already serialized, even if the operator doesn’t itself guarantee that the result is serialized. For example, this property is guaranteed for the extend operator, the project operator, and the where operator.
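As a minimal sketch of this model, the following query serializes StormEvents by sorting on StartTime and then applies window functions; the extend step preserves the serialized order established by sort:
StormEvents
| where State == "NEW YORK"
| sort by StartTime asc
| extend rn = row_number(), previous_event = prev(EventType)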
Operators that emit serialized row sets by sorting
Operators that preserve the serialized row set property
18 - Add a comment in KQL
Indicates user-provided text. Comments can be inserted on a separate line, at the end of a query line, or within a KQL query or command. The comment text isn’t evaluated.
Syntax
// comment
Remarks
Use the two slashes (//) to add comments. The following table lists the keyboard shortcuts that you can use to comment or uncomment text.
Hot Key | Description |
---|---|
Ctrl +K +C | Comment current line or selected lines. |
Ctrl +K +U | Uncomment current line or selected lines. |
Example
This example returns a count of events in the New York state:
// Return the count of events in the New York state from the StormEvents table
StormEvents
| where State == "NEW YORK" // Filter the records where the State is "NEW YORK"
| count
19 - Debug Kusto Query Language inline Python using Visual Studio Code
You can embed Python code in Kusto Query Language queries using the python() plugin. The plugin runtime is hosted in a sandbox, an isolated and secure Python environment. The python() plugin capability extends Kusto Query Language native functionalities with the huge archive of OSS Python packages. This extension enables you to run advanced algorithms, such as machine learning, artificial intelligence, statistical analysis, and time series analysis, as part of the query.
Prerequisites
- An Azure subscription. Create a free Azure account.
- An Azure Data Explorer cluster and database. Create a cluster and database.
- Install Python Anaconda Distribution. In Advanced Options, select Add Anaconda to my PATH environment variable.
- Install Visual Studio Code.
- Enable the Python plugin. For more information, see Manage language extensions in your Azure Data Explorer cluster.
Enable Python debugging in Visual Studio Code
In your client application, prefix a query containing inline Python with
set query_python_debug;
Run the query.
- Kusto Explorer: Visual Studio Code is automatically launched with the debug_python.py script.
- Kusto Web UI:
- Download and save debug_python.py, df.txt, and kargs.txt. When prompted, select Allow and save the files in the selected directory.
- Right-click debug_python.py and open it with Visual Studio Code. The debug_python.py script contains the inline Python code from the KQL query, prefixed by template code that initializes the input dataframe from df.txt and the dictionary of parameters from kargs.txt.
In Visual Studio Code, launch the Visual Studio Code debugger: Run > Start Debugging (F5), select Python configuration. The debugger launches and automatically sets a breakpoint to debug the inline code.
How does inline Python debugging in Visual Studio Code work?
- The query is parsed and executed in the server until the required | evaluate python() clause is reached.
- The Python sandbox is invoked, but instead of running the code, it serializes the input table, the dictionary of parameters, and the code, and sends them back to the client.
- These three objects are saved in three files: df.txt, kargs.txt, and debug_python.py in the selected directory (Web UI) or in the client %TEMP% directory (Kusto Explorer).
- Visual Studio Code is launched, preloaded with the debug_python.py file that contains a prefix code to initialize df and kargs from their respective files, followed by the Python script embedded in the KQL query.
Query example
Run the following KQL query in your client application:
range x from 1 to 4 step 1 | evaluate python(typeof(*, x4:int), 'exp = kargs["exp"]\n' 'result = df\n' 'result["x4"] = df["x"].pow(exp)\n' , bag_pack('exp', 4))
See the resulting table:
x | x4 |
---|---|
1 | 1 |
2 | 16 |
3 | 81 |
4 | 256 |
Run the same KQL query in your client application using set query_python_debug;:
set query_python_debug; range x from 1 to 4 step 1 | evaluate python(typeof(*, x4:int), 'exp = kargs["exp"]\n' 'result = df\n' 'result["x4"] = df["x"].pow(exp)\n' , bag_pack('exp', 4))
Visual Studio Code is launched and debugs the code, printing the result dataframe in the debug console.
20 - Set timeouts
It’s possible to customize the timeout length for your queries and management commands. In this article, you’ll learn how to set a custom timeout in various tools such as the Azure Data Explorer web UI, Kusto.Explorer, Kusto.Cli, Power BI, and when using an SDK. Certain tools have their own default timeout values, but it may be helpful to adjust these values based on the complexity and expected runtime of your queries.
Azure Data Explorer web UI
This section describes how to configure a custom query timeout and admin command timeout in the Azure Data Explorer web UI.
Prerequisites
- A Microsoft account or a Microsoft Entra user identity. An Azure subscription isn’t required.
- An Azure Data Explorer cluster and database. Create a cluster and database.
Set timeout length
Sign in to the Azure Data Explorer web UI with your Microsoft account or Microsoft Entra user identity credentials.
In the top menu, select the Settings icon.
From the left menu, select Connection.
Under the Query timeout (in minutes) setting, use the slider to choose the desired query timeout length.
Under the Admin command timeout (in minutes) setting, use the slider to choose the desired admin command timeout length.
Close the settings window, and the changes will be saved automatically.
Kusto.Explorer
This section describes how to configure a custom query timeout and admin command timeout in Kusto.Explorer.
Prerequisites
- Download and install the Kusto.Explorer tool.
- An Azure Data Explorer cluster and database. Create a cluster and database.
Set timeout length
Open the Kusto.Explorer tool.
In the top menu, select the Tools tab.
On the right-hand side, select Options.
In the left menu, select Connections.
In the Query Server Timeout setting, enter the desired timeout length. The maximum is 1 hour.
Under the Admin Command Server Timeout setting, enter the desired timeout length. The maximum is 1 hour.
Select OK to save the changes.
Kusto.Cli
This section describes how to configure a custom server timeout in the Kusto.Cli.
Prerequisites
- Install the Kusto.Cli by downloading the package Microsoft.Azure.Kusto.Tools.
Set timeout length
Run the following command to set the servertimeout client request property with the desired timeout length as a valid timespan value up to 1 hour.
Kusto.Cli.exe <ConnectionString> -execute:"#crp servertimeout=<timespan>" -execute:"…"
Alternatively, use the following command to set the norequesttimeout client request property, which will set the timeout to the maximum value of 1 hour.
Kusto.Cli.exe <ConnectionString> -execute:"#crp norequesttimeout=true" -execute:"…"
Once set, the client request property applies to all subsequent queries and commands until the app is restarted or another value is set. To retrieve the current value, use:
Kusto.Cli.exe <ConnectionString> -execute:"#crp servertimeout"
Power BI
This section describes how to configure a custom server timeout in Power BI.
Prerequisites
Set timeout length
Connect to your Azure Data Explorer cluster from Power BI desktop.
In the top menu, select Transform Data.
In the top menu, select Advanced Query Editor.
In the pop-up window, set the Timeout option in the fourth parameter of the AzureDataExplorer.Contents method. The following example shows how to set a timeout length of 59 minutes:
let Source = AzureDataExplorer.Contents(<cluster>, <database>, <table>, [Timeout=#duration(0,0,59,0)]) in Source
Select Done to apply the changes.
SDKs
To learn how to set timeouts with the SDKs, see Customize query behavior with client request properties.
21 - Syntax conventions for reference documentation
This article outlines the syntax conventions followed in the Kusto Query Language (KQL) and management commands reference documentation.
A good place to start learning Kusto Query Language is to understand the overall query structure. The first thing you notice when looking at a Kusto query is the use of the pipe symbol (|). The structure of a Kusto query starts with getting your data from a data source and then passing the data across a pipeline, where each step provides some level of processing and then passes the data to the next step. At the end of the pipeline, you get your final result. In effect, this is our pipeline:
Get Data | Filter | Summarize | Sort | Select
This concept of passing data down the pipeline makes for an intuitive structure, as it’s easy to create a mental picture of your data at each step.
To illustrate this, let’s take a look at the following query, which looks at Microsoft Entra sign-in logs. As you read through each line, you can see the keywords that indicate what’s happening to the data. We’ve included the relevant stage in the pipeline as a comment in each line.
SigninLogs // Get data
| evaluate bag_unpack(LocationDetails) // Ignore this line for now; we'll come back to it at the end.
| where RiskLevelDuringSignIn == 'none' // Filter
and TimeGenerated >= ago(7d) // Filter
| summarize Count = count() by city // Summarize
| sort by Count desc // Sort
| take 5 // Select
Because the output of every step serves as the input for the following step, the order of the steps can determine the query’s results and affect its performance. It’s crucial that you order the steps according to what you want to get out of the query.
Syntax conventions
Convention | Description |
---|---|
Block | String literals to be entered exactly as shown. |
Italic | Parameters to be provided a value upon use of the function or command. |
[ ] | Denotes that the enclosed item is optional. |
( ) | Denotes that at least one of the enclosed items is required. |
| (pipe) | Used within square or round brackets to denote that you may specify one of the items separated by the pipe character. In this form, the pipe is equivalent to the logical OR operator. When in a block (|), the pipe is part of the query syntax itself and must be entered exactly as shown. |
[, …] | Indicates that the preceding parameter can be repeated multiple times, separated by commas. |
; | Query statement terminator. |
Examples
Scalar function
This example shows the syntax and an example usage of the hash function, followed by an explanation of how each syntax component translates into the example usage.
Syntax
hash(source [, mod])
Example usage
hash("World")
- The name of the function, hash, and the opening parenthesis are entered exactly as shown.
- "World" is passed as an argument for the required source parameter.
- No argument is passed for the mod parameter, which is optional as indicated by the square brackets.
- The closing parenthesis is entered exactly as shown.
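If the optional mod parameter were also supplied, the call might look like the following sketch (the modulo value 100 is arbitrary and used only for illustration):
hash("World", 100)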
Tabular operator
This example shows the syntax and an example usage of the sort operator, followed by an explanation of how each syntax component translates into the example usage.
Syntax
T | sort by column [asc | desc] [nulls first | nulls last] [, …]
Example usage
StormEvents
| sort by State asc, StartTime desc
- The StormEvents table is passed as an argument for the required T parameter.
- | sort by is entered exactly as shown. In this case, the pipe character is part of the tabular expression statement syntax, as represented by the block text. To learn more, see What is a query statement.
- The State column is passed as an argument for the required column parameter with the optional asc flag.
- After a comma, another set of arguments is passed: the StartTime column with the optional desc flag. The [, …] syntax indicates that more argument sets may be passed but aren’t required.
Working with optional parameters
To provide an argument for an optional parameter that comes after another optional parameter, you must provide an argument for the prior parameter. This requirement is because arguments must follow the order specified in the syntax. If you don’t have a specific value to pass for the parameter, use an empty value of the same type.
Example of sequential optional parameters
Consider the syntax for the http_request plugin:
evaluate http_request ( Uri [, RequestHeaders [, Options]] )
RequestHeaders and Options are optional parameters of type dynamic. To provide an argument for the Options parameter, you must also provide an argument for the RequestHeaders parameter. The following example shows how to provide an empty value for the first optional parameter, RequestHeaders, in order to be able to specify a value for the second optional parameter, Options.
evaluate http_request ( "https://contoso.com/", dynamic({}), dynamic({ EmployeeName: "Nicole" }) )
22 - T-SQL
The query editor supports the use of T-SQL in addition to its primary query language, Kusto query language (KQL). While KQL is the recommended query language, T-SQL can be useful for tools that are unable to use KQL.
Query with T-SQL
To run a T-SQL query, begin the query with an empty T-SQL comment line: --. The -- syntax tells the query editor to interpret the following query as T-SQL and not KQL.
Example
--
SELECT * FROM StormEvents
T-SQL to Kusto Query Language
The query editor supports the ability to translate T-SQL queries into KQL. This translation feature can be helpful for users who are familiar with SQL and want to learn more about KQL.
To get the equivalent KQL for a T-SQL SELECT statement, add the keyword explain before the query. The output will be the KQL version of the query, which can be useful for understanding the corresponding KQL syntax and concepts.
Remember to preface T-SQL queries with a T-SQL comment line, --, to tell the query editor to interpret the following query as T-SQL and not KQL.
Example
--
explain
SELECT top(10) *
FROM StormEvents
ORDER BY DamageProperty DESC
Output
StormEvents
| project
StartTime,
EndTime,
EpisodeId,
EventId,
State,
EventType,
InjuriesDirect,
InjuriesIndirect,
DeathsDirect,
DeathsIndirect,
DamageProperty,
DamageCrops,
Source,
BeginLocation,
EndLocation,
BeginLat,
BeginLon,
EndLat,
EndLon,
EpisodeNarrative,
EventNarrative,
StormSummary
| sort by DamageProperty desc nulls first
| take int(10)
Run stored functions
When using T-SQL, we recommend that you create optimized KQL queries and encapsulate them in stored functions, as doing so minimizes T-SQL code and may increase performance. For example, if you have a stored function as described in the following table, you can execute it as shown in the code example.
Name | Parameters | Body | Folder | DocString |
---|---|---|---|---|
MyFunction | (myLimit: long) | {StormEvents | take myLimit} | MyFolder | Demo function with parameter |
SELECT * FROM kusto.MyFunction(10)
Set request properties
Request properties control how a query executes and returns results. To set request properties with T-SQL, preface your query with one or more statements with the following syntax:
Syntax
DECLARE @__kql_set_requestPropertyName type = value;
Parameters
Name | Type | Required | Description |
---|---|---|---|
requestPropertyName | string | ✔️ | The name of the request property to set. |
type | string | ✔️ | The T-SQL data type of the value. |
value | scalar | ✔️ | The value to assign to the request property. |
Examples
The following table shows examples for how to set request properties with T-SQL.
Request property | Example |
---|---|
query_datetimescope_to | DECLARE @__kql_set_query_datetimescope_to DATETIME = '2023-03-31 03:02:01'; |
request_app_name | DECLARE @__kql_set_request_app_name NVARCHAR = 'kuku'; |
query_results_cache_max_age | DECLARE @__kql_set_query_results_cache_max_age TIME = '00:05:00'; |
truncationmaxsize | DECLARE @__kql_set_truncationmaxsize BIGINT = 4294967297; |
maxoutputcolumns | DECLARE @__kql_set_maxoutputcolumns INT = 3001; |
notruncation | DECLARE @__kql_set_notruncation BIT = 1; |
norequesttimeout | DECLARE @__kql_set_norequesttimeout BIT = 0; |
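Putting this together, a complete T-SQL query that sets a request property before running might look like the following sketch (based on the syntax and examples above; the chosen property and query are illustrative only):
--
DECLARE @__kql_set_notruncation BIT = 1;
SELECT * FROM StormEvents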
To set request properties with KQL, see set statement.
Coverage
The query environment offers limited support for T-SQL. The following table outlines the T-SQL statements and features that aren’t supported or are partially supported.
T-SQL statement or feature | Description |
---|---|
CREATE , INSERT , DROP , and ALTER | Not supported |
Schema or data modifications | Not supported |
ANY , ALL , and EXISTS | Not supported |
WITHIN GROUP | Not supported |
TOP PERCENT | Not supported |
TOP WITH TIES | Evaluated as regular TOP |
TRUNCATE | Returns the nearest value |
SELECT * | Column order may differ from expectation. Use column names if order matters. |
AT TIME ZONE | Not supported |
SQL cursors | Not supported |
Correlated subqueries | Not supported |
Recursive CTEs | Not supported |
Dynamic statements | Not supported |
Flow control statements | Only IF THEN ELSE statements with an identical schema for THEN and ELSE are supported. |
Duplicate column names | Not supported. The original name is preserved for one column. |
Data types | Data returned may differ in type from SQL Server. For example, TINYINT and SMALLINT have no equivalent in Kusto, and may return as INT32 or INT64 instead of BYTE or INT16 . |
Related content
- Learn about SQL Server emulation in Azure Data Explorer
- Use the SQL to Kusto Query Language cheat sheet