1 - Best practices for Kusto Query Language (KQL) graph semantics

Learn about the best practices for Kusto Query Language (KQL) graph semantics.

Best practices for Kusto Query Language (KQL) graph semantics

This article explains how to use the graph semantics feature in KQL effectively and efficiently for different use cases and scenarios. It shows how to create and query graphs with the syntax and operators, and how to integrate them with other KQL features and functions. It also helps users avoid common pitfalls or errors. For instance, creating graphs that exceed memory or performance limits, or applying unsuitable or incompatible filters, projections, or aggregations.

Size of graph

The make-graph operator creates an in-memory representation of a graph. It consists of the graph structure itself and its properties. When making a graph, use appropriate filters, projections, and aggregations to select only the relevant nodes and edges and their properties.

The following example shows how to reduce the number of nodes and edges and their properties. In this scenario, Bob changed manager from Alice to Eve and the user only wants to see the latest state of the graph for their organization. To reduce the size of the graph, the nodes are first filtered by the organization property and then the property is removed from the graph using the project-away operator. The same happens for edges. Then summarize operator together with arg_max is used to get the last known state of the graph.

let allEmployees = datatable(organization: string, name:string, age:long)
[
  "R&D", "Alice", 32,
  "R&D","Bob", 31,
  "R&D","Eve", 27,
  "R&D","Mallory", 29,
  "Marketing", "Alex", 35
];
let allReports = datatable(employee:string, manager:string, modificationDate: datetime)
[
  "Bob", "Alice", datetime(2022-05-23),
  "Bob", "Eve", datetime(2023-01-01),
  "Eve", "Mallory", datetime(2022-05-23),
  "Alice", "Dave", datetime(2022-05-23)
];
let filteredEmployees =
    allEmployees
    | where organization == "R&D"
    | project-away age, organization;
let filteredReports =
    allReports
    | summarize arg_max(modificationDate, *) by employee
    | project-away modificationDate;
filteredReports
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
  where employee.name == "Bob"
  project employee = employee.name, topManager = manager.name

Output

employeetopManager
BobMallory

Last known state of the graph

The Size of graph example demonstrated how to get the last known state of the edges of a graph by using summarize operator and the arg_max aggregation function. Obtaining the last known state is a compute-intensive operation.

Consider creating a materialized view to improve the query performance, as follows:

  1. Create tables that have some notion of version as part of their model. We recommend using a datetime column that you can later use to create a graph time series.

    .create table employees (organization: string, name:string, stateOfEmployment:string, properties:dynamic, modificationDate:datetime)
    
    .create table reportsTo (employee:string, manager:string, modificationDate: datetime)
    
  2. Create a materialized view for each table and use the arg_max aggregation function to determine the last known state of employees and the reportsTo relation.

    .create materialized-view employees_MV on table employees
    {
        employees
        | summarize arg_max(modificationDate, *) by name
    }
    
    .create materialized-view reportsTo_MV on table reportsTo
    {
        reportsTo
        | summarize arg_max(modificationDate, *) by employee
    }
    
  3. Create two functions that ensure that only the materialized component of the materialized view is used and other filters and projections are applied.

    .create function currentEmployees () {
        materialized_view('employees_MV')
        | where stateOfEmployment == "employed"
    }
    
    .create function reportsTo_lastKnownState () {
        materialized_view('reportsTo_MV')
        | project-away modificationDate
    }
    

The resulting query using materialized makes the query faster and more efficient for larger graphs. It also enables higher concurrency and lower latency queries for the latest state of the graph. The user can still query the graph history based on the employees and reportsTo tables, if needed

let filteredEmployees =
    currentEmployees
    | where organization == "R&D"
    | project-away organization;
reportsTo_lastKnownState
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
  where employee.name == "Bob"
  project employee = employee.name, reportingPath = map(hasManager, manager)

Graph time travel

Some scenarios require you to analyze data based on the state of a graph at a specific point in time. Graph time travel uses a combination of time filters and summarizes using the arg_max aggregation function.

The following KQL statement creates a function with a parameter that defines the interesting point in time for the graph. It returns a ready-made graph.

.create function graph_time_travel (interestingPointInTime:datetime ) {
    let filteredEmployees =
        employees
        | where modificationDate < interestingPointInTime
        | summarize arg_max(modificationDate, *) by name;
    let filteredReports =
        reportsTo
        | where modificationDate < interestingPointInTime
        | summarize arg_max(modificationDate, *) by employee
        | project-away modificationDate;
    filteredReports
    | make-graph employee --> manager with filteredEmployees on name
}

With the function in place, the user can craft a query to get the top manager of Bob based on the graph in June 2022.

graph_time_travel(datetime(2022-06-01))
| graph-match (employee)-[hasManager*2..5]-(manager)
  where employee.name == "Bob"
  project employee = employee.name, reportingPath = map(hasManager, manager)

Output

employeetopManager
BobDave

Dealing with multiple node and edge types

Sometimes it’s required to contextualize time series data with a graph that consists of multiple node types. One way of handling this scenario is creating a general-purpose property graph that is represented by a canonical model.

Occasionally, you might need to contextualize time series data with a graph that has multiple node types. You could approach the problem by creating a general-purpose property graph that is based on a canonical model, such as the following.

  • nodes
    • nodeId (string)
    • label (string)
    • properties (dynamic)
  • edges
    • source (string)
    • destination (string)
    • label (string)
    • properties (dynamic)

The following example shows how to transform the data into a canonical model and how to query it. The base tables for the nodes and edges of the graph have different schemas.

This scenario involves a factory manager who wants to find out why equipment isn’t working well and who is responsible for fixing it. The manager decides to use a graph that combines the asset graph of the production floor and the maintenance staff hierarchy which changes every day.

The following graph shows the relations between assets and their time series, such as speed, temperature, and pressure. The operators and the assets, such as pump, are connected via the operates edge. The operators themselves report up to management.

Infographic on the property graph scenario.

The data for those entities can be stored directly in your cluster or acquired using query federation to a different service, such as Azure Cosmos DB, Azure SQL, or Azure Digital Twin. To illustrate the example, the following tabular data is created as part of the query:

let sensors = datatable(sensorId:string, tagName:string, unitOfMeasuree:string)
[
  "1", "temperature", "°C",
  "2", "pressure", "Pa",
  "3", "speed", "m/s"
];
let timeseriesData = datatable(sensorId:string, timestamp:string, value:double, anomaly: bool )
[
    "1", datetime(2023-01-23 10:00:00), 32, false,
    "1", datetime(2023-01-24 10:00:00), 400, true,
    "3", datetime(2023-01-24 09:00:00), 9, false
];
let employees = datatable(name:string, age:long)
[
  "Alice", 32,
  "Bob", 31,
  "Eve", 27,
  "Mallory", 29,
  "Alex", 35,
  "Dave", 45
];
let allReports = datatable(employee:string, manager:string)
[
  "Bob", "Alice",
  "Alice", "Dave",
  "Eve", "Mallory",
  "Alex", "Dave"
];
let operates = datatable(employee:string, machine:string, timestamp:datetime)
[
  "Bob", "Pump", datetime(2023-01-23),
  "Eve", "Pump", datetime(2023-01-24),
  "Mallory", "Press", datetime(2023-01-24),
  "Alex", "Conveyor belt", datetime(2023-01-24),
];
let assetHierarchy = datatable(source:string, destination:string)
[
  "1", "Pump",
  "2", "Pump",
  "Pump", "Press",
  "3", "Conveyor belt"
];

The employees, sensors, and other entities and relationships don’t share a canonical data model. You can use the union operator to combine and canonize the data.

The following query joins the sensor data with the time series data to find the sensors that have abnormal readings. Then, it uses a projection to create a common model for the graph nodes.

let nodes =
    union
        (
            sensors
            | join kind=leftouter
            (
                timeseriesData
                | summarize hasAnomaly=max(anomaly) by sensorId
            ) on sensorId
            | project nodeId = sensorId, label = "tag", properties = pack_all(true)
        ),
        ( employees | project nodeId = name, label = "employee", properties = pack_all(true));

The edges are transformed in a similar way.

let edges =
    union
        ( assetHierarchy | extend label = "hasParent" ),
        ( allReports | project source = employee, destination = manager, label = "reportsTo" ),
        ( operates | project source = employee, destination = machine, properties = pack_all(true), label = "operates" );

With the canonized nodes and edges data, you can create a graph using the make-graph operator, as follows:

let graph = edges
| make-graph source --> destination with nodes on nodeId;

Once created, define the path pattern and project the information required. The pattern starts at a tag node followed by a variable length edge to an asset. That asset is operated by an operator that reports to a top manager via a variable length edge, called reportsTo. The constraints section of the graph-match operator, in this instance where, reduces the tags to the ones that have an anomaly and were operated on a specific day.

graph
| graph-match (tag)-[hasParent*1..5]->(asset)<-[operates]-(operator)-[reportsTo*1..5]->(topManager)
    where tag.label=="tag" and tobool(tag.properties.hasAnomaly) and
        startofday(todatetime(operates.properties.timestamp)) == datetime(2023-01-24)
        and topManager.label=="employee"
    project
        tagWithAnomaly = tostring(tag.properties.tagName),
        impactedAsset = asset.nodeId,
        operatorName = operator.nodeId,
        responsibleManager = tostring(topManager.nodeId)

Output

tagWithAnomalyimpactedAssetoperatorNameresponsibleManager
temperaturePumpEveMallory

The projection in graph-match outputs the information that the temperature sensor showed an anomaly on the specified day. It was operated by Eve who ultimately reports to Mallory. With this information, the factory manager can reach out to Eve and potentially Mallory to get a better understanding of the anomaly.

2 - Graph operators

Learn how to use KQL graph operators.

Kusto Query Language (KQL) graph operators enable graph analysis of data by representing tabular data as a graph with nodes and edges. This setup lets us use graph operations to study the connections and relationships between different data points.

Graph analysis is typically comprised of the following steps:

  1. Prepare and preprocess the data using tabular operators
  2. Build a graph from the prepared tabular data using make-graph
  3. Perform graph analysis using graph-match
  4. Transform the results of the graph analysis back into tabular form using graph-to-table
  5. Continue the query with tabular operators

Supported graph operators

The following table describes the supported graph operators.

OperatorDescription
make-graphBuilds a graph from tabular data.
graph-matchSearches for patterns in a graph.
graph-to-tableBuilds nodes or edges tables from a graph.
graph-shortest-pathsFinds the shortest paths from a given set of source nodes to a set of target nodes.
graph-mark-componentsFinds and marks all connected components.

Graph model

A graph is modeled as a directed property graph that represents the data as a network of vertices, or nodes, connected by edges. Both nodes and edges can have properties that store more information about them, and a node in the graph must have a unique identifier. A pair of nodes can have multiple edges between them that have different properties or direction. There’s no special distinction of labels in the graph, and any property can act as a label.

Graph lifetime

A graph is a transient object. It’s built in each query that contains graph operators and ceases to exist once the query is completed. To persist a graph, it has to first be transformed back into tabular form and then stored as edges or nodes tables.

Limitations and recommendations

The graph object is built in memory on the fly for each graph query. As such, there’s a performance cost for building the graph and a limit to the size of the graph that can be built.

Although it isn’t strictly enforced, we recommend building graphs with at most 10 million elements (nodes and edges). The actual memory limit for the graph is determined by query operators memory limit.

3 - graph-mark-components operator (Preview)

Learn how to use the graph-mark-components operator to find and mark all connected components of a graph.

The graph-mark-components operator finds all connected components of a graph and marks each node with a component identifier.

Syntax

G | graph-mark-components [kind = Kind] [with_component_id = ComponentId]

Parameters

NameTypeRequiredDescription
Gstring✔️The graph source.
KindstringThe connected component kind, either weak (default) or strong. A weak component is a set of nodes connected by a path, ignoring the direction of edges. A strong component is a set of nodes connected in both directions, considering the edges’ directions.
ComponentIdstringThe property name that denotes the component identifier. The default property name is ComponentId.

Returns

The graph-mark-components operator returns a graph result, where each node has a component identifier in the ComponentId property. The identifier is a zero-based consecutive index of the components. Each component index is chosen arbitrarily and might not be consistent across runs.

Examples

The examples in this section show how to use the syntax to help you get started.

Find families by their relationships

The following example creates a graph from a set of child-parent pairs and identifies connected components using a family identifier.

let ChildOf = datatable(child:string, parent:string) 
[ 
  "Alice", "Bob",  
  "Carol", "Alice",  
  "Carol", "Dave",  
  "Greg", "Alice",  
  "Greg", "Dave",  
  "Howard", "Alice",  
  "Howard", "Dave",  
  "Eve", "Frank",  
  "Frank", "Mallory",
  "Eve", "Kirk",
]; 
ChildOf 
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-to-table nodes

Output

namefamily
Alice0
Bob0
Carol0
Dave0
Greg0
Howard0
Eve1
Frank1
Mallory1
Kirk1

Find a greatest common ancestor for each family

The following example uses the connected component family identifier and the graph-match operator to identify the greatest ancestor of each family in a set of child-parent data.

let ChildOf = datatable(child:string, parent:string) 
[ 
  "Alice", "Bob",  
  "Carol", "Alice",  
  "Carol", "Dave",  
  "Greg", "Alice",  
  "Greg", "Dave",  
  "Howard", "Alice",  
  "Howard", "Dave",  
  "Eve", "Frank",  
  "Frank", "Mallory",
  "Eve", "Kirk",
]; 
ChildOf 
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-match (descendant)-[childOf*1..5]->(ancestor)
  project name = ancestor.name, lineage = map(childOf, child), family = ancestor.family
| summarize (generations, name) = arg_max(array_length(lineage),name) by family

Output

familygenerationsname
12Mallory
02Bob

4 - graph-match operator

Learn how to use the graph-match operator to search for all occurrences of a graph pattern in a graph.

The graph-match operator searches for all occurrences of a graph pattern in an input graph source.

Syntax

G | graph-match [cycles = CyclesOption] Pattern [where Constraints] project [ColumnName =] Expression [, …]

Parameters

NameTypeRequiredDescription
Gstring✔️The input graph source.
Patternstring✔️One or more comma delimited sequences of graph node elements connected by graph edge elements using graph notations. See Graph pattern notation.
ConstraintsstringA Boolean expression composed of properties of named variables in the Pattern. Each graph element (node/edge) has a set of properties that were attached to it during the graph construction. The constraints define which elements (nodes and edges) are matched by the pattern. A property is referenced by the variable name followed by a dot (.) and the property name.
Expressionstring✔️The project clause converts each pattern to a row in a tabular result. The project expressions must be scalar and reference properties of named variables defined in the Pattern. A property is referenced by the variable name followed by a dot (.) and the attribute name.
CyclesOptionstringControls whether cycles are matched in the Pattern, allowed values: all, none, unique_edges. If all is specified, then all cycles are matched, if none is specified cycles aren’t matched, if unique_edges (default) is specified, cycles are matched but only if the cycles don’t include the same edge more than once.

Graph pattern notation

The following table shows the supported graph notation:

ElementNamed variableAnonymous
Node(n)()
Directed edge: left to right-[e]->-->
Directed edge: right to left<-[e]-<--
Any direction edge-[e]---
Variable length edge-[e*3..5]--[*3..5]-

Variable length edge

A variable length edge allows a specific pattern to be repeated multiple times within defined limits. This type of edge is denoted by an asterisk (*), followed by the minimum and maximum occurrence values in the format min..max. Both the minimum and maximum values must be integer scalars. Any sequence of edges falling within this occurrence range can match the variable edge of the pattern, if all the edges in the sequence satisfy the constraints outlined in the where clause.

Multiple sequences

Multiple comma delimited sequences are used to express nonlinear patterns. To describe the connection between different sequences, they have to share one or more variable name of a node. For example, to represent a star pattern with node n at the center connected to nodes a,b,c, and d, the following pattern could be used:

(a)--(n)--(b),(c)--(n)--(d)

Only single connected component patterns are supported.

Returns

The graph-match operator returns a tabular result, where each record corresponds to a match of the pattern in the graph.
The returned columns are defined in the operator’s project clause using properties of edges and/or nodes defined in the pattern. Properties and functions of properties of variable length edges are returned as a dynamic array, each value in the array corresponds to an occurrence of the variable length edge.

Examples

The examples in this section show how to use the syntax to help you get started.

All employees in a manager’s organization

The following example represents an organizational hierarchy. It demonstrates how a variable length edge could be used to find employees of different levels of the hierarchy in a single query. The nodes in the graph represent employees and the edges are from an employee to their manager. After we build the graph using make-graph, we search for employees in Alice’s organization that are younger than 30.

let employees = datatable(name:string, age:long) 
[ 
  "Alice", 32,  
  "Bob", 31,  
  "Eve", 27,  
  "Joe", 29,  
  "Chris", 45, 
  "Alex", 35,
  "Ben", 23,
  "Richard", 39,
]; 
let reports = datatable(employee:string, manager:string) 
[ 
  "Bob", "Alice",  
  "Chris", "Alice",  
  "Eve", "Bob",
  "Ben", "Chris",
  "Joe", "Alice", 
  "Richard", "Bob"
]; 
reports 
| make-graph employee --> manager with employees on name 
| graph-match (alice)<-[reports*1..5]-(employee)
  where alice.name == "Alice" and employee.age < 30
  project employee = employee.name, age = employee.age, reportingPath = map(reports, manager)

Output

employeeagereportingPath
Joe29[
“Alice”
]
Eve27[
“Alice”,
“Bob”
]
Ben23[
“Alice”,
“Chris”
]

Attack path

The following example builds a graph from the Actions and Entities tables. The entities are people and systems, and the actions describe different relations between entities. Following the make-graph operator that builds the graph is a call to graph-match with a graph pattern that searches for attack paths to the "Apollo" system.

let Entities = datatable(name:string, type:string, age:long) 
[ 
  "Alice", "Person", 23,  
  "Bob", "Person", 31,  
  "Eve", "Person", 17,  
  "Mallory", "Person", 29,  
  "Apollo", "System", 99 
]; 
let Actions = datatable(source:string, destination:string, action_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Apollo", "trusts",  
  "Bob", "Apollo", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
Actions 
| make-graph source --> destination with Entities on name 
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo) 
  where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission" 
  project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name

Output

AttackerCompromisedSystem
MalloryBobApollo

Star pattern

The following example is similar to the previous attack path example, but with an extra constraint: we want the compromised entity to also communicate with Alice. The graph-match pattern prefix is the same as the previous example and we add another sequence with the compromised as a link between the sequences.

let Entities = datatable(name:string, type:string, age:long) 
[ 
  "Alice", "Person", 23,  
  "Bob", "Person", 31,  
  "Eve", "Person", 17,  
  "Mallory", "Person", 29,  
  "Apollo", "System", 99 
]; 
let Actions = datatable(source:string, destination:string, action_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Apollo", "trusts",  
  "Bob", "Apollo", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
Actions 
| make-graph source --> destination with Entities on name 
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo), (compromised)-[communicates]-(alice) 
  where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission" and alice.name == "Alice"
  project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name

Output

AttackerCompromisedSystem
MalloryBobApollo

5 - graph-shortest-paths Operator (Preview)

Learn how to use the graph-shortest-paths operator to efficiently find the shortest paths from a given set of source nodes to a set of target nodes within a graph

The graph-shortest-paths operator finds the shortest paths between a set of source nodes and a set of target nodes in a graph and returns a table with the results.

Syntax

G | graph-shortest-paths [output = OutputOption] Pattern where Predicate project [ColumnName =] Expression [, …]

Parameters

NameTypeRequiredDescription
Gstring✔️The graph source, typically the output from a make-graph operation.
Patternstring✔️A path pattern that describes the path to find. Patterns must include at least one variable length edge and can’t contain multiple sequences.
PredicateexpressionA boolean expression that consists of properties of named variables in the pattern and constants.
Expressionexpression✔️A scalar expression that defines the output row for each found path, using constants and references to properties of named variables in the pattern.
OutputOptionstringSpecifies the search output as any (default) or all. Output is specified as any for a single shortest path per source/target pair and all for all shortest paths of equal minimum length.

Path pattern notation

The following table shows the supported path pattern notations.

ElementNamed variableAnonymous element
Node(n)()
Directed edge from left to right-[e]->-->
Directed edge from right to left<-[e]-<--
Any direction edge-[e]---
Variable length edge-[e*3..5]--[*3..5]-

Variable length edge

A variable length edge allows a specific pattern to repeat multiple times within defined limits. An asterisk (*) denotes this type of edge, followed by the minimum and maximum occurrence values in the format min..max. These values must be integer scalars. Any sequence of edges within this range can match the variable edge of the pattern, provided all the edges in the sequence meet the where clause constraints.

Returns

The graph-shortest-paths operator returns a tabular result, where each record corresponds to a path found in the graph. The returned columns are defined in the operator’s project clause using properties of nodes and edges defined in the pattern. Properties and functions of properties of variable length edges, are returned as a dynamic array. Each value in the array corresponds to an occurrence of the variable length edge.

Examples

This section provides practical examples demonstrating how to use the graph-shortest-paths operator in different scenarios.

Find any shortest path between two train stations

The following example demonstrates how to use the graph-shortest-paths operator to find the shortest path between two stations in a transportation network. The query constructs a graph from the data in connections and finds the shortest path from the "South-West" to the "North" station, considering paths up to five connections long. Since the default output is any, it finds any shortest path.

let connections = datatable(from_station:string, to_station:string, line:string) 
[ 
  "Central", "North", "red",
  "North", "Central", "red", 
  "Central", "South",  "red", 
  "South", "Central",  "red", 
  "South", "South-West", "red", 
  "South-West", "South", "red", 
  "South-West", "West", "red", 
  "West", "South-West", "red", 
  "Central", "East", "blue", 
  "East", "Central", "blue", 
  "Central", "West", "blue",
  "West", "Central", "blue",
]; 
connections 
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths (start)-[connections*1..5]->(destination)
  where start.station == "South-West" and destination.station == "North"
  project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station

Output

frompathlineto
South-West[
“South”,
“Central”,
“North”
]
[
“red”,
“red”,
“red”
]
North

Find all shortest paths between two train stations

The following example, like the previous example, finds the shortest paths in a transportation network. However, it uses output=all, so returns all shortest paths.

let connections = datatable(from_station:string, to_station:string, line:string) 
[ 
  "Central", "North", "red",
  "North", "Central", "red", 
  "Central", "South",  "red", 
  "South", "Central",  "red", 
  "South", "South-West", "red", 
  "South-West", "South", "red", 
  "South-West", "West", "red", 
  "West", "South-West", "red", 
  "Central", "East", "blue", 
  "East", "Central", "blue", 
  "Central", "West", "blue",
  "West", "Central", "blue",
]; 
connections 
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths output=all (start)-[connections*1..5]->(destination)
  where start.station == "South-West" and destination.station == "North"
  project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station

Output

frompathlineto
South-West[
“South”,
“Central”,
“North”
]
[
“red”,
“red”,
“red”
]
North
South-West[
“West”,
“Central”,
“North”
]
[
“red”,
“blue”,
“red”
]
North

6 - graph-to-table operator

Learn how to use the graph-to-table operator to export nodes or edges from a graph to tables.

The graph-to-table operator exports nodes or edges from a graph to tables.

Syntax

Nodes

G | graph-to-table nodes [ with_node_id=ColumnName ]

Edges

G | graph-to-table edges [ with_source_id=ColumnName ] [ with_target_id=ColumnName ] [ as TableName ]

Nodes and edges

G | graph-to-table nodes as NodesTableName [ with_node_id=ColumnName ], edges as EdgesTableName [ with_source_id=ColumnName ] [ with_target_id=ColumnName ]

Parameters

NameTypeRequiredDescription
Gstring✔️The input graph source.
NodesTableNamestringThe name of the exported nodes table.
EdgesTableNamestringThe name of the exported edges table.
ColumnNamestringExport the node hash ID, source node hash ID, or target node hash ID with the given column name.

Returns

Nodes

The graph-to-table operator returns a tabular result, in which each row corresponds to a node in the source graph. The returned columns are the node’s properties. When with_node_id is provided, the node hash column is of long type.

Edges

The graph-to-table operator returns a tabular result, in which each row corresponds to an edge in the source graph. The returned columns are the node’s properties. When with_source_id or with_target_id are provided, the node hash column is of long type.

Nodes and edges

The graph-to-table operator returns two tabular results, matching the previous descriptions.

Examples

The following examples use the make-graph operator to build a graph from edges and nodes tables. The nodes represent people and systems, and the edges are different relations between nodes. Then, each example shows a different usage of graph-to-table.

Get edges

In this example, the graph-to-table operator exports the edges from a graph to a table. The with_source_id and with_target_id parameters export the node hash for source and target nodes of each edge.

let nodes = datatable(name:string, type:string, age:long) 
[ 
	"Alice", "Person", 23,  
	"Bob", "Person", 31,  
	"Eve", "Person", 17,  
	"Mallory", "Person", 29,  
	"Trent", "System", 99 
]; 
let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
	"Alice", "Bob", "communicatesWith",  
	"Alice", "Trent", "trusts",  
	"Bob", "Trent", "hasPermission",  
	"Eve", "Alice", "attacks",  
	"Mallory", "Alice", "attacks",  
	"Mallory", "Bob", "attacks"  
]; 
edges 
| make-graph source --> destination with nodes on name
| graph-to-table edges with_source_id=SourceId with_target_id=TargetId

Output

SourceIdTargetIdsourcedestinationedge_type
-3122868243544336885-7133945255344544237AliceBobcommunicatesWith
-31228682435443368852533909231875758225AliceTrenttrusts
-71339452553445442372533909231875758225BobTrenthasPermission
4363395278938690453-3122868243544336885EveAliceattacks
3855580634910899594-3122868243544336885MalloryAliceattacks
3855580634910899594-7133945255344544237MalloryBobattacks

Get nodes

In this example, the graph-to-table operator exports the nodes from a graph to a table. The with_node_id parameter exports the node hash.

let nodes = datatable(name:string, type:string, age:long) 
[ 
	"Alice", "Person", 23,  
	"Bob", "Person", 31,  
	"Eve", "Person", 17,
	"Trent", "System", 99
]; 
let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
	"Alice", "Bob", "communicatesWith",  
	"Alice", "Trent", "trusts",  
	"Bob", "Trent", "hasPermission",  
	"Eve", "Alice", "attacks",  
	"Mallory", "Alice", "attacks",  
	"Mallory", "Bob", "attacks"
]; 
edges 
| make-graph source --> destination with nodes on name
| graph-to-table nodes with_node_id=NodeId

Output

NodeIdnametypeage
-3122868243544336885AlicePerson23
-7133945255344544237BobPerson31
4363395278938690453EvePerson17
2533909231875758225TrentSystem99
3855580634910899594Mallory

Get nodes and edges

In this example, the graph-to-table operator exports the nodes and edges from a graph to a table.

let nodes = datatable(name:string, type:string, age:long) 
[ 
	"Alice", "Person", 23,  
	"Bob", "Person", 31,  
	"Eve", "Person", 17,
	"Trent", "System", 99
]; 
let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
	"Alice", "Bob", "communicatesWith",  
	"Alice", "Trent", "trusts",  
	"Bob", "Trent", "hasPermission",  
	"Eve", "Alice", "attacks",  
	"Mallory", "Alice", "attacks",  
	"Mallory", "Bob", "attacks"
]; 
edges 
| make-graph source --> destination with nodes on name
| graph-to-table nodes as N with_node_id=NodeId, edges as E with_source_id=SourceId;
N; 
E

Output table 1

NodeIdnametypeage
-3122868243544336885AlicePerson23
-7133945255344544237BobPerson31
4363395278938690453EvePerson17
2533909231875758225TrentSystem99
3855580634910899594Mallory

Output table 2

SourceIdsourcedestinationedge_type
-3122868243544336885AliceBobcommunicatesWith
-3122868243544336885AliceTrenttrusts
-7133945255344544237BobTrenthasPermission
4363395278938690453EveAliceattacks
3855580634910899594MalloryAliceattacks
3855580634910899594MalloryBobattacks

7 - Kusto Query Language (KQL) graph semantics overview

Learn about how to contextualize data in queries using KQL graph semantics

Kusto Query Language (KQL) graph semantics overview

Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. The structure of a graph comprises nodes and edges that connect them. Both nodes and edges can have properties that describe them.

Graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs. For example, the following graph illustrates a social network that consists of four nodes and three edges. Each node has a property for its name, such as Bob, and each edge has a property for its type, such as reportsTo.

Diagram that shows a social network as a graph.

Graphs store data differently from relational databases, which use tables and need indexes and joins to connect related data. In graphs, each node has a direct pointer to its neighbors (adjacency), so there’s no need to index or join anything, making it easy and fast to traverse the graph. Graph queries can use the graph structure and meaning to do complex and powerful operations, such as finding paths, patterns, shortest distances, communities, or centrality measures.

You can create and query graphs using KQL graph semantics, which has a simple and intuitive syntax that works well with the existing KQL features. You can also mix graph queries with other KQL features, such as time-based, location-based, and machine-learning queries, to do more advanced and powerful data analysis. By using KQL with graph semantics, you get the speed and scale of KQL queries with the flexibility and expressiveness of graphs.

For example, you can use:

  • Time-based queries to analyze the evolution of a graph over time, such as how the network structure or the node properties change
  • Geospatial queries to analyze the spatial distribution or proximity of nodes and edges, such as how the location or distance affects the relationship
  • Machine learning queries to apply various algorithms or models to graph data, such as clustering, classification, or anomaly detection

How does it work?

Every query of the graph semantics in Kusto requires creating a new graph representation. You use a graph operator that converts tabular expressions for edges and optionally nodes into a graph representation of the data. Once the graph is created, you can apply different operations to further enhance or examine the graph data.

The graph semantics extension uses an in-memory graph engine that works on the data in the memory of your cluster, making graph analysis interactive and fast. The memory consumption of a graph representation is affected by the number of nodes and edges and their respective properties. The graph engine uses a property graph model that supports arbitrary properties for nodes and edges. It also integrates with all the existing scalar operators of KQL, which gives users the ability to write expressive and complex graph queries that can use the full power and functionality of KQL.

Why use graph semantics in KQL?

There are several reasons to use graph semantics in KQL, such as the following examples:

  • KQL doesn’t support recursive joins, so you have to explicitly define the traversals you want to run (see Scenario: Friends of a friend). You can use the make-graph operator to define hops of variable length, which is useful when the relationship distance or depth isn’t fixed. For example, you can use this operator to find all the resources that are connected in a graph or all the places you can reach from a source in a transportation network.

  • Time-aware graphs are a unique feature of graph semantics in KQL that allow users to model graph data as a series of graph manipulation events over time. Users can examine how the graph evolves over time, such as how the graph’s network structure or the node properties change, or how the graph events or anomalies happen. For example, users can use time series queries to discover trends, patterns, or outliers in the graph data, such as how the network density, centrality, or modularity change over time

  • The intellisense feature of the KQL query editor assists users in writing and executing queries in the query language. It provides syntax highlighting, autocompletion, error checking, and suggestions. It also helps users with the graph semantics extension by offering graph-specific keywords, operators, functions, and examples to guide users through the graph creation and querying process.

Limits

The following are some of the main limits of the graph semantics feature in KQL:

  • You can only create or query graphs that fit into the memory of one cluster node.
  • Graph data isn’t persisted or distributed across cluster nodes, and is discarded after the query execution.

Therefore, When using the graph semantics feature in KQL, you should consider the memory consumption and performance implications of creating and querying large or dense graphs. Where possible, you should use filters, projections, and aggregations to reduce the graph size and complexity.

8 - make-graph operator

Learn how to use the graph-to-table operator to build a graph structure from tabular inputs of edges and nodes.

The make-graph operator builds a graph structure from tabular inputs of edges and nodes.

Syntax

Edges | make-graph SourceNodeId --> TargetNodeId [ with Nodes1 on NodeId1 [, Nodes2 on NodeId2 ]]

Edges | make-graph SourceNodeId --> TargetNodeId [ with_node_id= DefaultNodeId ]

Parameters

NameTypeRequiredDescription
Edgesstring✔️The tabular source containing the edges of the graph, each row represents an edge in the graph.
SourceNodeIdstring✔️The column in Edges with the source node IDs of the edges.
TargetNodeIdstring✔️The column in Edges with the target node IDs of the edges.
NodesstringThe tabular expressions containing the properties of the nodes in the graph.
NodesIdstringThe columns with the node IDs in Nodes.
DefaultNodeIdstringThe name of the column for the default node ID.

Returns

The make-graph operator returns a graph expression and must be followed by a graph operator. Each row in the source Edges expression becomes an edge in the graph with properties that are the column values of the row. Each row in the Nodes tabular expression becomes a node in the graph with properties that are the column values of the row. Nodes that appear in the Edges table but don’t have a corresponding row in the Nodes table are created as nodes with the corresponding node ID and empty properties.

Users can handle node information in the following ways:

  1. No node information required: make-graph completes with source and target.
  2. Explicit node properties: use up to two tabular expressions using “with Nodes1 on NodeId1 [, Nodes2 on NodeId2 ].”
  3. Default node identifier: use “with_node_id= DefaultNodeId.”

Example

Edges and nodes graph

The following example builds a graph from edges and nodes tables. The nodes represent people and systems, and the edges represent different relationships between nodes. The make-graph operator builds the graph. Then, the graph-match operator is used with a graph pattern to search for attack paths leading to the "Trent" system node.

let nodes = datatable(name:string, type:string, age:int) 
[ 
  "Alice", "Person", 23,  
  "Bob", "Person", 31,  
  "Eve", "Person", 17,  
  "Mallory", "Person", 29,  
  "Trent", "System", 99 
]; 
let edges = datatable(Source:string, Destination:string, edge_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Trent", "trusts",  
  "Bob", "Trent", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
edges 
| make-graph Source --> Destination with nodes on name 
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent) 
  where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission" 
  project Attacker = mallory.name, Compromised = compromised.name, System = trent.name

Output

AttackerCompromisedSystem
MalloryBobTrent

Default node identifier

The following example builds a graph using only edges, with the name property as the default node identifier. This approach is useful when creating a graph from a tabular expression of edges, ensuring that the node identifier is available for the constraints section of the subsequent graph-match operator.

let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Trent", "trusts",  
  "Bob", "Trent", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
edges 
| make-graph source --> destination with_node_id=name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent) 
  where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission" 
  project Attacker = mallory.name, Compromised = compromised.name, System = trent.name

Output

AttackerCompromisedSystem
MalloryBobTrent

9 - Scenarios for using Kusto Query Language (KQL) graph semantics

Learn about common scenarios for using Kusto Query Language (KQL) graph semantics.

What are common scenarios for using Kusto Query Language (KQL) graph semantics?

Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. There are many scenarios where graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs.

In this article, you learn about the following common scenarios for using KQL graph semantics:

Friends of a friend

One common use case for graphs is to model and query social networks, where nodes are users and edges are friendships or interactions. For example, imagine we have a table called Users that has data about users, such as their name and organization, and a table called Knows that has data about the friendships between users as shown in the following diagram:

Diagram that shows a graph of friends of a friend.

Without using graph semantics in KQL, you could create a graph to find friends of a friend by using multiple joins, as follows:

let Users = datatable (UserId: string, name: string, org: string)[]; // nodes
let Knows = datatable (FirstUser: string, SecondUser: string)[]; // edges
Users
| where org == "Contoso"
| join kind=inner (Knows) on $left.UserId == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser == $right.UserId
| join kind=inner (Knows) on $left.SecondUser == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser1 == $right.UserId
| where UserId != UserId1
| project name, name1, name2

You can use graph semantics in KQL to perform the same query in a more intuitive and efficient way. The following query uses the make-graph operator to create a directed graph from FirstUser to SecondUser and enriches the properties on the nodes with the columns provided by the Users table. Once the graph is instantiated, the graph-match operator provides the friend-of-a-friend pattern including filters and a projection that results in a tabular output.

let Users = datatable (UserId:string , name:string , org:string)[]; // nodes
let Knows = datatable (FirstUser:string , SecondUser:string)[]; // edges
Knows
| make-graph FirstUser --> SecondUser with Users on UserId
| graph-match (user)-->(middle_man)-->(friendOfAFriend)
    where user.org == "Contoso" and user.UserId != friendOfAFriend.UserId
    project contoso_person = user.name, middle_man = middle_man.name, kusto_friend_of_friend = friendOfAFriend.name

Insights from log data

In some use cases, you want to gain insights from a simple flat table containing time series information, such as log data. The data in each row is a string that contains raw data. To create a graph from this data, you must first identify the entities and relationships that are relevant to the graph analysis. For example, suppose you have a table called rawLogs from a web server that contains information about requests, such as the timestamp, the source IP address, the destination resource, and much more.

The following table shows an example of the raw data:

let rawLogs = datatable (rawLog: string) [
    "31.56.96.51 - - [2019-01-22 03:54:16 +0330] \"GET /product/27 HTTP/1.1\" 200 5379 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
    "31.56.96.51 - - [2019-01-22 03:55:17 +0330] \"GET /product/42 HTTP/1.1\" 200 5667 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
    "54.36.149.41 - - [2019-01-22 03:56:14 +0330] \"GET /product/27 HTTP/1.1\" 200 30577 \"-\" \"some client\" \"-\""
];

One possible way to model a graph from this table is to treat the source IP addresses as nodes and the web requests to resources as edges. You can use the parse operator to extract the columns you need for the graph and then you can create a graph that represents the network traffic and interactions between different sources and destinations. To create the graph, you can use the make-graph operator specifying the source and destination columns as the edge endpoints, and optionally providing additional columns as edge or node properties.

The following query creates a graph from the raw logs:

let parsedLogs = rawLogs
    | parse rawLog with ipAddress: string " - - [" timestamp: datetime "] \"" httpVerb: string " " resource: string " " *
    | project-away rawLog;
let edges = parsedLogs;
let nodes =
    union
        (parsedLogs
        | distinct ipAddress
        | project nodeId = ipAddress, label = "IP address"),
        (parsedLogs | distinct resource | project nodeId = resource, label = "resource");
let graph = edges
    | make-graph ipAddress --> resource with nodes on nodeId;

This query parses the raw logs and creates a directed graph where the nodes are either IP addresses or resources and each edge is a request from the source to the destination, with the timestamp and HTTP verb as edge properties.

Diagram that shows a graph of the parsed log data.

Once the graph is created, you can use the graph-match operator to query the graph data using patterns, filters, and projections. For example, you can create a pattern that makes a simple recommendation based on the resources that other IP addresses requested within the last five minutes, as follows:

graph
| graph-match (startIp)-[request]->(resource)<--(otherIP)-[otherRequest]->(otherResource)
    where startIp.label == "IP address" and //start with an IP address
    resource.nodeId != otherResource.nodeId and //recommending a different resource
    startIp.nodeId != otherIP.nodeId and //only other IP addresses are interesting
    (request.timestamp - otherRequest.timestamp < 5m) //filter on recommendations based on the last 5 minutes
    project Recommendation=otherResource.nodeId

Output

Recommendation
/product/42

The query returns “/product/42” as a recommendation based on a raw text-based log.