This is the multi-page printable view of this section. Click here to print.
Graph operators
- 1: Best practices for Kusto Query Language (KQL) graph semantics
- 2: Graph operators
- 3: graph-mark-components operator (Preview)
- 4: graph-match operator
- 5: graph-shortest-paths Operator (Preview)
- 6: graph-to-table operator
- 7: Kusto Query Language (KQL) graph semantics overview
- 8: make-graph operator
- 9: Scenarios for using Kusto Query Language (KQL) graph semantics
1 - Best practices for Kusto Query Language (KQL) graph semantics
Best practices for Kusto Query Language (KQL) graph semantics
This article explains how to use the graph semantics feature in KQL effectively and efficiently for different use cases and scenarios. It shows how to create and query graphs with the syntax and operators, and how to integrate them with other KQL features and functions. It also helps users avoid common pitfalls or errors. For instance, creating graphs that exceed memory or performance limits, or applying unsuitable or incompatible filters, projections, or aggregations.
Size of graph
The make-graph operator creates an in-memory representation of a graph. It consists of the graph structure itself and its properties. When making a graph, use appropriate filters, projections, and aggregations to select only the relevant nodes and edges and their properties.
The following example shows how to reduce the number of nodes and edges and their properties. In this scenario, Bob changed manager from Alice to Eve and the user only wants to see the latest state of the graph for their organization. To reduce the size of the graph, the nodes are first filtered by the organization property and then the property is removed from the graph using the project-away operator. The same happens for edges. Then summarize operator together with arg_max is used to get the last known state of the graph.
let allEmployees = datatable(organization: string, name:string, age:long)
[
"R&D", "Alice", 32,
"R&D","Bob", 31,
"R&D","Eve", 27,
"R&D","Mallory", 29,
"Marketing", "Alex", 35
];
let allReports = datatable(employee:string, manager:string, modificationDate: datetime)
[
"Bob", "Alice", datetime(2022-05-23),
"Bob", "Eve", datetime(2023-01-01),
"Eve", "Mallory", datetime(2022-05-23),
"Alice", "Dave", datetime(2022-05-23)
];
let filteredEmployees =
allEmployees
| where organization == "R&D"
| project-away age, organization;
let filteredReports =
allReports
| summarize arg_max(modificationDate, *) by employee
| project-away modificationDate;
filteredReports
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
where employee.name == "Bob"
project employee = employee.name, topManager = manager.name
Output
employee | topManager |
---|---|
Bob | Mallory |
Last known state of the graph
The Size of graph example demonstrated how to get the last known state of the edges of a graph by using summarize
operator and the arg_max
aggregation function. Obtaining the last known state is a compute-intensive operation.
Consider creating a materialized view to improve the query performance, as follows:
Create tables that have some notion of version as part of their model. We recommend using a
datetime
column that you can later use to create a graph time series..create table employees (organization: string, name:string, stateOfEmployment:string, properties:dynamic, modificationDate:datetime) .create table reportsTo (employee:string, manager:string, modificationDate: datetime)
Create a materialized view for each table and use the arg_max aggregation function to determine the last known state of employees and the reportsTo relation.
.create materialized-view employees_MV on table employees { employees | summarize arg_max(modificationDate, *) by name } .create materialized-view reportsTo_MV on table reportsTo { reportsTo | summarize arg_max(modificationDate, *) by employee }
Create two functions that ensure that only the materialized component of the materialized view is used and other filters and projections are applied.
.create function currentEmployees () { materialized_view('employees_MV') | where stateOfEmployment == "employed" } .create function reportsTo_lastKnownState () { materialized_view('reportsTo_MV') | project-away modificationDate }
The resulting query using materialized makes the query faster and more efficient for larger graphs. It also enables higher concurrency and lower latency queries for the latest state of the graph. The user can still query the graph history based on the employees and reportsTo tables, if needed
let filteredEmployees =
currentEmployees
| where organization == "R&D"
| project-away organization;
reportsTo_lastKnownState
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
where employee.name == "Bob"
project employee = employee.name, reportingPath = map(hasManager, manager)
Graph time travel
Some scenarios require you to analyze data based on the state of a graph at a specific point in time. Graph time travel uses a combination of time filters and summarizes using the arg_max aggregation function.
The following KQL statement creates a function with a parameter that defines the interesting point in time for the graph. It returns a ready-made graph.
.create function graph_time_travel (interestingPointInTime:datetime ) {
let filteredEmployees =
employees
| where modificationDate < interestingPointInTime
| summarize arg_max(modificationDate, *) by name;
let filteredReports =
reportsTo
| where modificationDate < interestingPointInTime
| summarize arg_max(modificationDate, *) by employee
| project-away modificationDate;
filteredReports
| make-graph employee --> manager with filteredEmployees on name
}
With the function in place, the user can craft a query to get the top manager of Bob based on the graph in June 2022.
graph_time_travel(datetime(2022-06-01))
| graph-match (employee)-[hasManager*2..5]-(manager)
where employee.name == "Bob"
project employee = employee.name, reportingPath = map(hasManager, manager)
Output
employee | topManager |
---|---|
Bob | Dave |
Dealing with multiple node and edge types
Sometimes it’s required to contextualize time series data with a graph that consists of multiple node types. One way of handling this scenario is creating a general-purpose property graph that is represented by a canonical model.
Occasionally, you might need to contextualize time series data with a graph that has multiple node types. You could approach the problem by creating a general-purpose property graph that is based on a canonical model, such as the following.
- nodes
- nodeId (string)
- label (string)
- properties (dynamic)
- edges
- source (string)
- destination (string)
- label (string)
- properties (dynamic)
The following example shows how to transform the data into a canonical model and how to query it. The base tables for the nodes and edges of the graph have different schemas.
This scenario involves a factory manager who wants to find out why equipment isn’t working well and who is responsible for fixing it. The manager decides to use a graph that combines the asset graph of the production floor and the maintenance staff hierarchy which changes every day.
The following graph shows the relations between assets and their time series, such as speed, temperature, and pressure. The operators and the assets, such as pump, are connected via the operates edge. The operators themselves report up to management.
The data for those entities can be stored directly in your cluster or acquired using query federation to a different service, such as Azure Cosmos DB, Azure SQL, or Azure Digital Twin. To illustrate the example, the following tabular data is created as part of the query:
let sensors = datatable(sensorId:string, tagName:string, unitOfMeasuree:string)
[
"1", "temperature", "°C",
"2", "pressure", "Pa",
"3", "speed", "m/s"
];
let timeseriesData = datatable(sensorId:string, timestamp:string, value:double, anomaly: bool )
[
"1", datetime(2023-01-23 10:00:00), 32, false,
"1", datetime(2023-01-24 10:00:00), 400, true,
"3", datetime(2023-01-24 09:00:00), 9, false
];
let employees = datatable(name:string, age:long)
[
"Alice", 32,
"Bob", 31,
"Eve", 27,
"Mallory", 29,
"Alex", 35,
"Dave", 45
];
let allReports = datatable(employee:string, manager:string)
[
"Bob", "Alice",
"Alice", "Dave",
"Eve", "Mallory",
"Alex", "Dave"
];
let operates = datatable(employee:string, machine:string, timestamp:datetime)
[
"Bob", "Pump", datetime(2023-01-23),
"Eve", "Pump", datetime(2023-01-24),
"Mallory", "Press", datetime(2023-01-24),
"Alex", "Conveyor belt", datetime(2023-01-24),
];
let assetHierarchy = datatable(source:string, destination:string)
[
"1", "Pump",
"2", "Pump",
"Pump", "Press",
"3", "Conveyor belt"
];
The employees, sensors, and other entities and relationships don’t share a canonical data model. You can use the union operator to combine and canonize the data.
The following query joins the sensor data with the time series data to find the sensors that have abnormal readings. Then, it uses a projection to create a common model for the graph nodes.
let nodes =
union
(
sensors
| join kind=leftouter
(
timeseriesData
| summarize hasAnomaly=max(anomaly) by sensorId
) on sensorId
| project nodeId = sensorId, label = "tag", properties = pack_all(true)
),
( employees | project nodeId = name, label = "employee", properties = pack_all(true));
The edges are transformed in a similar way.
let edges =
union
( assetHierarchy | extend label = "hasParent" ),
( allReports | project source = employee, destination = manager, label = "reportsTo" ),
( operates | project source = employee, destination = machine, properties = pack_all(true), label = "operates" );
With the canonized nodes and edges data, you can create a graph using the make-graph operator, as follows:
let graph = edges
| make-graph source --> destination with nodes on nodeId;
Once created, define the path pattern and project the information required. The pattern starts at a tag node followed by a variable length edge to an asset. That asset is operated by an operator that reports to a top manager via a variable length edge, called reportsTo. The constraints section of the graph-match operator, in this instance where, reduces the tags to the ones that have an anomaly and were operated on a specific day.
graph
| graph-match (tag)-[hasParent*1..5]->(asset)<-[operates]-(operator)-[reportsTo*1..5]->(topManager)
where tag.label=="tag" and tobool(tag.properties.hasAnomaly) and
startofday(todatetime(operates.properties.timestamp)) == datetime(2023-01-24)
and topManager.label=="employee"
project
tagWithAnomaly = tostring(tag.properties.tagName),
impactedAsset = asset.nodeId,
operatorName = operator.nodeId,
responsibleManager = tostring(topManager.nodeId)
Output
tagWithAnomaly | impactedAsset | operatorName | responsibleManager |
---|---|---|---|
temperature | Pump | Eve | Mallory |
The projection in graph-match outputs the information that the temperature sensor showed an anomaly on the specified day. It was operated by Eve who ultimately reports to Mallory. With this information, the factory manager can reach out to Eve and potentially Mallory to get a better understanding of the anomaly.
Related content
2 - Graph operators
Kusto Query Language (KQL) graph operators enable graph analysis of data by representing tabular data as a graph with nodes and edges. This setup lets us use graph operations to study the connections and relationships between different data points.
Graph analysis is typically comprised of the following steps:
- Prepare and preprocess the data using tabular operators
- Build a graph from the prepared tabular data using make-graph
- Perform graph analysis using graph-match
- Transform the results of the graph analysis back into tabular form using graph-to-table
- Continue the query with tabular operators
Supported graph operators
The following table describes the supported graph operators.
Operator | Description |
---|---|
make-graph | Builds a graph from tabular data. |
graph-match | Searches for patterns in a graph. |
graph-to-table | Builds nodes or edges tables from a graph. |
graph-shortest-paths | Finds the shortest paths from a given set of source nodes to a set of target nodes. |
graph-mark-components | Finds and marks all connected components. |
Graph model
A graph is modeled as a directed property graph that represents the data as a network of vertices, or nodes, connected by edges. Both nodes and edges can have properties that store more information about them, and a node in the graph must have a unique identifier. A pair of nodes can have multiple edges between them that have different properties or direction. There’s no special distinction of labels in the graph, and any property can act as a label.
Graph lifetime
A graph is a transient object. It’s built in each query that contains graph operators and ceases to exist once the query is completed. To persist a graph, it has to first be transformed back into tabular form and then stored as edges or nodes tables.
Limitations and recommendations
The graph object is built in memory on the fly for each graph query. As such, there’s a performance cost for building the graph and a limit to the size of the graph that can be built.
Although it isn’t strictly enforced, we recommend building graphs with at most 10 million elements (nodes and edges). The actual memory limit for the graph is determined by query operators memory limit.
Related content
3 - graph-mark-components operator (Preview)
The graph-mark-components
operator finds all connected components of a graph and marks each node with a component identifier.
Syntax
G |
graph-mark-components
[kind
=
Kind] [with_component_id
=
ComponentId]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The graph source. |
Kind | string | The connected component kind, either weak (default) or strong . A weak component is a set of nodes connected by a path, ignoring the direction of edges. A strong component is a set of nodes connected in both directions, considering the edges’ directions. | |
ComponentId | string | The property name that denotes the component identifier. The default property name is ComponentId . |
Returns
The graph-mark-components
operator returns a graph result, where each node has a component identifier in the ComponentId property. The identifier is a zero-based consecutive index of the components. Each component index is chosen arbitrarily and might not be consistent across runs.
Examples
The examples in this section show how to use the syntax to help you get started.
Find families by their relationships
The following example creates a graph from a set of child-parent pairs and identifies connected components using a family
identifier.
let ChildOf = datatable(child:string, parent:string)
[
"Alice", "Bob",
"Carol", "Alice",
"Carol", "Dave",
"Greg", "Alice",
"Greg", "Dave",
"Howard", "Alice",
"Howard", "Dave",
"Eve", "Frank",
"Frank", "Mallory",
"Eve", "Kirk",
];
ChildOf
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-to-table nodes
Output
name | family |
---|---|
Alice | 0 |
Bob | 0 |
Carol | 0 |
Dave | 0 |
Greg | 0 |
Howard | 0 |
Eve | 1 |
Frank | 1 |
Mallory | 1 |
Kirk | 1 |
Find a greatest common ancestor for each family
The following example uses the connected component family
identifier and the graph-match
operator to identify the greatest ancestor of each family in a set of child-parent data.
let ChildOf = datatable(child:string, parent:string)
[
"Alice", "Bob",
"Carol", "Alice",
"Carol", "Dave",
"Greg", "Alice",
"Greg", "Dave",
"Howard", "Alice",
"Howard", "Dave",
"Eve", "Frank",
"Frank", "Mallory",
"Eve", "Kirk",
];
ChildOf
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-match (descendant)-[childOf*1..5]->(ancestor)
project name = ancestor.name, lineage = map(childOf, child), family = ancestor.family
| summarize (generations, name) = arg_max(array_length(lineage),name) by family
Output
family | generations | name |
---|---|---|
1 | 2 | Mallory |
0 | 2 | Bob |
Related content
4 - graph-match operator
The graph-match
operator searches for all occurrences of a graph pattern in an input graph source.
Syntax
G |
graph-match
[cycles
=
CyclesOption] Pattern [where
Constraints] project
[ColumnName =
] Expression [,
…]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The input graph source. |
Pattern | string | ✔️ | One or more comma delimited sequences of graph node elements connected by graph edge elements using graph notations. See Graph pattern notation. |
Constraints | string | A Boolean expression composed of properties of named variables in the Pattern. Each graph element (node/edge) has a set of properties that were attached to it during the graph construction. The constraints define which elements (nodes and edges) are matched by the pattern. A property is referenced by the variable name followed by a dot (. ) and the property name. | |
Expression | string | ✔️ | The project clause converts each pattern to a row in a tabular result. The project expressions must be scalar and reference properties of named variables defined in the Pattern. A property is referenced by the variable name followed by a dot (. ) and the attribute name. |
CyclesOption | string | Controls whether cycles are matched in the Pattern, allowed values: all , none , unique_edges . If all is specified, then all cycles are matched, if none is specified cycles aren’t matched, if unique_edges (default) is specified, cycles are matched but only if the cycles don’t include the same edge more than once. |
Graph pattern notation
The following table shows the supported graph notation:
Element | Named variable | Anonymous |
---|---|---|
Node | ( n) | () |
Directed edge: left to right | -[ e]-> | --> |
Directed edge: right to left | <-[ e]- | <-- |
Any direction edge | -[ e]- | -- |
Variable length edge | -[ e*3..5]- | -[*3..5]- |
Variable length edge
A variable length edge allows a specific pattern to be repeated multiple times within defined limits. This type of edge is denoted by an asterisk (*
), followed by the minimum and maximum occurrence values in the format min..
max. Both the minimum and maximum values must be integer scalars. Any sequence of edges falling within this occurrence range can match the variable edge of the pattern, if all the edges in the sequence satisfy the constraints outlined in the where
clause.
Multiple sequences
Multiple comma delimited sequences are used to express nonlinear patterns. To describe the connection between different sequences, they have to share one or more variable name of a node. For example, to represent a star pattern with node n at the center connected to nodes a,b,c, and d, the following pattern could be used:
(
a)--(
n)--(
b)
,(
c)--(
n)--(
d)
Only single connected component patterns are supported.
Returns
The graph-match
operator returns a tabular result, where each record corresponds to a match of the pattern in the graph.
The returned columns are defined in the operator’s project
clause using properties of edges and/or nodes defined in the pattern. Properties and functions of properties of variable length edges are returned as a dynamic array, each value in the array corresponds to an occurrence of the variable length edge.
Examples
The examples in this section show how to use the syntax to help you get started.
All employees in a manager’s organization
The following example represents an organizational hierarchy. It demonstrates how a variable length edge could be used to find employees of different levels of the hierarchy in a single query. The nodes in the graph represent employees and the edges are from an employee to their manager. After we build the graph using make-graph
, we search for employees in Alice
’s organization that are younger than 30
.
let employees = datatable(name:string, age:long)
[
"Alice", 32,
"Bob", 31,
"Eve", 27,
"Joe", 29,
"Chris", 45,
"Alex", 35,
"Ben", 23,
"Richard", 39,
];
let reports = datatable(employee:string, manager:string)
[
"Bob", "Alice",
"Chris", "Alice",
"Eve", "Bob",
"Ben", "Chris",
"Joe", "Alice",
"Richard", "Bob"
];
reports
| make-graph employee --> manager with employees on name
| graph-match (alice)<-[reports*1..5]-(employee)
where alice.name == "Alice" and employee.age < 30
project employee = employee.name, age = employee.age, reportingPath = map(reports, manager)
Output
employee | age | reportingPath |
---|---|---|
Joe | 29 | [ “Alice” ] |
Eve | 27 | [ “Alice”, “Bob” ] |
Ben | 23 | [ “Alice”, “Chris” ] |
Attack path
The following example builds a graph from the Actions
and Entities
tables. The entities are people and systems, and the actions describe different relations between entities. Following the make-graph
operator that builds the graph is a call to graph-match
with a graph pattern that searches for attack paths to the "Apollo"
system.
let Entities = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Apollo", "System", 99
];
let Actions = datatable(source:string, destination:string, action_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Apollo", "trusts",
"Bob", "Apollo", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
Actions
| make-graph source --> destination with Entities on name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo)
where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission"
project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Apollo |
Star pattern
The following example is similar to the previous attack path example, but with an extra constraint: we want the compromised entity to also communicate with Alice. The graph-match
pattern prefix is the same as the previous example and we add another sequence with the compromised as a link between the sequences.
let Entities = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Apollo", "System", 99
];
let Actions = datatable(source:string, destination:string, action_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Apollo", "trusts",
"Bob", "Apollo", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
Actions
| make-graph source --> destination with Entities on name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo), (compromised)-[communicates]-(alice)
where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission" and alice.name == "Alice"
project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Apollo |
Related content
5 - graph-shortest-paths Operator (Preview)
The graph-shortest-paths
operator finds the shortest paths between a set of source nodes and a set of target nodes in a graph and returns a table with the results.
Syntax
G |
graph-shortest-paths
[output
=
OutputOption] Pattern where
Predicate project
[ColumnName =
] Expression [,
…]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The graph source, typically the output from a make-graph operation. |
Pattern | string | ✔️ | A path pattern that describes the path to find. Patterns must include at least one variable length edge and can’t contain multiple sequences. |
Predicate | expression | A boolean expression that consists of properties of named variables in the pattern and constants. | |
Expression | expression | ✔️ | A scalar expression that defines the output row for each found path, using constants and references to properties of named variables in the pattern. |
OutputOption | string | Specifies the search output as any (default) or all . Output is specified as any for a single shortest path per source/target pair and all for all shortest paths of equal minimum length. |
Path pattern notation
The following table shows the supported path pattern notations.
Element | Named variable | Anonymous element |
---|---|---|
Node | ( n) | () |
Directed edge from left to right | -[ e]-> | --> |
Directed edge from right to left | <-[ e]- | <-- |
Any direction edge | -[ e]- | -- |
Variable length edge | -[ e*3..5]- | -[*3..5]- |
Variable length edge
A variable length edge allows a specific pattern to repeat multiple times within defined limits. An asterisk (*
) denotes this type of edge, followed by the minimum and maximum occurrence values in the format min..
max. These values must be integer scalars. Any sequence of edges within this range can match the variable edge of the pattern, provided all the edges in the sequence meet the where
clause constraints.
Returns
The graph-shortest-paths
operator returns a tabular result, where each record corresponds to a path found in the graph. The returned columns are defined in the operator’s project
clause using properties of nodes and edges defined in the pattern. Properties and functions of properties of variable length edges, are returned as a dynamic array. Each value in the array corresponds to an occurrence of the variable length edge.
Examples
This section provides practical examples demonstrating how to use the graph-shortest-paths
operator in different scenarios.
Find any
shortest path between two train stations
The following example demonstrates how to use the graph-shortest-paths
operator to find the shortest path between two stations in a transportation network. The query constructs a graph from the data in connections
and finds the shortest path from the "South-West"
to the "North"
station, considering paths up to five connections long. Since the default output is any
, it finds any shortest path.
let connections = datatable(from_station:string, to_station:string, line:string)
[
"Central", "North", "red",
"North", "Central", "red",
"Central", "South", "red",
"South", "Central", "red",
"South", "South-West", "red",
"South-West", "South", "red",
"South-West", "West", "red",
"West", "South-West", "red",
"Central", "East", "blue",
"East", "Central", "blue",
"Central", "West", "blue",
"West", "Central", "blue",
];
connections
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths (start)-[connections*1..5]->(destination)
where start.station == "South-West" and destination.station == "North"
project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station
Output
from | path | line | to |
---|---|---|---|
South-West | [ “South”, “Central”, “North” ] | [ “red”, “red”, “red” ] | North |
Find all shortest paths between two train stations
The following example, like the previous example, finds the shortest paths in a transportation network. However, it uses output=all
, so returns all shortest paths.
let connections = datatable(from_station:string, to_station:string, line:string)
[
"Central", "North", "red",
"North", "Central", "red",
"Central", "South", "red",
"South", "Central", "red",
"South", "South-West", "red",
"South-West", "South", "red",
"South-West", "West", "red",
"West", "South-West", "red",
"Central", "East", "blue",
"East", "Central", "blue",
"Central", "West", "blue",
"West", "Central", "blue",
];
connections
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths output=all (start)-[connections*1..5]->(destination)
where start.station == "South-West" and destination.station == "North"
project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station
Output
from | path | line | to |
---|---|---|---|
South-West | [ “South”, “Central”, “North” ] | [ “red”, “red”, “red” ] | North |
South-West | [ “West”, “Central”, “North” ] | [ “red”, “blue”, “red” ] | North |
Related content
6 - graph-to-table operator
The graph-to-table
operator exports nodes or edges from a graph to tables.
Syntax
Nodes
G |
graph-to-table
nodes
[ with_node_id=
ColumnName ]
Edges
G |
graph-to-table
edges
[ with_source_id=
ColumnName ] [ with_target_id=
ColumnName ] [ as
TableName ]
Nodes and edges
G |
graph-to-table
nodes
as
NodesTableName [ with_node_id=
ColumnName ],
edges
as
EdgesTableName [ with_source_id=
ColumnName ] [ with_target_id=
ColumnName ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
G | string | ✔️ | The input graph source. |
NodesTableName | string | The name of the exported nodes table. | |
EdgesTableName | string | The name of the exported edges table. | |
ColumnName | string | Export the node hash ID, source node hash ID, or target node hash ID with the given column name. |
Returns
Nodes
The graph-to-table
operator returns a tabular result, in which each row corresponds to a node in the source graph. The returned columns are the node’s properties. When with_node_id
is provided, the node hash column is of long
type.
Edges
The graph-to-table
operator returns a tabular result, in which each row corresponds to an edge in the source graph. The returned columns are the node’s properties. When with_source_id
or with_target_id
are provided, the node hash column is of long
type.
Nodes and edges
The graph-to-table
operator returns two tabular results, matching the previous descriptions.
Examples
The following examples use the make-graph
operator to build a graph from edges and nodes tables. The nodes represent people and systems, and the edges are different relations between nodes. Then, each example shows a different usage of graph-to-table
.
Get edges
In this example, the graph-to-table
operator exports the edges from a graph to a table. The with_source_id
and with_target_id
parameters export the node hash for source and target nodes of each edge.
let nodes = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Trent", "System", 99
];
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with nodes on name
| graph-to-table edges with_source_id=SourceId with_target_id=TargetId
Output
SourceId | TargetId | source | destination | edge_type |
---|---|---|---|---|
-3122868243544336885 | -7133945255344544237 | Alice | Bob | communicatesWith |
-3122868243544336885 | 2533909231875758225 | Alice | Trent | trusts |
-7133945255344544237 | 2533909231875758225 | Bob | Trent | hasPermission |
4363395278938690453 | -3122868243544336885 | Eve | Alice | attacks |
3855580634910899594 | -3122868243544336885 | Mallory | Alice | attacks |
3855580634910899594 | -7133945255344544237 | Mallory | Bob | attacks |
Get nodes
In this example, the graph-to-table
operator exports the nodes from a graph to a table. The with_node_id
parameter exports the node hash.
let nodes = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Trent", "System", 99
];
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with nodes on name
| graph-to-table nodes with_node_id=NodeId
Output
NodeId | name | type | age |
---|---|---|---|
-3122868243544336885 | Alice | Person | 23 |
-7133945255344544237 | Bob | Person | 31 |
4363395278938690453 | Eve | Person | 17 |
2533909231875758225 | Trent | System | 99 |
3855580634910899594 | Mallory |
Get nodes and edges
In this example, the graph-to-table
operator exports the nodes and edges from a graph to a table.
let nodes = datatable(name:string, type:string, age:long)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Trent", "System", 99
];
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with nodes on name
| graph-to-table nodes as N with_node_id=NodeId, edges as E with_source_id=SourceId;
N;
E
Output table 1
NodeId | name | type | age |
---|---|---|---|
-3122868243544336885 | Alice | Person | 23 |
-7133945255344544237 | Bob | Person | 31 |
4363395278938690453 | Eve | Person | 17 |
2533909231875758225 | Trent | System | 99 |
3855580634910899594 | Mallory |
Output table 2
SourceId | source | destination | edge_type |
---|---|---|---|
-3122868243544336885 | Alice | Bob | communicatesWith |
-3122868243544336885 | Alice | Trent | trusts |
-7133945255344544237 | Bob | Trent | hasPermission |
4363395278938690453 | Eve | Alice | attacks |
3855580634910899594 | Mallory | Alice | attacks |
3855580634910899594 | Mallory | Bob | attacks |
Related content
7 - Kusto Query Language (KQL) graph semantics overview
Kusto Query Language (KQL) graph semantics overview
Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. The structure of a graph comprises nodes and edges that connect them. Both nodes and edges can have properties that describe them.
Graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs. For example, the following graph illustrates a social network that consists of four nodes and three edges. Each node has a property for its name, such as Bob, and each edge has a property for its type, such as reportsTo.
Graphs store data differently from relational databases, which use tables and need indexes and joins to connect related data. In graphs, each node has a direct pointer to its neighbors (adjacency), so there’s no need to index or join anything, making it easy and fast to traverse the graph. Graph queries can use the graph structure and meaning to do complex and powerful operations, such as finding paths, patterns, shortest distances, communities, or centrality measures.
You can create and query graphs using KQL graph semantics, which has a simple and intuitive syntax that works well with the existing KQL features. You can also mix graph queries with other KQL features, such as time-based, location-based, and machine-learning queries, to do more advanced and powerful data analysis. By using KQL with graph semantics, you get the speed and scale of KQL queries with the flexibility and expressiveness of graphs.
For example, you can use:
- Time-based queries to analyze the evolution of a graph over time, such as how the network structure or the node properties change
- Geospatial queries to analyze the spatial distribution or proximity of nodes and edges, such as how the location or distance affects the relationship
- Machine learning queries to apply various algorithms or models to graph data, such as clustering, classification, or anomaly detection
How does it work?
Every query of the graph semantics in Kusto requires creating a new graph representation. You use a graph operator that converts tabular expressions for edges and optionally nodes into a graph representation of the data. Once the graph is created, you can apply different operations to further enhance or examine the graph data.
The graph semantics extension uses an in-memory graph engine that works on the data in the memory of your cluster, making graph analysis interactive and fast. The memory consumption of a graph representation is affected by the number of nodes and edges and their respective properties. The graph engine uses a property graph model that supports arbitrary properties for nodes and edges. It also integrates with all the existing scalar operators of KQL, which gives users the ability to write expressive and complex graph queries that can use the full power and functionality of KQL.
Why use graph semantics in KQL?
There are several reasons to use graph semantics in KQL, such as the following examples:
KQL doesn’t support recursive joins, so you have to explicitly define the traversals you want to run (see Scenario: Friends of a friend). You can use the make-graph operator to define hops of variable length, which is useful when the relationship distance or depth isn’t fixed. For example, you can use this operator to find all the resources that are connected in a graph or all the places you can reach from a source in a transportation network.
Time-aware graphs are a unique feature of graph semantics in KQL that allow users to model graph data as a series of graph manipulation events over time. Users can examine how the graph evolves over time, such as how the graph’s network structure or the node properties change, or how the graph events or anomalies happen. For example, users can use time series queries to discover trends, patterns, or outliers in the graph data, such as how the network density, centrality, or modularity change over time
The intellisense feature of the KQL query editor assists users in writing and executing queries in the query language. It provides syntax highlighting, autocompletion, error checking, and suggestions. It also helps users with the graph semantics extension by offering graph-specific keywords, operators, functions, and examples to guide users through the graph creation and querying process.
Limits
The following are some of the main limits of the graph semantics feature in KQL:
- You can only create or query graphs that fit into the memory of one cluster node.
- Graph data isn’t persisted or distributed across cluster nodes, and is discarded after the query execution.
Therefore, When using the graph semantics feature in KQL, you should consider the memory consumption and performance implications of creating and querying large or dense graphs. Where possible, you should use filters, projections, and aggregations to reduce the graph size and complexity.
Related content
8 - make-graph operator
The make-graph
operator builds a graph structure from tabular inputs of edges and nodes.
Syntax
Edges |
make-graph
SourceNodeId -->
TargetNodeId [ with
Nodes1 on
NodeId1 [,
Nodes2 on
NodeId2 ]]
Edges |
make-graph
SourceNodeId -->
TargetNodeId [ with_node_id=
DefaultNodeId ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Edges | string | ✔️ | The tabular source containing the edges of the graph, each row represents an edge in the graph. |
SourceNodeId | string | ✔️ | The column in Edges with the source node IDs of the edges. |
TargetNodeId | string | ✔️ | The column in Edges with the target node IDs of the edges. |
Nodes | string | The tabular expressions containing the properties of the nodes in the graph. | |
NodesId | string | The columns with the node IDs in Nodes. | |
DefaultNodeId | string | The name of the column for the default node ID. |
Returns
The make-graph
operator returns a graph expression and must be followed by a graph operator. Each row in the source Edges expression becomes an edge in the graph with properties that are the column values of the row. Each row in the Nodes tabular expression becomes a node in the graph with properties that are the column values of the row. Nodes that appear in the Edges table but don’t have a corresponding row in the Nodes table are created as nodes with the corresponding node ID and empty properties.
Users can handle node information in the following ways:
- No node information required:
make-graph
completes with source and target. - Explicit node properties: use up to two tabular expressions using “
with
Nodes1on
NodeId1 [,
Nodes2on
NodeId2 ].” - Default node identifier: use “
with_node_id=
DefaultNodeId.”
Example
Edges and nodes graph
The following example builds a graph from edges and nodes tables. The nodes represent people and systems, and the edges represent different relationships between nodes. The make-graph
operator builds the graph. Then, the graph-match
operator is used with a graph pattern to search for attack paths leading to the "Trent"
system node.
let nodes = datatable(name:string, type:string, age:int)
[
"Alice", "Person", 23,
"Bob", "Person", 31,
"Eve", "Person", 17,
"Mallory", "Person", 29,
"Trent", "System", 99
];
let edges = datatable(Source:string, Destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph Source --> Destination with nodes on name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent)
where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission"
project Attacker = mallory.name, Compromised = compromised.name, System = trent.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Trent |
Default node identifier
The following example builds a graph using only edges, with the name
property as the default node identifier. This approach is useful when creating a graph from a tabular expression of edges, ensuring that the node identifier is available for the constraints section of the subsequent graph-match
operator.
let edges = datatable(source:string, destination:string, edge_type:string)
[
"Alice", "Bob", "communicatesWith",
"Alice", "Trent", "trusts",
"Bob", "Trent", "hasPermission",
"Eve", "Alice", "attacks",
"Mallory", "Alice", "attacks",
"Mallory", "Bob", "attacks"
];
edges
| make-graph source --> destination with_node_id=name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent)
where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission"
project Attacker = mallory.name, Compromised = compromised.name, System = trent.name
Output
Attacker | Compromised | System |
---|---|---|
Mallory | Bob | Trent |
Related content
9 - Scenarios for using Kusto Query Language (KQL) graph semantics
What are common scenarios for using Kusto Query Language (KQL) graph semantics?
Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. There are many scenarios where graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs.
In this article, you learn about the following common scenarios for using KQL graph semantics:
Friends of a friend
One common use case for graphs is to model and query social networks, where nodes are users and edges are friendships or interactions. For example, imagine we have a table called Users that has data about users, such as their name and organization, and a table called Knows that has data about the friendships between users as shown in the following diagram:
Without using graph semantics in KQL, you could create a graph to find friends of a friend by using multiple joins, as follows:
let Users = datatable (UserId: string, name: string, org: string)[]; // nodes
let Knows = datatable (FirstUser: string, SecondUser: string)[]; // edges
Users
| where org == "Contoso"
| join kind=inner (Knows) on $left.UserId == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser == $right.UserId
| join kind=inner (Knows) on $left.SecondUser == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser1 == $right.UserId
| where UserId != UserId1
| project name, name1, name2
You can use graph semantics in KQL to perform the same query in a more intuitive and efficient way. The following query uses the make-graph operator to create a directed graph from FirstUser to SecondUser and enriches the properties on the nodes with the columns provided by the Users table. Once the graph is instantiated, the graph-match operator provides the friend-of-a-friend pattern including filters and a projection that results in a tabular output.
let Users = datatable (UserId:string , name:string , org:string)[]; // nodes
let Knows = datatable (FirstUser:string , SecondUser:string)[]; // edges
Knows
| make-graph FirstUser --> SecondUser with Users on UserId
| graph-match (user)-->(middle_man)-->(friendOfAFriend)
where user.org == "Contoso" and user.UserId != friendOfAFriend.UserId
project contoso_person = user.name, middle_man = middle_man.name, kusto_friend_of_friend = friendOfAFriend.name
Insights from log data
In some use cases, you want to gain insights from a simple flat table containing time series information, such as log data. The data in each row is a string that contains raw data. To create a graph from this data, you must first identify the entities and relationships that are relevant to the graph analysis. For example, suppose you have a table called rawLogs from a web server that contains information about requests, such as the timestamp, the source IP address, the destination resource, and much more.
The following table shows an example of the raw data:
let rawLogs = datatable (rawLog: string) [
"31.56.96.51 - - [2019-01-22 03:54:16 +0330] \"GET /product/27 HTTP/1.1\" 200 5379 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
"31.56.96.51 - - [2019-01-22 03:55:17 +0330] \"GET /product/42 HTTP/1.1\" 200 5667 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
"54.36.149.41 - - [2019-01-22 03:56:14 +0330] \"GET /product/27 HTTP/1.1\" 200 30577 \"-\" \"some client\" \"-\""
];
One possible way to model a graph from this table is to treat the source IP addresses as nodes and the web requests to resources as edges. You can use the parse operator to extract the columns you need for the graph and then you can create a graph that represents the network traffic and interactions between different sources and destinations. To create the graph, you can use the make-graph operator specifying the source and destination columns as the edge endpoints, and optionally providing additional columns as edge or node properties.
The following query creates a graph from the raw logs:
let parsedLogs = rawLogs
| parse rawLog with ipAddress: string " - - [" timestamp: datetime "] \"" httpVerb: string " " resource: string " " *
| project-away rawLog;
let edges = parsedLogs;
let nodes =
union
(parsedLogs
| distinct ipAddress
| project nodeId = ipAddress, label = "IP address"),
(parsedLogs | distinct resource | project nodeId = resource, label = "resource");
let graph = edges
| make-graph ipAddress --> resource with nodes on nodeId;
This query parses the raw logs and creates a directed graph where the nodes are either IP addresses or resources and each edge is a request from the source to the destination, with the timestamp and HTTP verb as edge properties.
Once the graph is created, you can use the graph-match operator to query the graph data using patterns, filters, and projections. For example, you can create a pattern that makes a simple recommendation based on the resources that other IP addresses requested within the last five minutes, as follows:
graph
| graph-match (startIp)-[request]->(resource)<--(otherIP)-[otherRequest]->(otherResource)
where startIp.label == "IP address" and //start with an IP address
resource.nodeId != otherResource.nodeId and //recommending a different resource
startIp.nodeId != otherIP.nodeId and //only other IP addresses are interesting
(request.timestamp - otherRequest.timestamp < 5m) //filter on recommendations based on the last 5 minutes
project Recommendation=otherResource.nodeId
Output
Recommendation |
---|
/product/42 |
The query returns “/product/42” as a recommendation based on a raw text-based log.