1 - join flavors

1.1 - fullouter join

Learn how to use the fullouter join flavor to merge the rows of two tables.

A fullouter join combines the effect of applying both left and right outer-joins. For columns of the table that lack a matching row, the result set contains null values. For those records that do match, a single row is produced in the result set containing fields populated from both tables.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=fullouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from both tables with unmatched cells populated with null.

Example

This example query combines rows from both tables X and Y, filling in missing values with NULL where there’s no match in the other table. This allows you to see all possible combinations of keys from both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=fullouter Y on Key

Output

KeyValue1Key1Value2
b3b10
b2b10
c4c20
c4c30
d40
a1

1.2 - inner join

Learn how to use the inner join flavor to merge the rows of two tables.

The inner join flavor is like the standard inner join from the SQL world. An output record is produced whenever a record on the left side has the same join key as the record on the right side.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=inner [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: Only matching rows from both tables.

Example

The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'k',5,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40,
    'k',50
];
X | join kind=inner Y on Key

Output

KeyValue1Key1Value2
b3b10
b2b10
c4c20
c4c30
k5k50

1.3 - innerunique join

Learn how to use the innerunique join flavor to merge the rows of two tables.

The innerunique join flavor removes duplicate keys from the left side. This behavior ensures that the output contains a row for every combination of unique left and right keys.

By default, the innerunique join flavor is used if the kind parameter isn’t specified. This default implementation is useful in log/trace analysis scenarios, where you aim to correlate two events based on a shared correlation ID. It allows you to retrieve all instances of the phenomenon while disregarding duplicate trace records that contribute to the correlation.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=innerunique [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All deduplicated rows from the left table that match rows from the right table.

Examples

Review the examples and run them in your Data Explorer query page.

Use the default innerunique join

The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join Y on Key

Output

KeyValue1Key1Value2
b2b10
c4c20
c4c30

The query executed the default join, which is an inner join after deduplicating the left side based on the join key. The deduplication keeps only the first record. The resulting left side of the join after deduplication is:

KeyValue1
a1
b2
c4

Two possible outputs from innerunique join

let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3",
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key

Output

keyvaluekey1value1
1val1.11val1.3
1val1.11val1.4
let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3", 
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key

Output

keyvaluekey1value1
1val1.21val1.3
1val1.21val1.4
  • Kusto is optimized to push filters that come after the join, towards the appropriate join side, left or right, when possible.
  • Sometimes, the flavor used is innerunique and the filter is propagated to the left side of the join. The flavor is automatically propagated and the keys that apply to that filter appear in the output.
  • Use the previous example and add a filter where value == "val1.2" . It gives the second result and will never give the first result for the datasets:
let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3", 
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key
| where value == "val1.2"

Output

keyvaluekey1value1
1val1.21val1.3
1val1.21val1.4

Get extended sign-in activities

Get extended activities from a login that some entries mark as the start and end of an activity.

let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityId, StartTime=timestamp
| join (Events
    | where Name == "Stop"
        | project StopTime=timestamp, ActivityId)
    on ActivityId
| project City, ActivityId, StartTime, StopTime, Duration = StopTime - StartTime
let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityIdLeft = ActivityId, StartTime=timestamp
| join (Events
        | where Name == "Stop"
        | project StopTime=timestamp, ActivityIdRight = ActivityId)
    on $left.ActivityIdLeft == $right.ActivityIdRight
| project City, ActivityId, StartTime, StopTime, Duration = StopTime - StartTime

1.4 - leftanti join

Learn how to use the leftanti join flavor to merge the rows of two tables.

The leftanti join flavor returns all records from the left side that don’t match any record from the right side. The anti join models the “NOT IN” query.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftanti [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the left table.
Rows: All records from the left table that don’t match records from the right table.

Example

The example query combines rows from tables X and Y where there is no match in Y for the keys in X, effectively filtering out any rows in X that have corresponding rows in Y.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftanti Y on Key

Output

KeyValue1
a1

1.5 - leftouter join

Learn how to use the leftouter join flavor to merge the rows of two tables.

The leftouter join flavor returns all the records from the left side table and only matching records from the right side table.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from the left table and only matching rows from the right table.

Example

The result of a left outer join for tables X and Y always contains all records of the left table (X), even if the join condition doesn’t find any matching record in the right table (Y).

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftouter Y on Key

Output

KeyValue1Key1Value2
a1
b2b10
b3b10
c4c20
c4c30

1.6 - leftsemi join

Learn how to use the leftsemi join flavor to merge the rows of two tables.

The leftsemi join flavor returns all records from the left side that match a record from the right side. Only columns from the left side are returned.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftsemi [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the left table.
Rows: All records from the left table that match records from the right table.

Example

This query filters and returns only those rows from table X that have a matching key in table Y.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftsemi Y on Key

Output

KeyValue1
b2
b3
c4

1.7 - rightanti join

Learn how to use the rightanti join flavor to merge the rows of two tables.

The rightanti join flavor returns all records from the right side that don’t match any record from the left side. The anti join models the “NOT IN” query.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightanti [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the right table.
Rows: All records from the right table that don’t match records from the left table.

Example

This query filters and returns only those rows from table Y that do not have a matching key in table X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightanti Y on Key

Output

KeyValue1
d40

1.8 - rightouter join

Learn how to use the rightouter join flavor to merge the rows of two tables.

The rightouter join flavor returns all the records from the right side and only matching records from the left side. This join flavor resembles the leftouter join flavor, but the treatment of the tables is reversed.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from the right table and only matching rows from the left table.

Example

This query returns all rows from table Y and any matching rows from table X, filling in NULL values where there is no match from X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightouter Y on Key

Output

KeyValue1Key1Value2
b2b10
b3b10
c4c20
c4c30
d40

1.9 - rightsemi join

Learn how to use the rightsemi join flavor to merge the rows of two tables.

The rightsemi join flavor returns all records from the right side that match a record from the left side. Only columns from the right side are returned.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightsemi [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the right table.
Rows: All records from the right table that match records from the left table.

Example

This query filters and returns only those rows from table Y that have a matching key in table X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightsemi Y on Key

Output

KeyValue2
b10
c20
c30

2 - Broadcast join

Learn how to use the broadcast join execution strategy to distribute the join over nodes.

Today, regular joins are executed on a cluster single node. Broadcast join is an execution strategy of join that distributes the join over cluster nodes. This strategy is useful when the left side of the join is small (up to several tens of MBs). In this case, a broadcast join is more performant than a regular join.

Today, regular joins are executed on an Eventhouse single node. Broadcast join is an execution strategy of join that distributes the join over Eventhouse nodes. This strategy is useful when the left side of the join is small (up to several tens of MBs). In this case, a broadcast join is more performant than a regular join.

Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.

If left side of the join is a small dataset, then you may run join in broadcast mode using the following syntax (hint.strategy = broadcast):

leftSide 
| join hint.strategy = broadcast (factTable) on key

The performance improvement is more noticeable in scenarios where the join is followed by other operators such as summarize. See the following query for example:

leftSide 
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key

3 - Cross-cluster join

Learn how to perform the Cross-cluster join operation to join datasets residing on different clusters.

A cross-cluster join involves joining data from datasets that reside in different clusters.

In a cross-cluster join, the query can be executed in three possible locations, each with a specific designation for reference throughout this document:

  • Local cluster: The cluster to which the request is sent, which is also known as the cluster hosting the database in context.
  • Left cluster: The cluster hosting the data on the left side of the join operation.
  • Right cluster: The cluster hosting the data on the right side of the join operation.

The cluster that runs the query fetches the data from the other cluster.

Syntax

[ cluster(ClusterName).database(DatabaseName).]LeftTable |
| join [ hint.remote=Strategy ] (
  [ cluster(ClusterName).database(DatabaseName).]RightTable |
) on Conditions

Parameters

NameTypeRequiredDescription
LeftTablestring✔️The left table or tabular expression whose rows are to be merged. Denoted as $left.
StrategystringDetermines the cluster on which to execute the join. Supported values are: left, right, local, and auto. For more information, see Strategies.
ClusterNamestringIf the data for the join resides outside of the local cluster, use the cluster() function to specify the cluster.
DatabaseNamestringIf the data for the join resides outside of the local database context, use the database() function to specify the database.
RightTablestring✔️The right table or tabular expression whose rows are to be merged. Denoted as $right.
Conditionsstring✔️Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left.LeftColumn == $right.RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator.

Strategies

The following list explains the supported values for the Strategy parameter:

  • left: Execute join on the cluster of the left table, or left cluster.
  • right: Execute join on the cluster of the right table, or right cluster.
  • local: Execute join on the cluster of the current cluster, or local cluster.
  • auto: (Default) Kusto makes the remoting decision.

How the auto strategy works

By default, the auto strategy determines where the cross-cluster join is executed based on the following rules:

  • If one of the tables is hosted in the local cluster, then the join is performed on the local cluster. For example, with the auto strategy, this query is executed on the local cluster:

    T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
    
  • If both tables are hosted outside of the local cluster, then join is performed on the right cluster. For example, assuming neither cluster is the local cluster, the join would be executed on the right cluster:

    cluster("B").database("DB").T | ... | join (cluster("C").database("DB2").T2 | ...) on Col1
    

Performance considerations

For optimal performance, we recommend running the query on the cluster that contains the largest table.

In the following example, if the dataset produced by T | ... is smaller than one produced by cluster("B").database("DB").T2 | ... then it would be more efficient to execute the join operation on cluster B, in this case the right cluster instead of on the local cluster.

T | ... | join (cluster("B").database("DB").T2 | ...) on Col1

You can rewrite the query to use hint.remote=right to optimize the performance. In this way, the join operation is performed on the right cluster, even if left table is in the local cluster.

T | ... | join hint.remote=right (cluster("B").database("DB").T2 | ...) on Col1

4 - join operator

Learn how to use the join operator to merge the rows of two tables.

Merge the rows of two tables to form a new table by matching values of the specified columns from each table.

Kusto Query Language (KQL) offers many kinds of joins that each affect the schema and rows in the resultant table in different ways. For example, if you use an inner join, the table has the same columns as the left table, plus the columns from the right table. For best performance, if one table is always smaller than the other, use it as the left side of the join operator.

The following image provides a visual representation of the operation performed by each join. The color of the shading represents the columns returned, and the areas shaded represent the rows returned.

Diagram showing query join kinds.

Syntax

LeftTable | join [ kind = JoinFlavor ] [ Hints ] (RightTable) on Conditions

Parameters

NameTypeRequiredDescription
LeftTablestring✔️The left table or tabular expression, sometimes called the outer table, whose rows are to be merged. Denoted as $left.
JoinFlavorstringThe type of join to perform: innerunique, inner, leftouter, rightouter, fullouter, leftanti, rightanti, leftsemi, rightsemi. The default is innerunique. For more information about join flavors, see Returns.
HintsstringZero or more space-separated join hints in the form of Name = Value that control the behavior of the row-match operation and execution plan. For more information, see Hints.
RightTablestring✔️The right table or tabular expression, sometimes called the inner table, whose rows are to be merged. Denoted as $right.
Conditionsstring✔️Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left.LeftColumn == $right.RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator.

Hints

Hint keyValuesDescription
hint.remoteauto, left, local, rightSee Cross-Cluster Join
hint.strategy=broadcastSpecifies the way to share the query load on cluster nodes.See broadcast join
hint.shufflekey=<key>The shufflekey query shares the query load on cluster nodes, using a key to partition data.See shuffle query
hint.strategy=shuffleThe shuffle strategy query shares the query load on cluster nodes, where each node processes one partition of the data.See shuffle query
NameValuesDescription
hint.remoteauto, left, local, right
hint.strategy=broadcastSpecifies the way to share the query load on cluster nodes.See broadcast join
hint.shufflekey=<key>The shufflekey query shares the query load on cluster nodes, using a key to partition data.See shuffle query
hint.strategy=shuffleThe shuffle strategy query shares the query load on cluster nodes, where each node processes one partition of the data.See shuffle query

Returns

The return schema and rows depend on the join flavor. The join flavor is specified with the kind keyword. The following table shows the supported join flavors. To see examples for a specific join flavor, select the link in the Join flavor column.

Join flavorReturnsIllustration
innerunique (default)Inner join with left side deduplication
Schema: All columns from both tables, including the matching keys
Rows: All deduplicated rows from the left table that match rows from the right table
:::image type=“icon” source=“media/joinoperator/join-innerunique.png” border=“false”:::
innerStandard inner join
Schema: All columns from both tables, including the matching keys
Rows: Only matching rows from both tables
:::image type=“icon” source=“media/joinoperator/join-inner.png” border=“false”:::
leftouterLeft outer join
Schema: All columns from both tables, including the matching keys
Rows: All records from the left table and only matching rows from the right table
:::image type=“icon” source=“media/joinoperator/join-leftouter.png” border=“false”:::
rightouterRight outer join
Schema: All columns from both tables, including the matching keys
Rows: All records from the right table and only matching rows from the left table
:::image type=“icon” source=“media/joinoperator/join-rightouter.png” border=“false”:::
fullouterFull outer join
Schema: All columns from both tables, including the matching keys
Rows: All records from both tables with unmatched cells populated with null
:::image type=“icon” source=“media/joinoperator/join-fullouter.png” border=“false”:::
leftsemiLeft semi join
Schema: All columns from the left table
Rows: All records from the left table that match records from the right table
:::image type=“icon” source=“media/joinoperator/join-leftsemi.png” border=“false”:::
leftanti, anti, leftantisemiLeft anti join and semi variant
Schema: All columns from the left table
Rows: All records from the left table that don’t match records from the right table
:::image type=“icon” source=“media/joinoperator/join-leftanti.png” border=“false”:::
rightsemiRight semi join
Schema: All columns from the right table
Rows: All records from the right table that match records from the left table
:::image type=“icon” source=“media/joinoperator/join-rightsemi.png” border=“false”:::
rightanti, rightantisemiRight anti join and semi variant
Schema: All columns from the right table
Rows: All records from the right table that don’t match records from the left table
:::image type=“icon” source=“media/joinoperator/join-rightanti.png” border=“false”:::

Cross-join

KQL doesn’t provide a cross-join flavor. However, you can achieve a cross-join effect by using a placeholder key approach.

In the following example, a placeholder key is added to both tables and then used for the inner join operation, effectively achieving a cross-join-like behavior:

X | extend placeholder=1 | join kind=inner (Y | extend placeholder=1) on placeholder

5 - Joining within time window

Learn how to perform a time window join operation to match between two large datasets.

It’s often useful to join between two large datasets on some high-cardinality key, such as an operation ID or a session ID, and further limit the right-hand-side ($right) records that need to match up with each left-hand-side ($left) record by adding a restriction on the “time-distance” between datetime columns on the left and on the right.

The above operation differs from the usual join operation, since for the equi-join part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.

Example to identify event sequences without time window

To identify event sequences within a relatively small time window, this example uses a table T with the following schema:

  • SessionId: A column of type string with correlation IDs.
  • EventType: A column of type string that identifies the event type of the record.
  • Timestamp: A column of type datetime indicates when the event described by the record happened.
SessionIdEventTypeTimestamp
0A2017-10-01T00:00:00Z
0B2017-10-01T00:01:00Z
1B2017-10-01T00:02:00Z
1A2017-10-01T00:03:00Z
3A2017-10-01T00:04:00Z
3B2017-10-01T00:10:00Z

The following query creates the dataset and then identifies all the session IDs in which event type A was followed by an event type B within a 1min time window.

let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
    '0', 'A', datetime(2017-10-01 00:00:00),
    '0', 'B', datetime(2017-10-01 00:01:00),
    '1', 'B', datetime(2017-10-01 00:02:00),
    '1', 'A', datetime(2017-10-01 00:03:00),
    '3', 'A', datetime(2017-10-01 00:04:00),
    '3', 'B', datetime(2017-10-01 00:10:00),
];
T
| where EventType == 'A'
| project SessionId, Start=Timestamp
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Timestamp
    ) on SessionId
| where (End - Start) between (0min .. 1min)
| project SessionId, Start, End 

Output

SessionIdStartEnd
02017-10-01 00:00:00.00000002017-10-01 00:01:00.0000000

Example optimized with time window

To optimize this query, we can rewrite it to account for the time window. THe time window is expressed as a join key. Rewrite the query so that the datetime values are “discretized” into buckets whose size is half the size of the time window. Use equi-join to compare the bucket IDs.

The query finds pairs of events within the same session (SessionId) where an ‘A’ event is followed by a ‘B’ event within 1 minute. It projects the session ID, the start time of the ‘A’ event, and the end time of the ‘B’ event.

let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
    '0', 'A', datetime(2017-10-01 00:00:00),
    '0', 'B', datetime(2017-10-01 00:01:00),
    '1', 'B', datetime(2017-10-01 00:02:00),
    '1', 'A', datetime(2017-10-01 00:03:00),
    '3', 'A', datetime(2017-10-01 00:04:00),
    '3', 'B', datetime(2017-10-01 00:10:00),
];
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T 
| where EventType == 'A'
| project SessionId, Start=Timestamp, TimeKey = bin(Timestamp, lookupBin)
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Timestamp,
              TimeKey = range(bin(Timestamp-lookupWindow, lookupBin),
                              bin(Timestamp, lookupBin),
                              lookupBin)
    | mv-expand TimeKey to typeof(datetime)
    ) on SessionId, TimeKey 
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End 

Output

SessionIdStartEnd
02017-10-01 00:00:00.00000002017-10-01 00:01:00.0000000

5 million data query

The next query emulates an extensive dataset of 5M records and approximately 1M Session IDs and runs the query with the time window technique.

let T = range x from 1 to 5000000 step 1
| extend SessionId = rand(1000000), EventType = rand(3), Time=datetime(2017-01-01)+(x * 10ms)
| extend EventType = case(EventType < 1, "A",
                          EventType < 2, "B",
                          "C");
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T 
| where EventType == 'A'
| project SessionId, Start=Time, TimeKey = bin(Time, lookupBin)
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Time, 
              TimeKey = range(bin(Time-lookupWindow, lookupBin), 
                              bin(Time, lookupBin),
                              lookupBin)
    | mv-expand TimeKey to typeof(datetime)
    ) on SessionId, TimeKey 
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End 
| count 

Output

Count
3344