1 - next()

Learn how to use the next() function to return the value of the next column at an offset.

Returns the value of a column in a row that is at some offset following the current row in a serialized row set.

Syntax

next(column, [ offset, default_value ])

Parameters

NameTypeRequiredDescription
columnstring✔️The column from which to get the values.
offsetintThe amount of rows to move from the current row. Default is 1.
default_valuescalarThe default value when there’s no value in the next row. When no default value is specified, null is used.

Examples

Filter data based on comparison between adjacent rows

The following query returns rows that show breaks longer than a quarter of a second between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
| where timeDiffInMilliseconds > 250

Output

TimestampSensorNameValuePublisherIdMachineIdtimeDiff
2022-04-13T00:58:53.048506Zsensor-90.39217481975439894fdbd39ab-82ac-4ca0-99ed-2f83daf3f9bbM100251
2022-04-13T01:07:09.63713Zsensor-90.46645392778288297e3ed081e-501b-4d59-8e60-8524633d9131M100313
2022-04-13T01:07:10.858267Zsensor-90.693091598493419278ca033-2b5e-4f2c-b493-00319b275aeaM100254
2022-04-13T01:07:11.203834Zsensor-90.524158088402497784ea27181-392d-4947-b811-ad5af02a54bbM100331
2022-04-13T01:07:14.431908Zsensor-90.354306454054520af415c2-59dc-4a50-89c3-9a18ae5d621fM100268

Perform aggregation based on comparison between adjacent rows

The following query calculates the average time difference in milliseconds between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
| summarize avg(timeDiffInMilliseconds)

Output

avg_timeDiffInMilliseconds
30.726900061254298

Extend row with data from the next row

In the following query, as part of the serialization done with the serialize operator, a new column next_session_type is added with data from the next row.

ConferenceSessions
| where conference == 'Build 2019'
| serialize next_session_type = next(session_type)
| project time_and_duration, session_title, session_type, next_session_type

Output

time_and_durationsession_titlesession_typenext_session_type
Mon, May 6, 8:30-10:00 amVision Keynote - Satya NadellaKeynoteExpo Session
Mon, May 6, 1:20-1:40 pmAzure Data Explorer: Advanced Time Series analysisExpo SessionBreakout
Mon, May 6, 2:00-3:00 pmAzure’s Data Platform - Powering Modern Applications and Cloud Scale Analytics at Petabyte ScaleBreakoutExpo Session
Mon, May 6, 4:00-4:20 pmHow BASF is using Azure Data ServicesExpo SessionExpo Session
Mon, May 6, 6:50 - 7:10 pmAzure Data Explorer: Operationalize your ML modelsExpo SessionExpo Session

2 - prev()

Learn how to use the prev() function to return the value of a specific column in a specified row.

Returns the value of a specific column in a specified row. The specified row is at a specified offset from the current row in a serialized row set.

Syntax

prev(column, [ offset ], [ default_value ] )

Parameters

NameTypeRequiredDescription
columnstring✔️The column from which to get the values.
offsetintThe offset to go back in rows. The default is 1.
default_valuescalarThe default value to be used when there are no previous rows from which to take the value. The default is null.

Examples

Filter data based on comparison between adjacent rows

The following query returns rows that show breaks longer than a quarter of a second between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', Timestamp, prev(Timestamp, 1))
| where timeDiffInMilliseconds > 250

Output

TimestampSensorNameValuePublisherIdMachineIdtimeDiff
2022-04-13T00:58:53.048506Zsensor-90.39217481975439894fdbd39ab-82ac-4ca0-99ed-2f83daf3f9bbM100251
2022-04-13T01:07:09.63713Zsensor-90.46645392778288297e3ed081e-501b-4d59-8e60-8524633d9131M100313
2022-04-13T01:07:10.858267Zsensor-90.693091598493419278ca033-2b5e-4f2c-b493-00319b275aeaM100254
2022-04-13T01:07:11.203834Zsensor-90.524158088402497784ea27181-392d-4947-b811-ad5af02a54bbM100331
2022-04-13T01:07:14.431908Zsensor-90.354306454054520af415c2-59dc-4a50-89c3-9a18ae5d621fM100268

Perform aggregation based on comparison between adjacent rows

The following query calculates the average time difference in milliseconds between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', Timestamp, prev(Timestamp, 1))
| summarize avg(timeDiffInMilliseconds)

Output

avg_timeDiffInMilliseconds
30.726900061254298

Extend row with data from the previous row

In the following query, as part of the serialization done with the serialize operator, a new column previous_session_type is added with data from the previous row. Since there was no session prior to the first session, the column is empty in the first row.

ConferenceSessions
| where conference == 'Build 2019'
| serialize previous_session_type = prev(session_type)
| project time_and_duration, session_title, session_type, previous_session_type

Output

time_and_durationsession_titlesession_typeprevious_session_type
Mon, May 6, 8:30-10:00 amVision Keynote - Satya NadellaKeynote
Mon, May 6, 1:20-1:40 pmAzure Data Explorer: Advanced Time Series analysisExpo SessionKeynote
Mon, May 6, 2:00-3:00 pmAzure’s Data Platform - Powering Modern Applications and Cloud Scale Analytics at Petabyte ScaleBreakoutExpo Session
Mon, May 6, 4:00-4:20 pmHow BASF is using Azure Data ServicesExpo SessionBreakout
Mon, May 6, 6:50 - 7:10 pmAzure Data Explorer: Operationalize your ML modelsExpo SessionExpo Session

3 - row_cumsum()

Learn how to use the row_cumsum() function to calculate the cumulative sum of a column in a serialized row set.

Calculates the cumulative sum of a column in a serialized row set.

Syntax

row_cumsum( term [, restart] )

Parameters

NameTypeRequiredDescription
termint, long, or real✔️The expression indicating the value to be summed.
restartboolIndicates when the accumulation operation should be restarted, or set back to 0. It can be used to indicate partitions in the data.

Returns

The function returns the cumulative sum of its argument.

Examples

The following example shows how to calculate the cumulative sum of the first few even integers.

datatable (a:long) [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10
]
| where a%2==0
| serialize cs=row_cumsum(a)
acs
22
46
612
820
1030

This example shows how to calculate the cumulative sum (here, of salary) when the data is partitioned (here, by name):

datatable (name:string, month:int, salary:long)
[
    "Alice", 1, 1000,
    "Bob",   1, 1000,
    "Alice", 2, 2000,
    "Bob",   2, 1950,
    "Alice", 3, 1400,
    "Bob",   3, 1450,
]
| order by name asc, month asc
| extend total=row_cumsum(salary, name != prev(name))
namemonthsalarytotal
Alice110001000
Alice220003000
Alice314004400
Bob110001000
Bob219502950
Bob314504400

4 - row_number()

Learn how to use the row_number() to return the current row’s index in a serialized row set.

Returns the current row’s index in a serialized row set.

The row index starts by default at 1 for the first row, and is incremented by 1 for each additional row. Optionally, the row index can start at a different value than 1. Additionally, the row index may be reset according to some provided predicate.

Syntax

row_number( [StartingIndex [, Restart]] )

Parameters

NameTypeRequiredDescription
StartingIndexlongThe value of the row index to start at or restart to. The default value is 1.
restartboolIndicates when the numbering is to be restarted to the StartingIndex value. The default is false.

Returns

The function returns the row index of the current row as a value of type long.

Examples

The following example returns a table with two columns, the first column (a) with numbers from 10 down to 1, and the second column (rn) with numbers from 1 up to 10:

range a from 1 to 10 step 1
| sort by a desc
| extend rn=row_number()

The following example is similar to the above, only the second column (rn) starts at 7:

range a from 1 to 10 step 1
| sort by a desc
| extend rn=row_number(7)

The last example shows how one can partition the data and number the rows per each partition. Here, we partition the data by Airport:

datatable (Airport:string, Airline:string, Departures:long)
[
  "TLV", "LH", 1,
  "TLV", "LY", 100,
  "SEA", "LH", 1,
  "SEA", "BA", 2,
  "SEA", "LY", 0
]
| sort by Airport asc, Departures desc
| extend Rank=row_number(1, prev(Airport) != Airport)

Running this query produces the following result:

AirportAirlineDeparturesRank
SEABA21
SEALH12
SEALY03
TLVLY1001
TLVLH12

5 - row_rank_dense()

Learn how to use the row_rank_dense() function to return the current row’s dense rank in a serialized row set.

Returns the current row’s dense rank in a serialized row set.

The row rank starts by default at 1 for the first row, and is incremented by 1 whenever the provided Term is different than the previous row’s Term.

Syntax

row_rank_dense ( Term )

Parameters

NameTypeRequiredDescription
Termstring✔️An expression indicating the value to consider for the rank. The rank is increased whenever the Term changes.
restartboolIndicates when the numbering is to be restarted to the StartingIndex value. The default is false.

Returns

Returns the row rank of the current row as a value of type long.

Example

The following query shows how to rank the Airline by the number of departures from the SEA Airport using dense rank.

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "SEA", "EL", 3
]
| sort by Departures asc
| extend Rank=row_rank_dense(Departures)

Output

AirportAirlineDeparturesRank
SEABA21
SEALH32
SEAUA32
SEAEL32
SEALY1003

The following example shows how to rank the Airline by the number of departures per each partition. Here, we partition the data by Airport:

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "SEA", "EL", 3,
  "AMS", "EL", 1,
  "AMS", "BA", 1
]
| sort by Airport desc, Departures asc
| extend Rank=row_rank_dense(Departures, prev(Airport) != Airport)

Output

AirportAirlineDeparturesRank
SEABA21
SEALH32
SEAUA32
SEAEL32
SEALY1003
AMSEL11
AMSBA11

6 - row_rank_min()

Learn how to use the row_rank_min() function to return the current row’s minimal rank in a serialized row set.

Returns the current row’s minimal rank in a serialized row set.

The rank is the minimal row number that the current row’s Term appears in.

Syntax

row_rank_min ( Term )

Parameters

NameTypeRequiredDescription
Termstring✔️An expression indicating the value to consider for the rank. The rank is the minimal row number for Term.
restartboolIndicates when the numbering is to be restarted to the StartingIndex value. The default is false.

Returns

Returns the row rank of the current row as a value of type long.

Example

The following query shows how to rank the Airline by the number of departures from the SEA Airport.

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "SEA", "EL", 3
]
| sort by Departures asc
| extend Rank=row_rank_min(Departures)

Output

AirportAirlineDeparturesRank
SEABA21
SEALH32
SEAUA32
SEAEL32
SEALY1005

7 - row_window_session()

Learn how to use the row_window_session() function to calculate session start values of a column in a serialized row set.

Calculates session start values of a column in a serialized row set.

Syntax

row_window_session ( Expr , MaxDistanceFromFirst , MaxDistanceBetweenNeighbors [, Restart] )

Parameters

NameTypeRequiredDescription
Exprdatetime✔️An expression whose values are grouped together in sessions. When Expr results in a null value, the next value starts a new session.
MaxDistanceFromFirsttimespan✔️Determines when a new session starts using the maximum distance between the current Expr value and its value at the beginning of the session.
MaxDistanceBetweenNeighborstimespan✔️Another criterion for starting a new session using the maximum distance from one value of Expr to the next.
RestartbooleanIf specified, every value that evaluates to true immediately restarts the session.

Returns

The function returns the values at the beginning of each session. It uses the following conceptual calculation model:

  1. Iterates over the input sequence of Expr values in order.

  2. For each value, it decides whether to create a new session.

  3. If a new session is created, the function returns the current value of Expr. Otherwise, it returns the previous value of Expr.

MaxDistanceFromFirst. plus MaxDistanceBetweenNeighbors.

Examples

The following example calculates session start values for a table, datatable, with a sequence ID column and a Timestamp column to record the time of each record. The data is sorted by the sequence IDs and timestamps and then the example returns values for ID, Timestamp, and a new SessionStarted column. A session can’t exceed one hour. It continues for as long as records are less than five minutes apart and the ID stays the same. The example includes records that are less than five minutes apart.

datatable (ID:string, Timestamp:datetime) [
    "1", datetime(2024-04-11 10:00:00),
    "2", datetime(2024-04-11 10:18:00),
    "1", datetime(2024-04-11 11:00:00),
    "3", datetime(2024-04-11 11:30:00),
    "2", datetime(2024-04-11 13:30:00),
    "2", datetime(2024-04-11 10:16:00)
]
| sort by ID asc, Timestamp asc
| extend SessionStarted = row_window_session(Timestamp, 1h, 5m, ID != prev(ID))

Output

IDTimestampSessionStarted
12024-04-11T10:00:00Z2024-04-11T10:00:00Z
12024-04-11T11:00:00Z2024-04-11T11:00:00Z
22024-04-11T10:16:00Z2024-04-11T10:16:00Z
22024-04-11T10:18:00Z2024-04-11T10:16:00Z
22024-04-11T13:30:00Z2024-04-11T13:30:00Z
32024-04-11T11:30:00Z2024-04-11T11:30:00Z

8 - Window functions

Learn how to use window functions on rows in a serialized set.

Window functions operate on multiple rows (records) in a row set at a time. Unlike aggregation functions, window functions require that the rows in the row set be serialized (have a specific order to them). Window functions may depend on the order to determine the result.

Window functions can only be used on serialized sets. The easiest way to serialize a row set is to use the serialize operator. This operator “freezes” the order of rows in an arbitrary manner. If the order of serialized rows is semantically important, use the sort operator to force a particular order.

The serialization process has a non-trivial cost associated with it. For example, it might prevent query parallelism in many scenarios. Therefore, don’t apply serialization unnecessarily. If necessary, rearrange the query to perform serialization on the smallest row set possible.

Serialized row set

An arbitrary row set (such as a table, or the output of a tabular operator) can be serialized in one of the following ways:

  1. By sorting the row set. See below for a list of operators that emit sorted row sets.
  2. By using the serialize operator.

Many tabular operators serialize output whenever the input is already serialized, even if the operator doesn’t itself guarantee that the result is serialized. For example, this property is guaranteed for the extend operator, the project operator, and the where operator.

Operators that emit serialized row sets by sorting

Operators that preserve the serialized row set property