externaldata operator

Learn how to use the externaldata operator to return a data table of the given schema whose data was parsed from the specified storage artifact.

The externaldata operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.

Syntax

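externaldata (columnName:columnType [, …] ) [ storageConnectionString [, …] ] [with ( propertyName = propertyValue [, …])]

The square brackets around the storage connection strings are part of the syntax and must be typed. The square brackets around the with clause mean that the clause is optional.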

Parameters

Name | Type | Required | Description
columnName, columnType | string | ✔️ | A list of column names and their types. This list defines the schema of the table.
storageConnectionString | string | ✔️ | One or more storage connection strings of the storage artifacts to query.
propertyName, propertyValue | string | | A list of optional supported properties that determine how to interpret the data retrieved from storage.

Supported properties

Property | Type | Description
format | string | The data format. If unspecified, an attempt is made to detect the data format from the file extension. The default is CSV. All ingestion data formats are supported.
ignoreFirstRecord | bool | If set to true, the first record in every file is ignored. This property is useful when querying CSV files with headers.
ingestionMapping | string | Indicates how to map data from the source file to the actual columns in the operator result set. See data mappings.

Returns

The externaldata operator returns a data table of the given schema whose data was parsed from the specified storage artifact, indicated by the storage connection string.

Examples

The following examples query data stored in external storage files.

Fetch a list of user IDs stored in Azure Blob Storage

The following example finds all records in a table whose UserID column matches one of a known set of IDs, stored one per line in an external storage file. Because the data format isn't specified, it's detected as TXT.

Users
| where UserID in ((externaldata (UserID:string) [
    @"https://storageaccount.blob.core.windows.net/storagecontainer/users.txt" 
      h@"?...SAS..." // Secret token needed to access the blob
    ]))
| ...
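To make the semantics of this query concrete, the following Python sketch reproduces the same filter locally. It's an illustration, not how Kusto executes the query: the file contents, the extra Name column, and the helper filter_by_external_ids are all hypothetical.

```python
# Local illustration only: it mimics the filter above without contacting Azure.

# Hypothetical contents of users.txt: TXT format, one user ID per line.
users_txt = "u-001\nu-007\nu-042\n"

# Hypothetical rows of the Users table.
users_table = [
    {"UserID": "u-001", "Name": "Ana"},
    {"UserID": "u-002", "Name": "Ben"},
    {"UserID": "u-042", "Name": "Chen"},
]


def filter_by_external_ids(rows, txt_body):
    # Each non-empty line becomes one record of the single-column table (UserID:string).
    external_ids = {line for line in txt_body.splitlines() if line}
    # "where UserID in (...)" keeps only rows whose UserID appears in that set.
    return [row for row in rows if row["UserID"] in external_ids]


print([row["Name"] for row in filter_by_external_ids(users_table, users_txt)])
# ['Ana', 'Chen']
```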

Query multiple data files

The following example queries multiple data files stored in external storage.

externaldata(Timestamp:datetime, ProductId:string, ProductDescription:string)
[
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/01/part-00000-7e967c99-cf2b-4dbb-8c53-ce388389470d.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/02/part-00000-ba356fa4-f85f-430a-8b5a-afd64f128ca4.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/03/part-00000-acb644dc-2fc6-467c-ab80-d1590b23fc31.csv.gz?...SAS..."
]
with(format="csv")
| summarize count() by ProductId

This example offers a quick way to query multiple data files without defining an external table.
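Conceptually, the rows parsed from all listed files form a single result set before summarize runs. The Python sketch below models that locally with in-memory gzip-compressed CSV data. The file contents and the helper count_by_product are hypothetical, and the sketch assumes the files have no header row (with headers, you'd set ignoreFirstRecord=true in the query).

```python
import csv
import gzip
import io
from collections import Counter


def count_by_product(gz_files):
    """Count rows per ProductId across several gzip-compressed CSV files.

    gz_files: iterable of bytes objects, each a .csv.gz file with no header row and
              columns Timestamp, ProductId, ProductDescription (the schema in the query).
    """
    counts = Counter()
    for blob in gz_files:
        text = gzip.decompress(blob).decode("utf-8")
        for _timestamp, product_id, _description in csv.reader(io.StringIO(text)):
            counts[product_id] += 1  # like "summarize count() by ProductId"
    return dict(counts)


# Hypothetical file contents standing in for two of the archived blobs.
day1 = gzip.compress(b"2019-01-01T00:00:00Z,P1,Widget\n2019-01-01T01:00:00Z,P2,Gadget\n")
day2 = gzip.compress(b"2019-01-02T00:00:00Z,P1,Widget\n")
print(count_by_product([day1, day2]))  # {'P1': 2, 'P2': 1}
```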

Query hierarchical data formats

To query a hierarchical data format, such as JSON, Parquet, Avro, or ORC, you must specify ingestionMapping in the operator properties. In this example, a JSON file stored in Azure Blob Storage has the following contents:

{
  "timestamp": "2019-01-01 10:00:00.238521",
  "data": {
    "tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",
    "method": "RefreshTableMetadata"
  }
}
{
  "timestamp": "2019-01-01 10:00:01.845423",
  "data": {
    "tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",
    "method": "GetFileList"
  }
}
...

To query this file using the externaldata operator, a data mapping must be specified. The mapping dictates how to map JSON fields to the operator result set columns:

externaldata(Timestamp: datetime, TenantId: guid, MethodName: string)
[ 
   h@'https://mycompanystorage.blob.core.windows.net/events/2020/09/01/part-0000046c049c1-86e2-4e74-8583-506bda10cca8.json?...SAS...'
]
with(format='multijson', ingestionMapping='[{"Column":"Timestamp","Properties":{"Path":"$.timestamp"}},{"Column":"TenantId","Properties":{"Path":"$.data.tenant"}},{"Column":"MethodName","Properties":{"Path":"$.data.method"}}]')

The MultiJSON format is used here because individual JSON records span multiple lines.
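The difference between line-based JSON and MultiJSON is visible if you parse the sample file outside Kusto. In the Python sketch below, a local illustration with a hypothetical helper parse_multijson, each object is read whole no matter how many lines it spans, whereas treating every line as a separate record would fail on a line such as "{".

```python
import json


def parse_multijson(text):
    """Split text holding concatenated JSON objects (each possibly multi-line) into records."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        # Skip whitespace, including newlines, between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return records
        # raw_decode parses one complete value starting at pos and returns where it ended.
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)


sample = """{
  "timestamp": "2019-01-01 10:00:00.238521",
  "data": {
    "tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",
    "method": "RefreshTableMetadata"
  }
}
{
  "timestamp": "2019-01-01 10:00:01.845423",
  "data": {
    "tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",
    "method": "GetFileList"
  }
}"""

records = parse_multijson(sample)
print(len(records), records[1]["data"]["method"])  # 2 GetFileList
```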

For more info on mapping syntax, see data mappings.
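Because the ingestionMapping value is a JSON array serialized into a string, it can be convenient to generate it rather than write it by hand. The Python sketch below, with hypothetical helpers build_mapping and apply_mapping, reproduces the mapping string from the JSON example above and applies its paths to one parsed record. apply_mapping resolves only plain dotted paths such as $.data.tenant and doesn't convert values to datetime or guid, so it's not a full model of Kusto's mapping behavior.

```python
import json

# (column, JSON path) pairs taken from the JSON example query.
COLUMN_PATHS = [
    ("Timestamp", "$.timestamp"),
    ("TenantId", "$.data.tenant"),
    ("MethodName", "$.data.method"),
]


def build_mapping(column_paths):
    """Serialize (column, path) pairs into a compact ingestionMapping string."""
    return json.dumps(
        [{"Column": column, "Properties": {"Path": path}} for column, path in column_paths],
        separators=(",", ":"),
    )


def apply_mapping(record, column_paths):
    """Resolve plain dotted paths such as $.data.tenant against one parsed record."""
    row = {}
    for column, path in column_paths:
        value = record
        for key in path[2:].split("."):  # drop the leading "$." before splitting
            value = value[key]
        row[column] = value
    return row


print(build_mapping(COLUMN_PATHS))
record = json.loads('{"timestamp": "2019-01-01 10:00:01.845423", '
                    '"data": {"tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324", "method": "GetFileList"}}')
print(apply_mapping(record, COLUMN_PATHS)["MethodName"])  # GetFileList
```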