External tables
1 - Azure SQL external tables
1.1 - Create and alter Azure SQL external tables
Creates or alters an Azure SQL external table in the database in which the command is executed.
Supported Azure SQL external table types
- SQL Server
- MySQL
- PostgreSQL
- Cosmos DB
Permissions
To .create requires at least Database User permissions, and to .alter requires at least Table Admin permissions.

To .create, .alter, or .create-or-alter an external table using managed identity authentication requires Database Admin permissions. This method is supported for SQL Server and Cosmos DB external tables.
Syntax
```kusto
(.create | .alter | .create-or-alter) external table TableName (Schema) kind = sql [ table = SqlTableName ] (SqlConnectionString) [with ( [ sqlDialect = SqlDialect ] , [ Property , … ] )]
```
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | The name of the external table. The name must follow the rules for entity names, and an external table can't have the same name as a regular table in the same database. |
Schema | string | ✔️ | The external data schema is a comma-separated list of one or more column names and data types, where each item follows the format: ColumnName : ColumnType. |
SqlTableName | string | | The name of the SQL table, not including the database name. For example, "MySqlTable" and not "db1.MySqlTable". If the name of the table contains a period ("."), use ['Name.of.the.table'] notation. This specification is required for all types of tables except for Cosmos DB, where the collection name is part of the connection string. |
SqlConnectionString | string | ✔️ | The connection string to the SQL server. |
SqlDialect | string | | Indicates the type of Azure SQL external table. SQL Server is the default. For MySQL, specify MySql. For PostgreSQL, specify PostgreSQL. For Cosmos DB, specify CosmosDbSql. |
Property | string | | A key-value property pair in the format PropertyName = PropertyValue. See optional properties. |
Optional properties
Property | Type | Description |
---|---|---|
folder | string | The table’s folder. |
docString | string | A string documenting the table. |
firetriggers | true/false | If true, instructs the target system to fire INSERT triggers defined on the SQL table. The default is false. (For more information, see BULK INSERT and System.Data.SqlClient.SqlBulkCopy.) |
createifnotexists | true/false | If true, the target SQL table is created if it doesn't already exist; the primarykey property must be provided in this case to indicate the result column that is the primary key. The default is false. |
primarykey | string | If createifnotexists is true, the resulting column is used as the SQL table's primary key if the table is created by this command. |
Authentication and authorization
To interact with an external Azure SQL table, you must specify authentication means as part of the SqlConnectionString. The SqlConnectionString defines the resource to access and its authentication information.
For more information, see Azure SQL external table authentication methods.
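The examples below use Active Directory Integrated (impersonation) authentication or credentials embedded directly in the connection string. As a sketch, a SQL Server external table that authenticates with a system-assigned managed identity might use a connection string like the following; the server and database names are placeholders, and the exact attribute names should be confirmed against the linked authentication methods article:

```kusto
// Sketch: managed identity authentication (requires Database Admin permissions;
// supported for SQL Server and Cosmos DB external tables).
.create external table MySqlExternalTable (x:long, s:string)
kind=sql
table=MySqlTable
(
   h@'Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Managed Identity;Initial Catalog=mydatabase;'
)
```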
Examples
The following examples show how to create each type of Azure SQL external table.
SQL Server
```kusto
.create external table MySqlExternalTable (x:long, s:string)
kind=sql
table=MySqlTable
(
   h@'Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=mydatabase;'
)
with
(
   docstring = "Docs",
   folder = "ExternalTables",
   createifnotexists = true,
   primarykey = x,
   firetriggers = true
)
```
Output
TableName | TableType | Folder | DocString | Properties |
---|---|---|---|---|
MySqlExternalTable | Sql | ExternalTables | Docs | { "TargetEntityKind": "sqltable", "TargetEntityName": "MySqlTable", "TargetEntityConnectionString": "Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=mydatabase;", "FireTriggers": true, "CreateIfNotExists": true, "PrimaryKey": "x" } |
MySQL
```kusto
.create external table MySqlExternalTable (x:long, s:string)
kind=sql
table=MySqlTable
(
   h@'Server=myserver.mysql.database.windows.net;Port = 3306;UID = USERNAME;Pwd = PASSWORD;Database = mydatabase;'
)
with
(
   sqlDialect = "MySql",
   docstring = "Docs",
   folder = "ExternalTables"
)
```
PostgreSQL
```kusto
.create external table PostgreSqlExternalTable (x:long, s:string)
kind=sql
table=PostgreSqlTable
(
   h@'Host = hostname.postgres.database.azure.com; Port = 5432; Database= db; User Id=user; Password=pass; Timeout = 30;'
)
with
(
   sqlDialect = "PostgreSQL",
   docstring = "Docs",
   folder = "ExternalTables"
)
```
Cosmos DB
```kusto
.create external table CosmosDBSQLExternalTable (x:long, s:string)
kind=sql
(
   h@'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=MyDatabase;Collection=MyCollection;AccountKey=' h'R8PM...;'
)
with
(
   sqlDialect = "CosmosDbSQL",
   docstring = "Docs",
   folder = "ExternalTables"
)
```
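Once created, any of these external tables can be queried with the external_table() function, as described in the next section. For example, a minimal sanity check against the Cosmos DB table defined above might look like the following:

```kusto
external_table('CosmosDBSQLExternalTable')
| take 10
```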
Related content
1.2 - Query SQL external tables
You can query a SQL external table just as you would query an Azure Data Explorer table or a table in a KQL database.
How it works
Azure SQL external table queries are translated from Kusto Query Language (KQL) to SQL. The operators after the external_table function call, such as where, project, count, and so on, are pushed down and translated into a single SQL query to be executed against the target SQL table.
Example
For example, consider an external table named MySqlExternalTable with two columns, x and s, that maps to a SQL table named MySqlTable. In this case, the following KQL query is translated into the following SQL query.
KQL query
```kusto
external_table('MySqlExternalTable')
| where x > 5
| count
```
SQL query
```sql
SELECT COUNT(*) FROM (SELECT x, s FROM MySqlTable WHERE x > 5) AS Subquery1
```
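As another illustrative sketch, a projection with a row limit is also pushed down. Assuming the same external table, the query below would be translated into a single SQL statement along the lines of SELECT TOP 10 x FROM MySqlTable (the exact SQL text emitted may differ):

```kusto
external_table('MySqlExternalTable')
| project x
| take 10
```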
1.3 - Use row-level security with Azure SQL external tables
Apply row-level security on Azure SQL external tables
This document describes how to apply a row-level security (RLS) solution with SQL external tables. Row-level security implements data isolation at the user level, restricting access to data based on the current user's credentials. However, Kusto external tables don't support RLS policy definitions, so data isolation on external SQL tables requires a different approach. The following solution uses row-level security in SQL Server together with Microsoft Entra ID impersonation in the SQL Server connection string. This combination provides the same behavior as applying user access control with RLS on standard Kusto tables: users querying the SQL external table can only see the records addressed to them, based on the row-level security policy defined in the source database.
Prerequisites
- ALTER ANY SECURITY POLICY permission on the SQL Server
- Table admin level permissions on the Kusto-side SQL external table
Sample table
The example source is a SQL Server table called SourceTable, with the following schema. The systemuser column contains the email address of the user to whom the data record belongs; this is the same user who should have access to this data.
```sql
CREATE TABLE SourceTable (
    id INT,
    region VARCHAR(5),
    central VARCHAR(5),
    systemuser VARCHAR(200)
)
```
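For illustration, assume the table holds rows such as the following hypothetical records, where each row belongs to the user named in the systemuser column:

```sql
-- Hypothetical sample data for demonstrating row-level isolation.
INSERT INTO SourceTable (id, region, central, systemuser) VALUES
    (1, 'north', 'yes', 'user1@domain.com'),
    (2, 'south', 'no',  'user2@domain.com');
```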
Configure row-level security in the source SQL Server - SQL Server side
For general information on SQL Server row-level security, see row-level security in SQL Server.
Create a SQL function with the logic for the data access policy. In this example, the row-level security is based on the current user's email matching the systemuser column. This logic could be modified to meet any other business requirement.

```sql
CREATE SCHEMA Security;
GO

CREATE FUNCTION Security.mySecurityPredicate(@CheckColumn AS nvarchar(100))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS mySecurityPredicate_result
    WHERE @CheckColumn = ORIGINAL_LOGIN() OR USER_NAME() = 'Manager';
GO
```
Create the security policy on the table SourceTable, passing the column name as the parameter:

```sql
CREATE SECURITY POLICY SourceTableFilter
ADD FILTER PREDICATE Security.mySecurityPredicate(systemuser)
ON dbo.SourceTable
WITH (STATE = ON)
GO
```
[!NOTE] At this point, the data is already restricted by the mySecurityPredicate function logic.
Allow user access to SQL Server - SQL Server side
The following steps depend on the SQL Server version that you’re using.
Create a login and user for each Microsoft Entra ID credential that is going to access the data stored in SQL Server:

```sql
CREATE LOGIN [user@domain.com] FROM EXTERNAL PROVIDER -- run in the master database
CREATE USER [user@domain.com] FROM EXTERNAL PROVIDER  -- run in the user database
```
Grant SELECT on the security function to the Microsoft Entra ID user:

```sql
GRANT SELECT ON Security.mySecurityPredicate to [user@domain.com]
```
Grant SELECT on the SourceTable to the Microsoft Entra ID user:

```sql
GRANT SELECT ON dbo.SourceTable to [user@domain.com]
```
Define the SQL external table connection string - Kusto side
For more information on the connection string, see SQL External Table Connection Strings.
Create a SQL external table using a connection string with the Active Directory Integrated authentication type. For more information, see Microsoft Entra integrated (impersonation).

```kusto
.create external table SQLSourceTable (id:long, region:string, central:string, systemuser:string)
kind=sql
table=SourceTable
(
   h@'Server=tcp:[sql server endpoint],1433;Authentication=Active Directory Integrated;Initial Catalog=[database name];'
)
with
(
   docstring = "Docs",
   folder = "ExternalTables",
   createifnotexists = false,
   primarykey = 'id'
)
```
Connection string:

```
Server=tcp:[sql server endpoint],1433;Authentication=Active Directory Integrated;Initial Catalog=[database name];
```
Validate the data isolation based on the Microsoft Entra ID, as it would work with row-level security in Kusto. In this case, the data is filtered based on the SourceTable's systemuser column, matching the Microsoft Entra ID user (email address) from the Kusto impersonation:

```kusto
external_table('SQLSourceTable')
```
[!NOTE] The policy can be disabled and enabled again, on the SQL Server side, for testing purposes. To disable and enable the policy, use the following SQL commands:

```sql
ALTER SECURITY POLICY SourceTableFilter
WITH (STATE = OFF);

ALTER SECURITY POLICY SourceTableFilter
WITH (STATE = ON);
```
With the Security Policy enabled on the SQL Server side, Kusto users only see the records matching their Microsoft Entra IDs, as the result of the query against the SQL External table. With the Security Policy disabled, all users are able to access the full table content as the result of the query against the SQL External table.
Related content
2 - Azure Storage external tables
2.1 - Create and alter Azure Storage delta external tables
The commands in this article can be used to create or alter a delta external table in the database from which the command is executed. A delta external table references Delta Lake table data located in Azure Blob Storage, Azure Data Lake Store Gen1, or Azure Data Lake Store Gen2.
To accelerate queries over external delta tables, see Query acceleration policy.
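As a sketch of what enabling acceleration might look like, the policy is applied to the external table after creation. The table name and hot-window value below are assumed examples; see the query acceleration policy article for the authoritative syntax and policy fields:

```kusto
// Assumed example: accelerate queries over the most recent 7 days of the delta table.
.alter external table MyDeltaExternalTable policy query_acceleration
  '{"IsEnabled": true, "Hot": "7.00:00:00"}'
```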
Permissions
To .create requires at least Database User permissions, and to .alter requires at least Table Admin permissions.

To .create-or-alter an external table using managed identity authentication requires AllDatabasesAdmin permissions.
Syntax
```kusto
(.create | .alter | .create-or-alter) external table TableName [(Schema)] kind = delta (StorageConnectionString) [with (Property [, …])]
```
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | An external table name that adheres to the entity names rules. An external table can't have the same name as a regular table in the same database. |
Schema | string | | The optional external data schema is a comma-separated list of one or more column names and data types, where each item follows the format: ColumnName : ColumnType. If not specified, it's automatically inferred from the delta log based on the latest delta table version. |
StorageConnectionString | string | ✔️ | The delta table root folder path, including credentials. The path can point to an Azure Blob Storage blob container, an Azure Data Lake Gen 2 file system, or an Azure Data Lake Gen 1 container. The external table storage type is determined by the provided connection string. See storage connection strings. |
Property | string | | A key-value property pair in the format PropertyName = PropertyValue. See optional properties. |
Authentication and authorization
The authentication method to access an external table is based on the connection string provided during its creation, and the permissions required to access the table vary depending on the authentication method.
The supported authentication methods are the same as those supported by Azure Storage external tables.
Optional properties
Property | Type | Description |
---|---|---|
folder | string | Table’s folder |
docString | string | String documenting the table |
compressed | bool | Only relevant for the export scenario. If set to true, the data is exported in the format specified by the compressionType property. For the read path, compression is automatically detected. |
compressionType | string | Only relevant for the export scenario. The compression type of exported files. For non-Parquet files, only gzip is allowed. For Parquet files, possible values include gzip , snappy , lz4_raw , brotli , and zstd . Default is gzip . For the read path, compression type is automatically detected. |
namePrefix | string | If set, specifies the prefix of the files. On write operations, all files will be written with this prefix. On read operations, only files with this prefix are read. |
fileExtension | string | If set, specifies the extension of the files. On write, file names will end with this suffix. On read, only files with this file extension will be read. |
encoding | string | Specifies how the text is encoded: UTF8NoBOM (default) or UTF8BOM . |
dryRun | bool | If set, the external table definition isn’t persisted. This option is useful for validating the external table definition, especially in conjunction with the filesPreview or sampleUris parameter. |
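For example, dryRun can be combined with a create command to validate a delta external table definition, including schema inference from the delta log, without persisting anything; the storage path below is a placeholder:

```kusto
.create external table DeltaTable
kind=delta
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
with (dryRun = true)
```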
Examples
Create or alter a delta external table with an inferred schema
In the following external table, the schema is automatically inferred from the latest delta table version.
```kusto
.create-or-alter external table ExternalTable
kind=delta
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
```
Create a delta external table with a custom schema
In the following external table, a custom schema is specified and overrides the schema of the delta table. If, at some later time, you need to replace the custom schema with the schema based on the latest delta table version, run the .alter or .create-or-alter command without specifying a schema, as in the previous example.
```kusto
.create external table ExternalTable (Timestamp:datetime, x:long, s:string)
kind=delta
(
   h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
)
```
Limitations
- Time travel is not supported. Only the latest delta table version is used.
Related content
2.2 - Create and alter Azure Storage external tables
The commands in this article can be used to create or alter an Azure Storage external table in the database from which the command is executed. An Azure Storage external table references data located in Azure Blob Storage, Azure Data Lake Store Gen1, or Azure Data Lake Store Gen2.
Permissions
To .create requires at least Database User permissions, and to .alter requires at least Table Admin permissions.

To .create-or-alter an external table using managed identity authentication requires AllDatabasesAdmin permissions.
Syntax
```kusto
(.create | .alter | .create-or-alter) external table TableName (Schema) kind = storage [partition by (Partitions) [pathformat = (PathFormat)]] dataformat = DataFormat (StorageConnectionString [, …]) [with (Property [, …])]
```
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | An external table name that adheres to the entity names rules. An external table can’t have the same name as a regular table in the same database. |
Schema | string | ✔️ | The external data schema is a comma-separated list of one or more column names and data types, where each item follows the format: ColumnName : ColumnType. If the schema is unknown, use infer_storage_schema to infer the schema based on external file contents. |
Partitions | string | | A comma-separated list of columns by which the external table is partitioned. A partition column can exist in the data file itself, or as part of the file path. See partitions formatting to learn how this value should look. |
PathFormat | string | | An external data folder URI path format to use with partitions. See path format. |
DataFormat | string | ✔️ | The data format, which can be any of the ingestion formats. We recommend using the Parquet format for external tables to improve query and export performance, unless you use JSON paths mapping. When using an external table for export scenario, you’re limited to the following formats: CSV , TSV , JSON and Parquet . |
StorageConnectionString | string | ✔️ | One or more comma-separated paths to Azure Blob Storage blob containers, Azure Data Lake Gen 2 file systems or Azure Data Lake Gen 1 containers, including credentials. The external table storage type is determined by the provided connection strings. See storage connection strings. |
Property | string | A key-value property pair in the format PropertyName = PropertyValue. See optional properties. |
Authentication and authorization
The authentication method to access an external table is based on the connection string provided during its creation, and the permissions required to access the table vary depending on the authentication method.
The following table lists the supported authentication methods for Azure Storage external tables and the permissions needed to read or write to the table.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Read permissions: Storage Blob Data Reader. Write permissions: Storage Blob Data Contributor. | Read permissions: Reader. Write permissions: Contributor. |
Managed identity | Read permissions: Storage Blob Data Reader. Write permissions: Storage Blob Data Contributor. | Read permissions: Reader. Write permissions: Contributor. |
Shared Access (SAS) token | Read permissions: List + Read. Write permissions: Write. | This authentication method isn't supported in Gen1. |
Microsoft Entra access token | No additional permissions required. | No additional permissions required. |
Storage account access key | No additional permissions required. | This authentication method isn’t supported in Gen1. |
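For example, impersonation authentication is selected by appending ;impersonate to the storage connection string in place of a credential; the account and container below are placeholders:

```kusto
// Sketch: with impersonation, the querying user's own identity is used to access
// storage, so that user needs the Storage Blob Data Reader role to read.
.create external table ImpersonationTable (x:long, s:string)
kind=storage
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;impersonate'
)
```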
Path format
The PathFormat parameter allows you to specify the format for the external data folder URI path in addition to partitions. It consists of a sequence of partition elements and text separators. A partition element refers to a partition that is declared in the partition by clause, and the text separator is any text enclosed in quotes. Consecutive partition elements must be set apart using the text separator.

[ StringSeparator ] Partition [ StringSeparator ] [Partition [ StringSeparator ] …]

To construct the original file path prefix, partition elements are rendered as strings and separated with corresponding text separators. You can use the datetime_pattern macro (datetime_pattern(DateTimeFormat, PartitionName)) to specify the format used for rendering a datetime partition value. The macro adheres to the .NET format specification and allows format specifiers to be enclosed in curly brackets. For example, the following two formats are equivalent:
- 'year='yyyy'/month='MM
- year={yyyy}/month={MM}
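For instance, the month-partitioned example later in this article could equivalently be written with the curly-bracket form; the storage path is a placeholder:

```kusto
.create external table ExternalTable (Timestamp:datetime, x:long, s:string)
kind=storage
partition by (Month:datetime = startofmonth(Timestamp))
pathformat=(datetime_pattern("year={yyyy}/month={MM}", Month))
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
```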
By default, datetime values are rendered using the following formats:
Partition function | Default format |
---|---|
startofyear | yyyy |
startofmonth | yyyy/MM |
startofweek | yyyy/MM/dd |
startofday | yyyy/MM/dd |
bin(Column, 1d) | yyyy/MM/dd |
bin(Column, 1h) | yyyy/MM/dd/HH |
bin(Column, 1m) | yyyy/MM/dd/HH/mm |
Virtual columns
When data is exported from Spark, partition columns (those provided to the dataframe writer's partitionBy method) aren't written to data files. This process avoids data duplication because the data is already present in the folder names (for example, column1=<value>/column2=<value>/), and Spark can recognize it upon read.

External tables support reading this data in the form of virtual columns. Virtual columns can be of either type string or datetime, and are specified using the following syntax:
```kusto
.create external table ExternalTable (EventName:string, Revenue:double)
kind=storage
partition by (CustomerName:string, Date:datetime)
pathformat=("customer=" CustomerName "/date=" datetime_pattern("yyyyMMdd", Date))
dataformat=parquet
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
```
To filter by virtual columns in a query, specify the partition names in the query predicate:

```kusto
external_table("ExternalTable")
| where Date between (datetime(2020-01-01) .. datetime(2020-02-01))
| where CustomerName in ("John.Doe", "Ivan.Ivanov")
```
Optional properties
Property | Type | Description |
---|---|---|
folder | string | Table’s folder |
docString | string | String documenting the table |
compressed | bool | Only relevant for the export scenario. If set to true, the data is exported in the format specified by the compressionType property. For the read path, compression is automatically detected. |
compressionType | string | Only relevant for the export scenario. The compression type of exported files. For non-Parquet files, only gzip is allowed. For Parquet files, possible values include gzip , snappy , lz4_raw , brotli , and zstd . Default is gzip . For the read path, compression type is automatically detected. |
includeHeaders | string | For delimited text formats (CSV, TSV, …), specifies whether files contain a header. Possible values are: All (all files contain a header), FirstFile (first file in a folder contains a header), None (no files contain a header). |
namePrefix | string | If set, specifies the prefix of the files. On write operations, all files will be written with this prefix. On read operations, only files with this prefix are read. |
fileExtension | string | If set, specifies the extension of the files. On write, file names will end with this suffix. On read, only files with this file extension will be read. |
encoding | string | Specifies how the text is encoded: UTF8NoBOM (default) or UTF8BOM . |
sampleUris | bool | If set, the command result provides several examples of simulated external data files URI as they’re expected by the external table definition. This option helps validate whether the Partitions and PathFormat parameters are defined properly. |
filesPreview | bool | If set, one of the command result tables contains a preview of the .show external table artifacts command. Like sampleUris, this option helps validate the Partitions and PathFormat parameters of the external table definition. |
validateNotEmpty | bool | If set, the connection strings are validated for having content in them. The command will fail if the specified URI location doesn’t exist, or if there are insufficient permissions to access it. |
dryRun | bool | If set, the external table definition isn’t persisted. This option is useful for validating the external table definition, especially in conjunction with the filesPreview or sampleUris parameter. |
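For example, sampleUris and dryRun can be combined to preview the file URIs an external table definition implies without persisting it; the storage path below is a placeholder:

```kusto
// Returns simulated file URIs for checking Partitions and PathFormat,
// then discards the definition.
.create external table ValidationTable (Timestamp:datetime, x:long)
kind=storage
partition by (Date:datetime = bin(Timestamp, 1d))
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
with (sampleUris = true, dryRun = true)
```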
File filtering logic
When querying an external table, performance is improved by filtering out irrelevant external storage files. The process of iterating files and deciding whether a file should be processed is as follows:
1. Build a URI pattern that represents a place where files are found. Initially, the URI pattern equals a connection string provided as part of the external table definition. If there are any partitions defined, they're rendered using PathFormat, then appended to the URI pattern.

2. For all files found under the URI pattern(s) created, check that:

   - Partition values match predicates used in a query.
   - Blob name starts with NamePrefix, if such a property is defined.
   - Blob name ends with FileExtension, if such a property is defined.

3. Once all the conditions are met, the file is fetched and processed.
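To make this concrete, consider the "Partitioned by hash and date" example below. With its partitions, path format, and .txt file extension, a query with a date predicate such as the following only iterates URIs shaped like .../container1/customer_id=<0..9>/dt=20190201/*.txt:

```kusto
external_table("ExternalTable")
| where Timestamp >= datetime(2019-02-01) and Timestamp < datetime(2019-02-02)
```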
Examples
Non-partitioned external table
In the following non-partitioned external table, the files are expected to be placed directly under the container(s) defined:
```kusto
.create external table ExternalTable (x:long, s:string)
kind=storage
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
```
Partitioned by date
In the following external table partitioned by date, the files are expected to be placed under directories of the default datetime format yyyy/MM/dd:

```kusto
.create external table ExternalTable (Timestamp:datetime, x:long, s:string)
kind=storage
partition by (Date:datetime = bin(Timestamp, 1d))
dataformat=csv
(
   h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
)
```
Partitioned by month
In the following external table partitioned by month, the directory format is year=yyyy/month=MM:

```kusto
.create external table ExternalTable (Timestamp:datetime, x:long, s:string)
kind=storage
partition by (Month:datetime = startofmonth(Timestamp))
pathformat=(datetime_pattern("'year='yyyy'/month='MM", Month))
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
```
Partitioned by name and date
In the following external table, the data is partitioned first by customer name and then by date, meaning that the expected directory structure is, for example, customer_name=Softworks/2019/02/01:

```kusto
.create external table ExternalTable (Timestamp:datetime, CustomerName:string)
kind=storage
partition by (CustomerNamePart:string = CustomerName, Date:datetime = startofday(Timestamp))
pathformat=("customer_name=" CustomerNamePart "/" Date)
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
```
Partitioned by hash and date
The following external table is partitioned first by customer name hash (modulo ten), then by date. The expected directory structure is, for example, customer_id=5/dt=20190201, and data file names end with the .txt extension:

```kusto
.create external table ExternalTable (Timestamp:datetime, CustomerName:string)
kind=storage
partition by (CustomerId:long = hash(CustomerName, 10), Date:datetime = startofday(Timestamp))
pathformat=("customer_id=" CustomerId "/dt=" datetime_pattern("yyyyMMdd", Date))
dataformat=csv
(
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
with (fileExtension = ".txt")
```
Filter by partition columns in a query
To filter by partition columns in a query, specify the original column name in the query predicate:

```kusto
external_table("ExternalTable")
| where Timestamp between (datetime(2020-01-01) .. datetime(2020-02-01))
| where CustomerName in ("John.Doe", "Ivan.Ivanov")
```
Sample Output
TableName | TableType | Folder | DocString | Properties | ConnectionStrings | Partitions | PathFormat |
---|---|---|---|---|---|---|---|
ExternalTable | Blob | ExternalTables | Docs | {"Format":"Csv","Compressed":false,"CompressionType":null,"FileExtension":null,"IncludeHeaders":"None","Encoding":null,"NamePrefix":null} | ["https://storageaccount.blob.core.windows.net/container1;*******"] | [{"Mod":10,"Name":"CustomerId","ColumnName":"CustomerName","Ordinal":0},{"Function":"StartOfDay","Name":"Date","ColumnName":"Timestamp","Ordinal":1}] | "customer_id=" CustomerId "/dt=" datetime_pattern("yyyyMMdd",Date) |