COPY INTO (Transact-SQL) - Azure Synapse Analytics and Microsoft Fabric (original) (raw)

Applies to: Azure Synapse Analytics

This article explains how to use the COPY statement in Azure Synapse Analytics for loading data from external storage accounts. The COPY statement provides the most flexibility for high-throughput data ingestion into Azure Synapse Analytics.

Note

For Warehouse in Microsoft Fabric, see COPY INTO.

Use COPY for the following capabilities:

Note

To load complex data types from Parquet files, turn on automatic table creation by using AUTO_CREATE_TABLE.

For comprehensive examples and quickstarts using the COPY statement, see:

Syntax

COPY INTO [ schema. ] table_name
[ (Column_list) ]
FROM '<external_location>' [ , ...n ]
WITH
 (
 [ FILE_TYPE = { 'CSV' | 'PARQUET' | 'ORC' } ]
 [ , FILE_FORMAT = EXTERNAL FILE FORMAT OBJECT ]
 [ , CREDENTIAL = (AZURE CREDENTIAL) ]
 [ , ERRORFILE = ' [ http(s)://storageaccount/container ] /errorfile_directory [ / ] ] '
 [ , ERRORFILE_CREDENTIAL = (AZURE CREDENTIAL) ]
 [ , MAXERRORS = max_errors ]
 [ , COMPRESSION = { 'Gzip' | 'DefaultCodec' | 'Snappy' } ]
 [ , FIELDQUOTE = 'string_delimiter' ]
 [ , FIELDTERMINATOR =  'field_terminator' ]
 [ , ROWTERMINATOR = 'row_terminator' ]
 [ , FIRSTROW = first_row ]
 [ , DATEFORMAT = 'date_format' ]
 [ , ENCODING = { 'UTF8' | 'UTF16' } ]
 [ , IDENTITY_INSERT = { 'ON' | 'OFF' } ]
 [ , AUTO_CREATE_TABLE = { 'ON' | 'OFF' } ]
)

Arguments

schema_name

Optional if the default schema for the user performing the operation is the schema of the specified table. If you don't specify schema, and the default schema of the user performing the COPY operation is different from the schema of the specified table, the COPY operation is canceled and an error message is returned.

table_name

The name of the table to COPY data into. The target table can be a temporary or permanent table and must already exist in the database. For automatic schema detection mode, don't provide a column list.

(column_list)

An optional list of one or more columns used to map source data fields to target table columns for loading data.

Don't specify a column_list when AUTO_CREATE_TABLE = 'ON'.

column_list must be enclosed in parentheses and delimited by commas. The column list is of the following format:

[(Column_name [default Default_value] [Field_number] [,...n])]

When you don't specify a column list, COPY maps columns based on the source and target order: Input field 1 goes to target column 1, field 2 goes to column 2, and so on.

External locations

The location where the files containing the data are staged. Currently, Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage are supported:

Note

The .blob endpoint is available for ADLS Gen2 as well and currently yields the best performance. Use the .blob endpoint when .dfs isn't required for your authentication method.

You can include wildcards in the path where:

Note

For best performance, avoid specifying wildcards that expand over a larger number of files. If possible, list multiple file locations instead of specifying wildcards.

You can specify multiple file locations only from the same storage account and container through a comma-separated list such as:

FILE_TYPE = { 'CSV' | 'PARQUET' | 'ORC' }

FILE_TYPE specifies the format of the external data.

Note

The 'Delimited Text' file type in PolyBase is replaced by the 'CSV' file format. You can configure the default comma delimiter through the FIELDTERMINATOR parameter.

FILE_FORMAT = external_file_format_name

FILE_FORMAT applies to Parquet and ORC files only. It specifies the name of the external file format object that stores the file type and compression method for the external data. To create an external file format, use CREATE EXTERNAL FILE FORMAT.

CREDENTIAL (IDENTITY = '', SECRET = '')

CREDENTIAL specifies the authentication mechanism to access the external storage account. Authentication methods are:

CSV Parquet ORC
Azure Blob Storage SAS/MSI/SERVICE PRINCIPAL/KEY/Entra SAS/KEY SAS/KEY
Azure Data Lake Gen2 SAS/MSI/SERVICE PRINCIPAL/KEY/Entra SAS (blob 1 )/MSI (dfs 2 )/SERVICE PRINCIPAL/KEY/Entra SAS (blob 1 )/MSI (dfs 2 )/SERVICE PRINCIPAL/KEY/Entra

1 The blob endpoint (.blob.core.windows.net) in your external location path is required for this authentication method.

2 The dfs endpoint (.dfs.core.windows.net) in your external location path is required for this authentication method.

Note

ERRORFILE = Directory Location

ERRORFILE only applies to CSV. It specifies the directory within the COPY statement where the rejected rows and the corresponding error file are written. You can specify the full path from the storage account or the path relative to the container. If the specified path doesn't exist, the warehouse creates one. A child directory is created with the name _rejectedrows. The _ character ensures that the directory is escaped for other data processing unless explicitly named in the location parameter.

Note

When you pass a relative path to ERRORFILE, make it relative to the container path you specify in external_location.

Within this directory, the warehouse creates a folder based on the time of load submission in the format YearMonthDay -HourMinuteSecond (for example, 20180330-173205). In this folder, the process writes two types of files: the reason (error) file and the data (row) file. Each file prepends the queryID, distributionID, and a file GUID. Because the data and the reason are in separate files, corresponding files have a matching prefix.

If ERRORFILE has the full path of the storage account defined, COPY uses ERRORFILE_CREDENTIAL to connect to that storage. Otherwise, it uses the value you specify for CREDENTIAL. When you use the same credential for the source data and ERRORFILE, restrictions that apply to ERRORFILE_CREDENTIAL also apply.

ERRORFILE_CREDENTIAL = (IDENTITY = '', SECRET = '')

ERRORFILE_CREDENTIAL only applies to CSV files. Supported data source and authentication methods are:

Note

Use the OAuth 2.0 token endpoint V1

Using a storage account key with ERRORFILE_CREDENTIAL isn't supported.

Note

If you use the same storage account for your error file and specify the ERRORFILE path relative to the root of the container, you don't need to specify the ERROR_CREDENTIAL.

MAXERRORS = max_errors

MAXERRORS specifies the maximum number of reject rows allowed in the load before the COPY operation fails. Each row that the COPY operation can't import is ignored and counted as one error. If you don't specify a value for maximum number of errors, the default is 0.

MAXERRORS can't be used with AUTO_CREATE_TABLE.

When FILE_TYPE is PARQUET, exceptions that are caused by data type conversion errors (for example, Parquet binary to SQL integer) still cause COPY INTO to fail, ignoring MAXERRORS.

COMPRESSION = { 'DefaultCodec ' | 'Snappy' | 'GZIP' | 'NONE'}

COMPRESSION is optional and specifies the data compression method for the external data.

The COPY command autodetects the compression type based on the file extension when you don't specify this parameter:

The COPY command requires that gzip files don't contain any trailing garbage to operate normally. The gzip format strictly requires that files be composed of valid members without any additional information before, between, or after them. Any deviation from this format, such as the presence of trailing non-gzip data, results in the failure of the COPY command. To ensure COPY runs successfully, make sure to verify there's no trailing garbage at the end of gzip files.

FIELDQUOTE = 'field_quote'

FIELDQUOTE applies to CSV and specifies a single character that's used as the quote character (string delimiter) in the CSV file. If you don't specify this value, the quote character (") is used as the quote character as defined in the RFC 4180 standard. Hexadecimal notation is also supported for FIELDQUOTE. Extended ASCII and multibyte characters aren't supported with UTF-8 for FIELDQUOTE.

Note

FIELDQUOTE characters are escaped in string columns where there's a presence of a double FIELDQUOTE (delimiter).

FIELDTERMINATOR = 'field_terminator'

FIELDTERMINATOR only applies to CSV. Specifies the field terminator that's used in the CSV file. You can specify the field terminator by using hexadecimal notation. The field terminator can be multicharacter. The default field terminator is a (,). Extended ASCII and multibyte characters aren't supported with UTF-8 for FIELDTERMINATOR.

ROWTERMINATOR = 'row_terminator'

ROWTERMINATOR only applies to CSV. Specifies the row terminator that's used in the CSV file. You can specify the row terminator by using hexadecimal notation. The row terminator can be multicharacter. By default, the row terminator is \r\n.

The COPY command prefixes the \r character when specifying \n (newline) resulting in \r\n. To specify only the \n character, use hexadecimal notation (0x0A). When specifying multicharacter row terminators in hexadecimal, don't specify 0x between each character.

Extended ASCII and multibyte characters aren't supported with UTF-8 for ROWTERMINATOR.

FIRSTROW = First_row_int

FIRSTROW applies to CSV and specifies the row number that's read first in all files for the COPY command. Values start from 1, which is the default value. If you set the value to 2, the first row in every file (header row) is skipped when the data is loaded. Rows are skipped based on the existence of row terminators.

DATEFORMAT = { 'mdy' | 'dmy' | 'ymd' | 'ydm' | 'myd' | 'dym' }

DATEFORMAT only applies to CSV and specifies the date format of the date mapping to SQL Server date formats. For an overview of all Transact-SQL date and time data types and functions, see Date and Time Data Types and Functions (Transact-SQL). DATEFORMAT within the COPY command takes precedence over DATEFORMAT configured at the session level.

ENCODING = 'UTF8' | 'UTF16'

ENCODING only applies to CSV. Default is UTF8. Specifies the data encoding standard for the files loaded by the COPY command.

IDENTITY_INSERT = 'ON' | 'OFF'

IDENTITY_INSERT specifies whether the identity value or values in the imported data file are to be used for the identity column. If IDENTITY_INSERT is OFF (default), the identity values for this column are verified, but not imported. Note the following behavior with the COPY command:

Azure Synapse Analytics automatically assigns unique values based on the seed and increment values specified during table creation.

AUTO_CREATE_TABLE = { 'ON' | 'OFF' }

AUTO_CREATE_TABLE specifies if the table can be automatically created by working alongside automatic schema discovery. AUTO_CREATE_TABLE is available only for Parquet files in Azure Synapse Analytics.

Note

The automatic table creation works alongside automatic schema discovery. The automatic table creation isn't enabled by default.

Permissions

The user running the COPY command must have the following permissions:

Requires INSERT and ADMINISTER BULK OPERATIONS permissions. In Azure Synapse Analytics, INSERT, and ADMINISTER DATABASE BULK OPERATIONS permissions are required.

Additionally, if the user executing the COPY command also intends to generate a new table and load data into it, they require CREATE TABLE and ALTER ON SCHEMA permissions.

For example, to allow mike@contoso.com to use COPY to create a new table in the HR schema, and insert the data from a Parquet file, use the following Transact-SQL sample:

GRANT ADMINISTER DATABASE BULK OPERATIONS to [mike@contoso.com];
GRANT INSERT to [mike@contoso.com];

GRANT CREATE TABLE to [mike@contoso.com];
GRANT ALTER on SCHEMA::HR to [mike@contoso.com];

The COPY statement accepts only UTF-8 and UTF-16 valid characters for row data and command parameters. The COPY statement might incorrectly interpret source files or parameters (such as ROWTERMINATOR or FIELDTERMINATOR) that use invalid characters and cause unexpected results such as data corruption or other failures. Make sure your source files and parameters are UTF-8 or UTF-16 compliant before you invoke the COPY statement.

The MAXDOP query hint isn't supported with COPY INTO.

To ensure reliable execution, don't change the source files and folders during the COPY INTO operation.

If the source data has greater precision than the destination column definition, the value is truncated, not rounded, for numeric, date, and time types.

Examples

A. Load from a public storage account

The following example shows the simplest form of the COPY command, which loads data from a public storage account. For this example, the COPY statement's defaults match the format of the line item CSV file.

COPY INTO dbo.[lineitem]
FROM 'https://unsecureaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.csv'
WITH (FIELDTERMINATOR = '|')

The default values of the COPY command are:

Important

COPY treats \n as \r\n internally. For more information, see the ROWTERMINATOR section.

The following example loads files that use the line feed as a row terminator, such as a UNIX output. This example also uses a SAS key to authenticate to Azure Blob Storage.

COPY INTO test_1
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>'),
    --CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='?sv=2018-03-28&ss=bfqt&srt=sco&sp=rl&st=2016-10-17T20%3A14%3A55Z&se=2021-10-18T20%3A19%3A00Z&sig=IEoOdmeYnE9%2FKiJDSHFSYsz4AkNa%2F%2BTx61FuQ%2FfKHefqoBE%3D'),
    FIELDQUOTE = '"',
    FIELDTERMINATOR=';',
    ROWTERMINATOR='0X0A',
    ENCODING = 'UTF8',
    DATEFORMAT = 'ymd',
    MAXERRORS = 10,
    ERRORFILE = '/errorsfolder',--path starting from the storage container
    IDENTITY_INSERT = 'ON'
)

C. Load with a column list with default values authenticating via Storage Account Key

This example loads files specifying a column list with default values.

--Note when specifying the column list, input field numbers start from 1
COPY INTO test_1 (Col_one default 'myStringDefault' 1, Col_two default 1 3)
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_Account_Key>'),
    FIELDQUOTE = '"',
    FIELDTERMINATOR=',',
    ROWTERMINATOR='0x0A',
    ENCODING = 'UTF8',
    FIRSTROW = 2
)

D. Load Parquet or ORC using existing file format object

This example uses a wildcard to load all Parquet files under a folder.

COPY INTO test_parquet
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
    FILE_FORMAT = myFileFormat,
    CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
)

E. Load specifying wild cards and multiple files

COPY INTO t1
FROM
'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt',
    'https://myaccount.blob.core.windows.net/myblobcontainer/folder1'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL=(IDENTITY= '<client_id>@<OAuth_2.0_Token_EndPoint>',SECRET='<key>'),
    FIELDTERMINATOR = '|'
)

F. Load using MSI credentials

COPY INTO dbo.myCOPYDemoTable
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Managed Identity'),
    FIELDQUOTE = '"',
    FIELDTERMINATOR=','
)

G. Load using automatic schema detection

COPY INTO [myCOPYDemoTable]
FROM 'https://myaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.parquet'
WITH (
    FILE_TYPE = 'Parquet',
    CREDENTIAL = ( IDENTITY = 'Shared Access Signature',  SECRET='<key>'),
    AUTO_CREATE_TABLE = 'ON'
)

FAQ

How does the performance of the COPY command compare to PolyBase?

The performance of the COPY command can be better depending on your workload.

What is the file splitting guidance for the COPY command loading compressed CSV files?

The following table outlines the number of files you should use. When you reach the recommended number of files, you get better performance with larger files. The number of files is determined by the number of compute nodes multiplied by 60. For example, at 6000 DWU, you have 12 compute nodes, so you have 12 * 60 = 720 partitions. For a simple file splitting experience, see How to maximize COPY load throughput with file splits.

DWU Number of files
100 60
200 60
300 60
400 60
500 60
1,000 120
1,500 180
2,000 240
2,500 300
3,000 360
5,000 600
6,000 720
7,500 900
10,000 1200
15,000 1800
30,000 3600

What is the file splitting guidance for the COPY command loading Parquet or ORC files?

You don't need to split Parquet and ORC files because the COPY command automatically splits files. For best performance, Parquet and ORC files in the Azure storage account should be 256 MB or larger.

Are there any limitations on the number or size of files?

There are no limitations on the number or size of files. However, for best performance, use files that are at least 4 MB. Also, limit the count of source files to a maximum of 5,000 files for better performance.

Are there any known issues with the COPY statement?

If you have an Azure Synapse workspace that you created before December 7, 2020, you might run into a similar error message when authenticating by using Managed Identity: com.microsoft.sqlserver.jdbc.SQLServerException: Managed Service Identity isn't enabled on this server. Please enable Managed Service Identity and try again.

To work around this problem, re-register the workspace's managed identity:

  1. Install Azure PowerShell. See Install PowerShell.
  2. Register your workspace's managed identity by using PowerShell:
Connect-AzAccount  
Select-AzSubscription -SubscriptionId <subscriptionId>  
Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity  

Applies to: Warehouse in Microsoft Fabric

This article explains how to use the COPY statement in Warehouse in Microsoft Fabric for loading from external storage accounts. The COPY statement provides the most flexibility for high-throughput data ingestion into your Warehouse in Microsoft Fabric, and is as strategy to Ingest data into your Warehouse in Microsoft Fabric.

In Fabric Data Warehouse, the COPY statement currently supports CSV, JSONL, and PARQUET file formats. For data sources, Azure Data Lake Storage Gen2 accounts and OneLake sources are supported.

For more information on using COPY INTO on your Warehouse in Microsoft Fabric, see Ingest data into your Warehouse in Microsoft Fabric using the COPY statement.

By default, COPY INTO authenticates as the executing Microsoft Entra ID user.

Use COPY for the following capabilities:

Syntax

COPY INTO [ warehouse_name. ] [ schema_name. ] table_name
[ (Column_list) ]
FROM '<external_location>' [ , ...n ]
WITH
 (
 [ FILE_TYPE = { 'CSV' | 'JSONL' | 'PARQUET' } ]
 [ , CREDENTIAL = (AZURE CREDENTIAL) ]
 [ , ERRORFILE = ' [ http(s)://storageaccount/container ] /errorfile_directory [ / ] ] '
 [ , ERRORFILE_CREDENTIAL = (AZURE CREDENTIAL) ]
 [ , MAXERRORS = max_errors ]
 [ , COMPRESSION = { 'Gzip' | 'Snappy' } ]
 [ , FIELDQUOTE = 'string_delimiter' ]
 [ , FIELDTERMINATOR =  'field_terminator' ]
 [ , ROWTERMINATOR = 'row_terminator' ]
 [ , FIRSTROW = first_row ]
 [ , DATEFORMAT = 'date_format' ]
 [ , ENCODING = { 'UTF8' | 'UTF16' } ]
 [ , PARSER_VERSION = { '1.0' | '2.0' } ]
 [ , MATCH_COLUMN_COUNT = { 'ON' | 'OFF' } ]
)

Arguments

warehouse_name

Optional if the current warehouse for the user performing the operation is the warehouse of the specified table. If you don't specify warehouse, and the specified schema and table don't exist on the current warehouse, COPY fails, and an error message is returned.

schema_name

Optional if the default schema for the user performing the operation is the schema of the specified table. If you don't specify schema, and the default schema of the user performing the COPY operation is different from the schema of the specified table, COPY is canceled, and an error message is returned.

table_name

The name of the table to COPY data into. The target table must already exist in the warehouse.

(column_list)

An optional list of columns used to map source data fields to target table columns during data load.

column_list must be enclosed in parentheses and delimited by commas. The syntax is:

[(Column_name [default Default_value] [Field_number | JSON_path] [,...n])]

When you don't specify column_list, COPY maps columns based on the source and target order: Input field 1 goes to target column 1, field 2 goes to column 2, and so on.

Note

When working with Parquet files on Warehouse in Microsoft Fabric, column names must match exactly in the source and destination. If the name of the column in target table is different than that of the column name in the parquet file, the target table column is filled with NULL.

When you don't specify a column list, COPY maps columns based on the source and target order: Input field 1 goes to target column 1, field 2 goes to column 2, and so on.

External location

Specifies where the files containing the data are staged. Currently Azure Data Lake Storage (ADLS) Gen2, Azure Blob Storage, and OneLake are supported:

Azure Data Lake Storage (ADLS) Gen2 offers better performance than Azure Blob Storage (legacy). Consider using an ADLS Gen2 account whenever possible.

Note

The .blob endpoint is available for ADLS Gen2 as well and currently yields the best performance. Use the blob endpoint when dfs isn't required for your authentication method.

Wildcards can be included in the path where

Note

For best performance, avoid specifying wildcards that expand over a larger number of files. If possible, list multiple file locations instead of specifying wildcards.

You can specify multiple file locations only from the same storage account and container through a comma-separated list such as:

External locations behind firewall

To access files on Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage locations that are behind a firewall, the following prerequisites apply:

FILE_TYPE = { 'CSV' | 'JSONL' | 'PARQUET' }

FILE_TYPE specifies the format of the external data.

CREDENTIAL (IDENTITY = '', SECRET = '')

CREDENTIAL specifies the authentication mechanism to access the external storage account.

In Fabric Data Warehouse:

The user's Microsoft Entra ID authentication is default. No credential needs to be specified.

ERRORFILE = Directory Location

ERRORFILE applies to CSV and JSONL. Specifies the directory where the rejected rows and the corresponding error file should be written. You can specify the full path from the storage account or the path relative to the container. If the specified path doesn't exist, the system creates one on your behalf. A child directory is created with the name _rejectedrows. The _ character ensures that the directory is escaped for other data processing unless explicitly named in the location parameter.

Note

When you pass a relative path to ERRORFILE, make it relative to the container path you specify in external_location.

Within this directory, the warehouse creates a folder based on the time of load submission in the format YearMonthDay -HourMinuteSecond (for example, 20180330-173205). In this folder, the warehouse creates a folder with the statement ID, and under that folder, two types of files are written: an error.Json file containing the reject reasons, and a row.csv file containing the rejected rows.

If ERRORFILE has the full path of the storage account defined, then the ERRORFILE_CREDENTIAL is used to connect to that storage. Otherwise, the value mentioned for CREDENTIAL is used. When the same credential that is used for the source data is used for ERRORFILE, restrictions that apply to ERRORFILE_CREDENTIAL also apply.

When using a firewall protected Azure Storage Account, the error file is created in the same container specified in the storage account path. When considering using the ERRORFILE option in this scenario, it is also required to specify the MAXERROR parameter. If ERRORFILE has the full path of the storage account defined, then the ERRORFILE_CREDENTIAL is used to connect to that storage. Otherwise, the value mentioned for CREDENTIAL is used.

ERRORFILE_CREDENTIAL = (IDENTITY = '', SECRET = '')

ERRORFILE_CREDENTIAL applies to CSV and JSONL files. On Warehouse in Microsoft Fabric, the only supported authentication mechanism is Shared Access Signature (SAS).

Note

If you use the same storage account for your error file and specify the ERRORFILE path relative to the root of the container, you don't need to specify the ERROR_CREDENTIAL.

MAXERRORS = max_errors

MAXERRORS applies to CSV and JSONL. Specifies the maximum number of reject rows allowed in the load before the COPY operation fails. Each row that the COPY operation can't import is ignored and counted as one error. If you don't specify a maximum number of errors, the default is 0.

In Fabric Data Warehouse, you can't use MAXERRORS when FILE_TYPE is PARQUET.

COMPRESSION = { 'Snappy' | 'GZIP' | 'NONE'}

COMPRESSION is optional and specifies the data compression method for the external data.

The COPY command autodetects the compression type based on the file extension when this parameter isn't specified:

Loading compressed files is currently only supported with parser version 1.0.

The COPY command requires that gzip files don't contain any trailing garbage to operate normally. The gzip format strictly requires that files be composed of valid members without any additional information before, between, or after them. Any deviation from this format, such as the presence of trailing non-gzip data, results in the failure of the COPY command. To ensure COPY runs successfully, make sure to verify there's no trailing garbage at the end of gzip files.

FIELDQUOTE = 'field_quote'

FIELDQUOTE only applies to CSV. Specifies a single character that is used as the quote character (string delimiter) in the CSV file. If you don't specify FIELDQUOTE, the quote character (") is used as the quote character as defined in the RFC 4180 standard. Hexadecimal notation is also supported for FIELDQUOTE. Extended ASCII and multibyte characters aren't supported with UTF-8 for FIELDQUOTE.

Note

FIELDQUOTE characters are escaped in string columns where there's a presence of a double FIELDQUOTE (delimiter).

FIELDTERMINATOR = 'field_terminator'

FIELDTERMINATOR only applies to CSV. Specifies the field terminator that is used in the CSV file. You can also specify the field terminator using hexadecimal notation. The field terminator can be multicharacter. The default field terminator is a (,). Extended ASCII and multibyte characters aren't supported with UTF-8 for FIELDTERMINATOR.

ROWTERMINATOR = 'row_terminator'

ROWTERMINATOR only applies to CSV. Specifies the row terminator that is used in the CSV file. You can specify the row terminator using hexadecimal notation. The row terminator can be multicharacter. The default terminators are \r\n, \n, and \r.

The COPY command prefixes the \r character when specifying \n (newline) resulting in \r\n. To specify only the \n character, use hexadecimal notation (0x0A). When specifying multicharacter row terminators in hexadecimal, don't specify 0x between each character.

Extended ASCII and multibyte characters aren't supported with UTF-8 for ROWTERMINATOR.

FIRSTROW = First_row_int

FIRSTROW only applies to CSV. Specifies the row number that is read first in all files for the COPY command. Values start from 1, which is the default value. If you set the value to 2, the first row in every file (header row) is skipped when the data is loaded. Rows are skipped based on the existence of row terminators.

DATEFORMAT = { 'mdy' | 'dmy' | 'ymd' | 'ydm' | 'myd' | 'dym' }

DATEFORMAT applies to CSV and JSONL. Specifies the date format of the date mapping to SQL Server date formats. For an overview of all Transact-SQL date and time data types and functions, see Date and Time Data Types and Functions (Transact-SQL). DATEFORMAT within the COPY command takes precedence over DATEFORMAT configured at the session level.

ENCODING = 'UTF8' | 'UTF16'

ENCODING applies to CSV and JSONL. Default is UTF8. Specifies the data encoding standard for the files loaded by the COPY command.

PARSER_VERSION = { '1.0' | '2.0' }

PARSER_VERSION only applies to CSV files. The default value is 2.0. PARSER_VERSION specifies the file parser used for ingestion when the source file type is CSV. The 2.0 parser offers improved performance for ingestion of CSV files.

Parser version 2.0 has the following limitations:

When you use parser version 1.0 with UTF-8 files, multibyte and multicharacter terminators aren't supported for FIELDTERMINATOR.

Parser version 1.0 is available for backward compatibility only. Use it only when you encounter these limitations.

Note

When you use COPY INTO with compressed CSV files or files with UTF-16 encoding, COPY INTO automatically switches to PARSER_VERSION 1.0, without user action required. For multicharacter terminators on FIELDTERMINATOR or ROWTERMINATOR, the COPY INTO statement fails. Use PARSER_VERSION = '1.0' if you need multicharacter separators.

MATCH_COLUMN_COUNT = { 'ON' | 'OFF' }

MATCH_COLUMN_COUNT only applies to CSV files. The default value is OFF. It specifies if the COPY command should check if the column count rows in source files match the column count of the destination table. The following behavior applies:

Note

MATCH_COLUMN_COUNT works independently from MAXERRORS. A column count mismatch causes COPY INTO to fail regardless of MAXERRORS.

Use COPY INTO with OneLake

Use COPY INTO to load data directly from files stored in the Fabric OneLake, under existing items. This method eliminates the need for external staging accounts, such as ADLS Gen2 or Blob Storage, and enables workspace-governed, SaaS-native ingestion by using Fabric permissions. This functionality supports:

Example:

COPY INTO t1
FROM 'https://onelake.dfs.fabric.microsoft.com/<workspaceId>/<lakehouseId>/Files/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
);

Permissions

Control plane permissions

To execute the COPY INTO command, you must be granted membership to a workspace role through Manage access in the Workspace, with at least the Viewer role. Alternatively, you can share warehouse access with a user via Item Permissions in the Fabric portal, with at least Read permissions. To align with the principle of least privilege, Read permission is sufficient.

Data plane permissions

After you grant control plane permissions through workspace roles or item permissions, if the user only has Read permissions at the data plane level, also grant the user INSERT and ADMINISTER DATABASE BULK OPERATIONS permissions by using T-SQL commands.

For example, the following T-SQL script grants these permissions to an individual user by using their Microsoft Entra ID.

GRANT ADMINISTER DATABASE BULK OPERATIONS to [mike@contoso.com];
GO

GRANT INSERT to [mike@contoso.com];
GO

When you use the error file option, the user must have the minimal permission of Blob Storage Contributor on the Storage Account container.

When you use OneLake as the source, the user must have Contributor or higher permissions on both the source workspace (where the Lakehouse is located) and the target workspace (where the Warehouse resides). Microsoft Entra ID and Fabric workspace roles govern all access.

The COPY statement accepts only UTF-8 and UTF-16 valid characters for row data and command parameters. If you use source files or parameters (such as ROWTERMINATOR or FIELDTERMINATOR) that contain invalid characters, the COPY statement might interpret them incorrectly and cause unexpected results, such as data corruption or other failures. Before you invoke the COPY statement, make sure your source files and parameters are UTF-8 or UTF-16 compliant.

The COPY INTO statement has restrictions on the size of individual varchar(max) and varbinary(max) columns, as well as on the total row size that you can ingest.

To ensure reliable execution, don't change the source files and folders during the COPY INTO operation.

If the source data has greater precision than the destination column definition, the value is truncated, not rounded, for numeric, date, and time types.

Limitations for OneLake as source

Examples

For more information on using COPY INTO on your Warehouse in Microsoft Fabric, see Ingest data into your Warehouse in Microsoft Fabric using the COPY statement.

A. Load from a public storage account

The following example shows the simplest form of the COPY command, which loads data from a public storage account. For this example, the COPY statement's defaults match the format of the line item CSV file.

COPY INTO dbo.[lineitem]
FROM 'https://unsecureaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.csv'

The default values of the COPY command are:

The following example loads files that use the line feed as a row terminator, such as a UNIX output. This example also uses a SAS key to authenticate to Azure Blob Storage.

COPY INTO test_1
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>'),
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0X0A',
    ENCODING = 'UTF8',
    MAXERRORS = 10,
    ERRORFILE = '/errorsfolder'--path starting from the storage container
)

C. Load with a column list with default values authenticating via Storage Account Key (SAK)

This example loads files specifying a column list with default values.

--Note when specifying the column list, input field numbers start from 1
COPY INTO test_1 (Col_one default 'myStringDefault' 1, Col_two default 1 3)
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_account_key>'),
    FIELDQUOTE = '"',
    FIELDTERMINATOR=',',
    ROWTERMINATOR='0x0A',
    ENCODING = 'UTF8',
    FIRSTROW = 2
)

D. Load Parquet

This example uses a wildcard to load all Parquet files under a folder by using the executing user's Entra ID.

COPY INTO test_parquet
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
)

E. Load JSONL

This example uses a wildcard to load all JSONL files under a folder by using the executing user's Entra ID.

COPY INTO test_jsonl
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.jsonl'
WITH (
    FILE_TYPE = 'JSONL'
)

F. Map column names to field paths in JSONL documents

The following example shows a JSON Lines (JSONL) file, where each line represents a single JSON object:

{"CountryKey": 0, "CountryName": "ALGERIA", "RegionID": 0, "Population": 34800000}
{"CountryKey": 1, "CountryName": "ARGENTINA", "RegionID": 1, "Population": 46044703}
{"CountryKey": 2, "CountryName": "BRAZIL", "RegionID": 1, "Population": 203080756}

In COPY INTO, you can map table columns to specific JSON fields by using JSON path expressions. This mapping lets you ingest only the required fields from the source data.

COPY INTO Countries (
  CountryID '$.CountryKey', 
  CountryName '$.CountryName', 
  RegionID '$.RegionKey', 
  Population '$.Population'
)
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/countries.jsonl'
WITH (   
    FILE_TYPE = 'JSONL' 
)

G. Load data by specifying wildcards and multiple files

COPY INTO t1
FROM
'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt',
    'https://myaccount.blob.core.windows.net/myblobcontainer/folder1'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
    FIELDTERMINATOR = '|'
)

H. Load data from OneLake

COPY INTO t1
FROM 'https://onelake.dfs.fabric.microsoft.com/<workspaceId>/<lakehouseId>/Files/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
);