Export table data to Cloud Storage

This page describes how to export or extract data from BigQuery tables to Cloud Storage.

After you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB of data, you must export your data to multiple files. When you export your data to multiple files, the size of the files will vary.

You can also export the results of a query by using the EXPORT DATA statement. You can use EXPORT DATA OPTIONS to specify the format of the exported data.

Finally, you can use a service such as Dataflow to read data from BigQuery instead of exporting it. For more information about using Dataflow to read from and write to BigQuery, see the BigQuery I/O documentation.

Export limitations

When you export data from BigQuery, note the following:

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To perform the tasks in this document, you need the following permissions.

Permissions to export data from a BigQuery table

To export data from a BigQuery table, you need the bigquery.tables.export IAM permission.

Each of the following predefined IAM roles includes the bigquery.tables.export permission:

Permissions to run an export job

To run an export job, you need the bigquery.jobs.create IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to run an export job:

Permissions to write the data to the Cloud Storage bucket

To write the data to an existing Cloud Storage bucket, you need the following IAM permissions:

Each of the following predefined IAM roles includes the permissions that you need in order to write the data to an existing Cloud Storage bucket:

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Export formats and compression types

BigQuery supports the following data formats and compression types for exported data.

Data format Supported compression types Details
CSV GZIP You can control the CSV delimiter in your exported data by using the --field_delimiter bq command-line tool flag or the configuration.extract.fieldDelimiter extract job property. Nested and repeated data is not supported.
JSON GZIP Nested and repeated data are supported.
Avro DEFLATE, SNAPPY GZIP is not supported for Avro exports. Nested and repeated data are supported. See Avro export details.
Parquet SNAPPY, GZIP, ZSTD Nested and repeated data are supported. See Parquet export details.

Export data

You can export table data by:

Export table data

To export data from a BigQuery table:

Console

  1. Open the BigQuery page in the Google Cloud console.
    Go to the BigQuery page
  2. In the Explorer panel, expand your project and dataset, then select the table.
  3. In the details panel, click Export and select Export to Cloud Storage.
  4. In the Export table to Google Cloud Storage dialog:
    • For Select Google Cloud Storage location, browse for the bucket, folder, or file where you want to export the data.
    • For Export format, choose the format for your exported data: CSV, JSON (Newline Delimited), Avro, or Parquet.
    • For Compression, select a compression format or select None for no compression.
    • Click Save to export the table.

To check on the progress of the job, expand the Job history pane and look for a job of type EXTRACT.

To export views to Cloud Storage, use the EXPORT DATA statement.

SQL

Use the EXPORT DATA statement. The following example exports selected fields from a table named mydataset.table1:

  1. In the Google Cloud console, go to the BigQuery page.
    Go to BigQuery
  2. In the query editor, enter the following statement:
    EXPORT DATA
    OPTIONS (
    uri = 'gs://bucket/folder/*.csv',
    format = 'CSV',
    overwrite = true,
    header = true,
    field_delimiter = ';')
    AS (
    SELECT field1, field2
    FROM mydataset.table1
    ORDER BY field1
    );
  3. Click Run.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq extract command with the --destination_format flag.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

bq extract --location=location \
--destination_format format \
--compression compression_type \
--field_delimiter delimiter \
--print_header=boolean \
project_id:dataset.table \
gs://bucket/filename.ext

Where:

Examples:

The following command exports mydataset.mytable into a gzip-compressed file named myfile.csv. myfile.csv is stored in a Cloud Storage bucket named example-bucket.

bq extract \
--compression GZIP \
'mydataset.mytable' \
gs://example-bucket/myfile.csv

The default destination format is CSV. To export into JSON or Avro, use the --destination_format flag and set it to either NEWLINE_DELIMITED_JSON or AVRO. For example:

bq extract \
--destination_format NEWLINE_DELIMITED_JSON \
'mydataset.mytable' \
gs://example-bucket/myfile.json

The following command exports mydataset.mytable into an Avro file that is compressed using Snappy. The file is named myfile.avro. myfile.avro is exported to a Cloud Storage bucket named example-bucket.

bq extract \
--destination_format AVRO \
--compression SNAPPY \
'mydataset.mytable' \
gs://example-bucket/myfile.avro

The following command exports a single partition of mydataset.my_partitioned_table into a CSV file in Cloud Storage:

bq extract \
--destination_format CSV \
'mydataset.my_partitioned_table$0' \
gs://example-bucket/single_partition.csv

API

To export data, create an extract job and populate the job configuration.

(Optional) Specify your location in the location property in the jobReference section of the job resource.

  1. Create an extract job that points to the BigQuery source data and the Cloud Storage destination. (A sketch of these steps appears after this list.)
  2. Specify the source table by using the sourceTable configuration object that contains the project ID, dataset ID, and table ID.
  3. The destination URI(s) property must be fully-qualified, in the format gs://bucket/filename.ext. Each URI can contain one '*' wildcard character and it must come after the bucket name.
  4. Specify the data format by setting the configuration.extract.destinationFormat property. For example, to export a JSON file, set this property to the value NEWLINE_DELIMITED_JSON.
  5. To check the job status, call jobs.get(job_id) with the ID of the job returned by the initial request.
    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong.
    • If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors. Nonfatal errors are listed in the returned job object's status.errors property.
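
A minimal sketch of these steps in Python, using the google-api-python-client library against the BigQuery REST API. The project, dataset, table, and bucket names are placeholders, and the polling interval is arbitrary.

import time
import uuid

from googleapiclient.discovery import build

# build() uses Application Default Credentials when no credentials are passed.
bigquery = build("bigquery", "v2")

project_id = "my-project"           # placeholder project
job_id = f"extract-{uuid.uuid4()}"  # client-supplied ID makes retries safe

body = {
    "jobReference": {"projectId": project_id, "jobId": job_id, "location": "US"},
    "configuration": {
        "extract": {
            "sourceTable": {
                "projectId": project_id,
                "datasetId": "mydataset",
                "tableId": "mytable",
            },
            # Fully qualified URI; the single '*' wildcard shards the output.
            "destinationUris": ["gs://example-bucket/myfile-*.json"],
            "destinationFormat": "NEWLINE_DELIMITED_JSON",
        }
    },
}

bigquery.jobs().insert(projectId=project_id, body=body).execute()

# Poll jobs.get until status.state is DONE, then check status.errorResult.
while True:
    job = bigquery.jobs().get(projectId=project_id, jobId=job_id, location="US").execute()
    if job["status"]["state"] == "DONE":
        if "errorResult" in job["status"]:
            raise RuntimeError(job["status"]["errorResult"])
        break
    time.sleep(5)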

API notes:

C#

Before trying this sample, follow the C# setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Go

Before trying this sample, follow the Go setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Java

Before trying this sample, follow the Java setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Node.js

Before trying this sample, follow the Node.js setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

PHP

Before trying this sample, follow the PHP setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Python

Before trying this sample, follow the Python setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.
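
As a minimal sketch (not the official sample), the following uses the google-cloud-bigquery client library to run an extract job; the project, dataset, table, and bucket names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder source table and destination; the wildcard allows exports over 1 GB.
table_ref = bigquery.DatasetReference("my-project", "mydataset").table("mytable")
destination_uri = "gs://example-bucket/myfile-*.csv"

job_config = bigquery.job.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,
)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    location="US",  # must match the source table's location
    job_config=job_config,
)
extract_job.result()  # waits for the extract job to complete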

Ruby

Before trying this sample, follow the Ruby setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Export table metadata

To export table metadata from Iceberg tables, use the following SQL statement:

EXPORT TABLE METADATA FROM [[PROJECT_NAME.]DATASET_NAME.]TABLE_NAME;

Replace the following:

The exported metadata is located in the STORAGE_URI/metadata folder, where STORAGE_URI is the table's storage location set in the options.
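
For example, a hedged sketch of running the statement through the google-cloud-bigquery client, with a hypothetical Iceberg table name:

from google.cloud import bigquery

client = bigquery.Client()
# Hypothetical Iceberg table; the exported files land under STORAGE_URI/metadata.
client.query(
    "EXPORT TABLE METADATA FROM mydataset.my_iceberg_table"
).result()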

Avro export details

BigQuery expresses Avro formatted data in the following ways:

Parameterized NUMERIC(P[, S]) and BIGNUMERIC(P[, S]) data types transfer their precision and scale type parameters to the Avro decimal logical type.

The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the bq command-line tool or the API and specify one of the supported compression types for Avro data: DEFLATE or SNAPPY.

Parquet export details

BigQuery converts GoogleSQL data types to the following Parquet data types:

BigQuery data type Parquet primitive type Parquet logical type
Integer INT64 NONE
Numeric FIXED_LEN_BYTE_ARRAY DECIMAL (precision = 38, scale = 9)
Numeric(P[, S]) FIXED_LEN_BYTE_ARRAY DECIMAL (precision = P, scale = S)
BigNumeric FIXED_LEN_BYTE_ARRAY DECIMAL (precision = 76, scale = 38)
BigNumeric(P[, S]) FIXED_LEN_BYTE_ARRAY DECIMAL (precision = P, scale = S)
Floating point FLOAT NONE
Boolean BOOLEAN NONE
String BYTE_ARRAY STRING (UTF8)
Bytes BYTE_ARRAY NONE
Date INT32 DATE
Datetime INT64 TIMESTAMP (isAdjustedToUTC = false, unit = MICROS)
Time INT64 TIME (isAdjustedToUTC = true, unit = MICROS)
Timestamp INT64 TIMESTAMP (isAdjustedToUTC = false, unit = MICROS)
Geography BYTE_ARRAY GEOGRAPHY (edges = spherical)

The Parquet schema represents nested data as a group and repeated records as repeated groups. For more information about using nested and repeated data in BigQuery, see Specifying nested and repeated columns.

You can use the following workarounds for DATETIME types:

The GEOGRAPHY logical type is represented with GeoParquet metadata added to the exported files.

Exporting data into one or more files

The destinationUris property indicates one or more locations and filenames where BigQuery should export your files.

BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the filename component. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a number (starting at 0), left-padded to 12 digits. For example, a URI with a wildcard at the end of the filename would create files with 000000000000 appended to the first file and 000000000001 appended to the second file, continuing in that pattern.

The following table describes several possible options for the destinationUris property:

destinationUris options

Single URI
  Use a single URI if you are exporting table data that is 1 GB or less. This option is the most common use case, because exported data is generally less than the 1 GB maximum value. This option is not supported for the EXPORT DATA statement; you must use a single wildcard URI.
  Property definition: ['gs://my-bucket/file-name.json']
  Creates: gs://my-bucket/file-name.json

Single wildcard URI
  A single wildcard can be used only in the filename component of the URI. Use a single wildcard URI if you think your exported data will be larger than the 1 GB maximum value. BigQuery shards your data into multiple files based on the provided pattern. The size of the exported files will vary.
  Property definition: ['gs://my-bucket/file-name-*.json']
  Creates:
    gs://my-bucket/file-name-000000000000.json
    gs://my-bucket/file-name-000000000001.json
    gs://my-bucket/file-name-000000000002.json
    ...
  Property definition: ['gs://my-bucket/*']
  Creates:
    gs://my-bucket/000000000000
    gs://my-bucket/000000000001
    gs://my-bucket/000000000002
    ...

Limit the exported file size

When you export more than 1 GB of data in a single export, you must use a wildcard to export the data into multiple files and the size of the files varies. If you need to limit the maximum size of each exported file, one option is to randomly partition your data and then export each partition to a file:

  1. Determine the number of partitions you need, which is equal to the total size of your data divided by the chosen exported file size. For example, if you have 8,000 MB of data and you want each exported file to be approximately 20 MB, then you need 400 partitions.
  2. Create a new table that is partitioned and clustered by a new randomly generated column called export_id. The following example shows how to create a new processed_table from an existing table called source_table, which requires n partitions to achieve the chosen file size:
    CREATE TABLE my_dataset.processed_table
    PARTITION BY RANGE_BUCKET(export_id, GENERATE_ARRAY(0, n, 1))
    CLUSTER BY export_id
    AS (
    SELECT *, CAST(FLOOR(n*RAND()) AS INT64) AS export_id
    FROM my_dataset.source_table
    );
  3. For each integer i between 0 and n-1, run an EXPORT DATA statement on the following query (a scripted version of this loop is sketched after these steps):
    SELECT * EXCEPT(export_id)
    FROM my_dataset.processed_table
    WHERE export_id = i;
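
The following sketch of step 3 uses the google-cloud-bigquery client, with placeholder names, to generate and run one EXPORT DATA statement per export_id value:

from google.cloud import bigquery

client = bigquery.Client()
n = 400  # number of partitions chosen in step 1

for i in range(n):
    sql = f"""
    EXPORT DATA
    OPTIONS (
      uri = 'gs://example-bucket/export/part-{i}-*.csv',
      format = 'CSV',
      overwrite = true)
    AS (
      SELECT * EXCEPT(export_id)
      FROM my_dataset.processed_table
      WHERE export_id = {i}
    );
    """
    client.query(sql).result()  # run each partition's export and wait for it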

Example use case

This example shows how you can export data to Cloud Storage.

Suppose you are continuously streaming data from endpoint logs to BigQuery. A daily snapshot needs to be exported to Cloud Storage for backup and archival purposes. The best choice is an extract job, subject to certain quotas and limitations.

Submit an extract job with the API or client libraries, passing in a unique ID as jobReference.jobId. Extract jobs are asynchronous. Check the job status using the unique job ID used to create the job. The job completed successfully if status.state is DONE. If status.errorResult is present, the job failed and needs to be retried.
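
A sketch of this pattern with the google-cloud-bigquery client library, using placeholder table, bucket, and location values; deriving the job ID from the snapshot date keeps retries idempotent, because resubmitting an existing job ID cannot start a duplicate export:

import datetime

from google.cloud import bigquery

client = bigquery.Client()
snapshot_date = datetime.date.today().isoformat()
job_id = f"daily-backup-{snapshot_date}"  # stable across retries for the same day

extract_job = client.extract_table(
    "my-project.mydataset.endpoint_logs",             # placeholder source table
    f"gs://example-archive/{snapshot_date}/*.avro",   # placeholder destination
    job_id=job_id,
    job_config=bigquery.job.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.AVRO,
        compression=bigquery.Compression.SNAPPY,
    ),
    location="US",
)
extract_job.result()  # raises if status.errorResult is set on the job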

Batch data processing

Suppose a nightly batch job is used to load data by a fixed deadline. After this load job completes, a table with statistics is materialized from a query as described in the preceding section. Data from this table is retrieved and compiled into a PDF report and sent to a regulator.

Since the amount of data that needs to be read is small, use the tabledata.list API to retrieve all rows of the table in JSON dictionary format. If there is more than one page of data, the results have the pageToken property set. To retrieve the next page of results, make another tabledata.list call and include the token value as the pageToken parameter. If the API call fails with a 5xx error, retry with exponential backoff. Most 4xx errors cannot be retried. For better decoupling of BigQuery export and report generation, results should be persisted to disk.
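
The following sketch pages through tabledata.list with exponential backoff on 5xx errors, using the google-api-python-client library and placeholder project, dataset, and table names:

import time

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

bigquery = build("bigquery", "v2")
table = {"projectId": "my-project", "datasetId": "mydataset", "tableId": "stats"}

rows = []
page_token = None
backoff = 1
while True:
    request = dict(table)
    if page_token:
        request["pageToken"] = page_token
    try:
        resp = bigquery.tabledata().list(**request).execute()
    except HttpError as err:
        if err.resp.status >= 500:        # retry 5xx errors with exponential backoff
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
            continue
        raise                             # most 4xx errors cannot be retried
    backoff = 1
    rows.extend(resp.get("rows", []))
    page_token = resp.get("pageToken")
    if not page_token:
        break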

Quota policy

For information on export job quotas, see Export jobs on the Quotas and limits page.

Usage for export jobs is available in INFORMATION_SCHEMA. The job entry in the JOBS_BY_* system tables for the export job contains a total_bytes_processed value that can be used to monitor aggregate usage and ensure that it stays under 50 TiB per day. To learn how to query the INFORMATION_SCHEMA.JOBS view to get the total_bytes_processed value, see the INFORMATION_SCHEMA.JOBS schema.
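
For example, the following sketch sums total_bytes_processed for today's extract jobs with the google-cloud-bigquery client; REGION_NAME is a placeholder:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT SUM(total_bytes_processed) AS extract_bytes_today
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'EXTRACT'
  AND DATE(creation_time) = CURRENT_DATE()
"""
row = list(client.query(sql).result())[0]
print(f"Extract bytes processed today: {row.extract_bytes_today}")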

View current quota usage

You can view your current usage of query, load, extract, or copy jobs by running an INFORMATION_SCHEMA query to view metadata about the jobs run over a specified time period. You can compare your current usage against the quota limit to determine your quota usage for a particular type of job. The following example query uses the INFORMATION_SCHEMA.JOBS view to list the number of query, load, extract, and copy jobs by project:

SELECT
  sum(case when job_type="QUERY" then 1 else 0 end) as QRY_CNT,
  sum(case when job_type="LOAD" then 1 else 0 end) as LOAD_CNT,
  sum(case when job_type="EXTRACT" then 1 else 0 end) as EXT_CNT,
  sum(case when job_type="COPY" then 1 else 0 end) as CPY_CNT
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE date(creation_time) = CURRENT_DATE()

You can set up a Cloud Monitoring alert policy that provides notification of the number of bytes exported.

  1. In the Google Cloud console, go to the Monitoring page.
    Go to Monitoring
  2. In the navigation pane, select Metrics explorer.
  3. In the MQL query editor, set up an alert to monitor your exported bytes per day, as seen in the following example:
fetch consumer_quota  
  | filter resource.service == 'bigquery.googleapis.com'  
  | { metric serviceruntime.googleapis.com/quota/rate/net_usage  
      | align delta_gauge(1m)  
      | group_by [resource.project_id, metric.quota_metric, resource.location],  
          sum(value.net_usage)  
    ; metric serviceruntime.googleapis.com/quota/limit  
      | filter metric.limit_name == 'ExtractBytesPerDay'  
      | group_by [resource.project_id, metric.quota_metric, resource.location],  
          sliding(1m), max(val()) }  
  | ratio  
  | every 1m  
  | condition gt(val(), 0.01 '1')  
  4. To set up your alert, click Run query.

For more information, see Alerting policies with MQL.

Troubleshooting

To diagnose issues with extract jobs, you can use the Logs Explorer to review the logs for a specific extract job and identify possible errors. The following Logs Explorer filter returns information about your extract jobs:

resource.type="bigquery_resource"
protoPayload.methodName="jobservice.insert"
(protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.query.query=~"EXPORT" OR
protoPayload.serviceData.jobCompletedEvent.eventName="extract_job_completed" OR
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.query=~"EXPORT")

Pricing

For information on data export pricing, see the BigQuery pricing page.

Once the data is exported, you are charged for storing the data in Cloud Storage. For more information, see Cloud Storage pricing.

Table security

To control access to tables in BigQuery, see Control access to resources with IAM.

What's next