Loading JSON data from Cloud Storage (original) (raw)

Documentation
- Guides
- Reference
- Samples
- Resources
Technology areas
Cross-product tools
Related sites
Console
Contact Us
Start free
Discover
Product overview
Get started
Use the BigQuery sandbox
Migrate
Overview
Migrate a data warehouse
- Introduction to BigQuery Migration Service
- Migration assessment
- Migrate schema and data
- Migrate data pipelines
- Migrate SQL
  * Translate SQL queries interactively
  * Translate SQL queries using the API
  * Translate SQL queries in batch
  * Generate metadata for translation and assessment
  * Transform SQL translations with YAML
  * Map SQL object names for batch translation
Migration guides
- Teradata
  * Introduction
  * Detailed migration overview
  * Migrate Teradata schema and data
  * Migration tutorial
  * SQL translation reference
Design
Organize resources
API dependencies
Understand editions
Datasets
- Introduction
- Create datasets
- List datasets
- Cross-region replication
- Managed disaster recovery
- Migrate to managed disaster recovery
- Dataset data retention
Tables
- External tables
  * Introduction
  * External table definition file
  * Externally partitioned data
  * Use metadata caching
  * Amazon S3 BigLake external tables
  * Apache Iceberg read-only external tables
  * Azure Blob Storage BigLake tables
  * Bigtable external table
  * BigLake external tables for Cloud Storage
  * Cloud Storage object tables
  * Cloud Storage external tables
  * Delta Lake BigLake tables
  * Google Drive external tables
Routines
- Introduction
- Manage routines
- User-defined functions
- User-defined functions in Python
- User-defined aggregate functions
- Table functions
- Remote functions
- SQL stored procedures
- Stored procedures for Apache Spark
- Analyze object tables by using remote functions
- Remote functions and Translation API tutorial
Connections
- Introduction
- Amazon S3 connection
- Apache Spark connection
- Azure Blob Storage connection
- Cloud resource connection
- Spanner connection
- Cloud SQL connection
- AlloyDB connection
- SAP Datasphere connection
- Manage connections
- Configure connections with network attachments
- Default connections
Load, transform, and export
Introduction
Load data
- Introduction
- BigQuery Data Transfer Service
  * Introduction
  * Data location and transfers
  * Authorize transfers
  * Enable transfers
  * Manage transfers
  * Transfer run notifications
  * Troubleshoot transfer configurations
  * Use service accounts
  * Use third-party transfers
  * Use custom organization policies
  * Data source change log
  * Event-driven transfers
  * Transfer guides
  * Google Merchant Center
  * Introduction
  * Schedule transfers
  * Query your data
  * Transfer report schema
  * Best Sellers table
  * Local Inventories table
  * Performance table
  * Price Benchmarks table
  * Price Competitiveness table
  * Price Insights table
  * Product Inventory table
  * Product Targeting table
  * Products table
  * Regional Inventories table
  * Top Brands table
  * Top Products table
- Batch load data
  * Introduction
  * Auto-detect schemas
  * Load Avro data
  * Load Parquet data
  * Load ORC data
  * Load CSV data
  * Load JSON data
  * Load externally partitioned data
  * Load data from a Datastore export
  * Load data from a Firestore export
  * Load data using the Storage Write API
  * Load data into partitioned tables
- Load data from other Google services
- Discover and catalog Cloud Storage data
- Load data using third-party apps
- Load data using cross-cloud operations
Export data
- Introduction
- Export query results
- Export to Cloud Storage
- Export to Bigtable
- Export to Spanner
- Export to Pub/Sub
- Export as Protobuf columns
Analyze
Introduction
Search for resources
Query BigQuery data
- Run a query
- Write queries with Gemini
- Write query results
- Query data with SQL
  * Introduction
  * Arrays
  * JSON data
  * Multi-statement queries
  * Parameterized queries
  * Pipe syntax
  * Analyze data using pipe syntax
  * Recursive CTEs
  * Sketches
  * Table sampling
  * Time series
  * Transactions
  * Wildcard tables
- Access historical data
Work with queries
- Use cached results
- Troubleshoot queries
- Optimize queries
  * Introduction
  * Use the query plan explanation
  * Get query performance insights
  * Optimize query computation
  * Use history-based optimizations
  * Optimize storage for query performance
  * Use materialized views
  * Use BI Engine
  * Use nested and repeated data
  * Optimize functions
  * Use the advanced runtime
Query external data sources
AI and machine learning
Introduction
Generative AI and pretrained models
- Generative AI
  * Overview
  * Tutorials
  * Generate text
  * Generate text using public data and Gemini
  * Generate text using public data and Gemma
  * Generate text using your data
  * Handle quota errors by calling ML.GENERATE_TEXT iteratively
  * Analyze images with a Gemini vision model
  * Generate embeddings
  * Generate text embeddings using an LLM
  * Generate image embeddings using an LLM
  * Generate video embeddings using an LLM
  * Handle quota errors by calling ML.GENERATE_EMBEDDING iteratively
  * Generate and search multimodal embeddings
  * Generate text embeddings using pretrained TensorFlow models
- Task-specific solutions
  * Overview
  * Tutorials
  * Computer vision
  * Annotate images
  * Run inference on image data
  * Analyze images with an imported classification model
  * Analyze images with an imported feature vector model
Machine learning
- ML models and MLOps
  * End-to-end journey per model
  * Model creation
  * Hyperparameter tuning overview
  * Model evaluation overview
  * Model inference overview
  * Explainable AI overview
  * Model weights overview
  * ML pipelines overview
  * Model monitoring overview
  * Manage BigQueryML models in Vertex AI
- Use cases
  * Forecasting
  * Anomaly detection
  * Recommendation
  * Classification
  * Regression
  * Dimensionality reduction
  * Clustering
- Tutorials
  * Get started with BigQuery ML
  * Recommendation
  * Create recommendations based on explicit feedback with a matrix factorization model
  * Create recommendations based on implicit feedback with a matrix factorization model
  * Time series forecasting
  * Forecast a single time series with an ARIMA_PLUS univariate model
  * Forecast multiple time series with an ARIMA_PLUS univariate model
  * Forecast time series with a TimesFM univariate model
  * Scale an ARIMA_PLUS univariate model to millions of time series
  * Forecast a single time series with a multivariate model
  * Forecast multiple time series with a multivariate model
  * Use custom holidays with an ARIMA_PLUS univariate model
  * Limit forecasted values for an ARIMA_PLUS univariate model
  * Forecast hierarchical time series with an ARIMA_PLUS univariate model
Augmented analytics
- Contribution analysis
- Tutorials
  * Get data insights from contribution analysis using a summable metric
  * Get data insights from contribution analysis using a summable ratio metric
Reference patterns
Administer
Introduction
Workload management
- Introduction
- Slots
- Slot reservations
- Slots autoscaling
- Use reservations
  * Get started
  * Estimate slot capacity requirements
  * View slot recommendations and insights
  * Purchase and manage slot commitments
  * Work with slot reservations
  * Work with reservation assignments
- Manage jobs
- Use query queues
Monitor workloads
- Introduction
- Monitor resource utilization
- Monitor jobs
- Monitor sharing listings
- Monitor BI Engine
- Monitor Data Transfer Service
- Monitor materialized views
- Monitor reservations
- Monitor continuous queries
- Dashboards, charts and alerts
Govern
Introduction
Share data
- Introduction
- Manage data exchanges
- Manage listings
- Manage subscriptions
- Configure user roles
- View and subscribe to listings
- Share sensitive data with data clean rooms
- VPC Service Controls for Sharing
- Stream sharing with Pub/Sub
- Commercialize listings on Cloud Marketplace
Audit
- Introduction
- Audit policy tags
- View Data Policy audit logs
- Data Transfer Service audit logs
- Sharing audit logs
- BigQuery audit logs reference
- Migrate audit logs
- BigLake API audit logs
- BigQuery Migration API audit logs
Develop
Introduction
BigQuery code samples
Use the VS Code extension
Choose a Python library
Use ODBC and JDBC drivers
Connect your IDE to BigQuery

You can load newline-delimited JSON (ndJSON) data from Cloud Storage into a new table or partition, or append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor(BigQuery's storage format).

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi- regional location as the Cloud Storage bucket.

The ndJSON format is the same format as theJSON Lines format.

Limitations

You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:

BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
BigQuery does not supportCloud Storage object versioning. If you include a generation number in the Cloud Storage URI, then the load job fails.

When you load JSON files into BigQuery, note the following:

JSON data must be newline delimited, or ndJSON. Each JSON object must be on a separate line in the file.
If you use gzipcompression, BigQuery cannot read the data in parallel. Loading compressed JSON data into BigQuery is slower than loading uncompressed data.
You cannot include both compressed and uncompressed files in the same load job.
The maximum size for a gzip file is 4 GB.
BigQuery supports the JSON type even if schema information is not known at the time of ingestion. A field that is declared as JSON type is loaded with the raw JSON values.
If you use the BigQuery API to load an integer outside the range of [-253+1, 253-1] (usually this means larger than 9,007,199,254,740,991), into an integer (INT64) column, pass it as a string to avoid data corruption. This issue is caused by a limitation on integer size in JSON or ECMAScript. For more information, seethe Numbers section of RFC 7159.
When you load CSV or JSON data, values in DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
When you load JSON or CSV data, values in TIMESTAMP columns must use a dash (-) or slash (/) separator for the date portion of the timestamp, and the date must be in one of the following formats: YYYY-MM-DD (year-month-day) orYYYY/MM/DD (year/month/day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
Your files must meet the JSON file size limits described in theload jobs limits.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin (includes the bigquery.jobs.create permission)
bigquery.user (includes the bigquery.jobs.create permission)
bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you theStorage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to load data from a Cloud Storage bucket:

storage.buckets.get
storage.objects.get
storage.objects.list (required if you are using a URI [wildcard](/bigquery/docs/batch-loading-data#load-wildcards))

You might also be able to get these permissions with custom roles or other predefined roles.

Create a dataset

Create a BigQuery dataset to store your data.

JSON compression

You can use the gzip utility to compress JSON files. Note that gzip performs full file compression, unlike the file content compression performed by compression codecs for other file formats, such as Avro. Using gzip to compress your JSON files might have a performance impact; for more information about the trade-offs, seeLoading compressed and uncompressed data.

Loading JSON data into a new table

To load JSON data from Cloud Storage into a new BigQuery table:

Console

In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:
1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
  1. Select a file from the Cloud Storage bucket, or enter theCloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
  2. For File format, selectJSONL (Newline delimited JSON).
2. In the Destination section, specify the following details:
  1. For Dataset, select the dataset in which you want to create the table.
  2. In the Table field, enter the name of the table that you want to create.
  3. Verify that the Table type field is set to Native table.
3. In the Schema section, enter the schema definition. To enable the auto detection of a schema, select Auto detect. You can enter schema information manually by using one of the following methods:
  - Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
    bq show --format=prettyjson dataset.table
  - Option 2: Click Add field and enter the table schema. Specify each field's Name,Type, and Mode.
4. Optional: Specify Partition and cluster settings. For more information, seeCreating partitioned tables andCreating and using clustered tables.
5. Click Advanced options and do the following:
  - For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
  - For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job will result in an invalid message and fail. This option applies only to CSV and JSON files.
  - For Time zone, enter the default time zone that will apply when parsing timestamp values that have no specific time zone. Checkhere for more valid time zone names. If this value is not present, the timestamp values without specific time zone is parsed using default time zone UTC. (Preview).
  - For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This field expects SQL styles format (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format.Schema autodetection will also decide DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with thedefault formats. (Preview).
  - For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files. This field expects SQL styles format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format.Schema autodetection will also decide DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with thedefault formats. (Preview).
  - For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This field expects SQL styles format (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format.Schema autodetection will also decide TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with thedefault formats. (Preview).
  - For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files. This field expects SQL styles format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format.Schema autodetection will also decide TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with thedefault formats. (Preview).
  - If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
  - For Encryption, click Customer-managed key to use aCloud Key Management Service key. If you leave the Google-managed key setting, BigQueryencrypts the data at rest.
6. Click Create table.

SQL

Use theLOAD DATA DDL statement. The following example loads a JSON file into the new table mytable:

In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the query editor, enter the following statement:
LOAD DATA OVERWRITE mydataset.mytable
(x INT64,y STRING)
FROM FILES (
format = 'JSON',
uris = ['gs://bucket/path/file.json']);
Click Run.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq load command, specify NEWLINE_DELIMITED_JSON using the--source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or useschema auto-detect.

(Optional) Supply the --location flag and set the value to yourlocation.

Other optional flags include:

--max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the--max_bad_records value.
--ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
--time_zone: (Preview) An optional default time zone that will apply when parsing timestamp values that have no specific time zone in CSV or JSON data.
--date_format: (Preview) An optional custom string that defines how the DATE values are formatted in CSV or JSON data.
--datetime_format: (Preview) An optional custom string that defines how the DATETIME values are formatted in CSV or JSON data.
--time_format: (Preview) An optional custom string that defines how the TIME values are formatted in CSV or JSON data.
--timestamp_format: (Preview) An optional custom string that defines how the TIMESTAMP values are formatted in CSV or JSON data.
--autodetect: When specified, enable schema auto-detection for CSV and JSON data.
--time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, andYEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification on an existing table.
--time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
--time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
--require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Require a partition filter in queries.
--clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
--destination_kms_key: The Cloud KMS key for encryption of the table data.
For more information on partitioned tables, see:
- Creating partitioned tables
  For more information on clustered tables, see:
- Creating and using clustered tables
  For more information on table encryption, see:
- Protecting data with Cloud KMS keys

To load JSON data into BigQuery, enter the following command:

bq --location=LOCATION load
--source_format=FORMAT
DATASET.TABLE
PATH_TO_SOURCE
SCHEMA

Replace the following:

LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the.bigqueryrc file.
FORMAT: NEWLINE_DELIMITED_JSON.
DATASET: an existing dataset.
TABLE: the name of the table into which you're loading data.
PATH_TO_SOURCE: a fully qualifiedCloud Storage URIor a comma-separated list of URIs.Wildcardsare also supported.
SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. If you use a schema file, do not give it an extension. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.

    bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

The following command loads data from gs://mybucket/mydata.json into a new ingestion-time partitioned table named mytable in mydataset. The schema is defined in a local schema file named myschema.

    bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    --time_partitioning_type=DAY \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

The following command loads data from gs://mybucket/mydata.json into a partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column. The schema is defined in a local schema file named myschema.

    bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    --time_partitioning_field mytimestamp \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is auto detected.

    bq load \
    --autodetect \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined inline in the format FIELD:DATA_TYPE, FIELD:DATA_TYPE.

    bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    qtr:STRING,sales:FLOAT,year:STRING

The following command loads data from multiple files in gs://mybucket/into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard. The schema is auto detected.

    bq load \
    --autodetect \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata*.json

The following command loads data from multiple files in gs://mybucket/into a table named mytable in mydataset. The command includes a comma- separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named myschema.

    bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    "gs://mybucket/00/*.json","gs://mybucket/01/*.json" \
    ./myschema

API

Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in the location property in the jobReference section of thejob resource.
The source URIs property must be fully qualified, in the formatgs://BUCKET/OBJECT. Each URI can contain one '*'wildcard character.
Specify the JSON data format by setting the sourceFormat property toNEWLINE_DELIMITED_JSON.
To check the job status, calljobs.get(JOB_ID*), replacing JOB_ID with the ID of the job returned by the initial request.
- If status.state = DONE, the job completed successfully.
- If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
- If status.errorResult is absent, the job finished successfully; although, there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.

API notes:

Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.
As a best practice, generate a unique ID and pass it asjobReference.jobId when calling jobs.insert to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.
Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most, one of those operations succeed.

C#

Before trying this sample, follow the C# setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Use theBigQueryClient.CreateLoadJob()method to start a load job from Cloud Storage. To use JSONL, create aCreateLoadJobOptionsobject and set itsSourceFormatproperty toFileFormat.NewlineDelimitedJson.

Go

Before trying this sample, follow the Go setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Java

Before trying this sample, follow the Java setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Use theLoadJobConfiguration.builder(tableId, sourceUri)method to start a load job from Cloud Storage. To use newline-delimited JSON, use the LoadJobConfiguration.setFormatOptions(FormatOptions.json()).

Node.js

Before trying this sample, follow the Node.js setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

PHP

Before trying this sample, follow the PHP setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Python

Before trying this sample, follow the Python setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Use theClient.load_table_from_uri()method to start a load job from Cloud Storage. To use JSONL, set the LoadJobConfig.source_format propertyto the string NEWLINE_DELIMITED_JSON and pass the job config as thejob_config argument to the load_table_from_uri() method.

Ruby

Before trying this sample, follow the Ruby setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Use theDataset.load_job()method to start a load job from Cloud Storage. To use JSONL, set the format parameter to "json".

Loading nested and repeated JSON data

BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON, Avro, ORC, Parquet, Firestore, and Datastore.

One JSON object, including any nested or repeated fields, must appear on each line.

The following example shows sample nested or repeated data. This table contains information about people. It consists of the following fields:

id
first_name
last_name
dob (date of birth)
addresses (a nested and repeated field)
- addresses.status (current or previous)
- addresses.address
- addresses.city
- addresses.state
- addresses.zip
- addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the address field contains an array of values (indicated by [ ]).

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]} {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table would look like the following:

[ { "name": "id", "type": "STRING", "mode": "NULLABLE" }, { "name": "first_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "last_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "dob", "type": "DATE", "mode": "NULLABLE" }, { "name": "addresses", "type": "RECORD", "mode": "REPEATED", "fields": [ { "name": "status", "type": "STRING", "mode": "NULLABLE" }, { "name": "address", "type": "STRING", "mode": "NULLABLE" }, { "name": "city", "type": "STRING", "mode": "NULLABLE" }, { "name": "state", "type": "STRING", "mode": "NULLABLE" }, { "name": "zip", "type": "STRING", "mode": "NULLABLE" }, { "name": "numberOfYears", "type": "STRING", "mode": "NULLABLE" } ] } ]

For information on specifying a nested and repeated schema, seeSpecifying nested and repeated fields.

Loading semi-structured JSON data

BigQuery supports loading semi-structured data, in which a field can take values of different types. The following example shows data similar to the precedingnested and repeated JSON dataexample, except that the address field can be a STRING, a STRUCT, or an ARRAY:

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","address":"123 First Avenue, Seattle WA 11111"}

{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","address":{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"}}

{"id":"3","first_name":"Bob","last_name":"Doe","dob":"1982-01-10","address":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"}, "321 Main Street Hoboken NJ 44444"]}

You can load this data into BigQuery by using the following schema:

The address field is loaded into a column with typeJSON that allows it to hold the mixed types in the example. You can ingest data as JSON whether it contains mixed types or not. For example, you could specify JSON instead ofSTRING as the type for the first_name field. For more information, seeWorking with JSON data in GoogleSQL.

Appending to or overwriting a table with JSON data

You can load additional data into a table either from source files or by appending query results.

In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.

You have the following options when you load additional data into a table:

Console option	bq tool flag	BigQuery API property	Description
Write if empty	Not supported	WRITE_EMPTY	Writes the data only if the table is empty.
Append to table	--noreplace or --replace=false; if--[no]replace is unspecified, the default is append	WRITE_APPEND	(Default) Appends the data to the end of the table.
Overwrite table	--replace or --replace=true	WRITE_TRUNCATE	Erases all existing data in a table before writing the new data. This action also deletes the table schema, row level security, and removes any Cloud KMS key.

If you load data into an existing table, the load job can append the data or overwrite the table.

You can append or overwrite a table by using one of the following:

The Google Cloud console
The bq command-line tool's bq load command
The jobs.insert API method and configuring a load job
The client libraries

Console

In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:
1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
  1. Select a file from the Cloud Storage bucket, or enter theCloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
  2. For File format, selectJSONL (Newline delimited JSON).
2. In the Destination section, specify the following details:
  1. For Dataset, select the dataset in which you want to create the table.
  2. In the Table field, enter the name of the table that you want to create.
  3. Verify that the Table type field is set to Native table.
3. In the Schema section, enter the schema definition. To enable the auto detection of a schema, select Auto detect. You can enter schema information manually by using one of the following methods:
  - Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
    bq show --format=prettyjson dataset.table
  - Option 2: Click Add field and enter the table schema. Specify each field's Name,Type, and Mode.
4. Optional: Specify Partition and cluster settings. For more information, seeCreating partitioned tables andCreating and using clustered tables. You cannot convert a table to a partitioned or clustered table by appending or overwriting it. The Google Cloud console does not support appending to or overwriting partitioned or clustered tables in a load job.
5. Click Advanced options and do the following:
  - For Write preference, choose Append to table or Overwrite table.
  - For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job will result in an invalid message and fail. This option applies only to CSV and JSON files.
  - For Time zone, enter the default time zone that will apply when parsing timestamp values that have no specific time zone. Checkhere for more valid time zone names. If this value is not present, the timestamp values without specific time zone is parsed using default time zone UTC. (Preview).
  - For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This field expects SQL styles format (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format.Schema autodetection will also decide DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with thedefault formats. (Preview).
  - For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files. This field expects SQL styles format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format.Schema autodetection will also decide DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with thedefault formats. (Preview).
  - For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This field expects SQL styles format (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format.Schema autodetection will also decide TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with thedefault formats. (Preview).
  - For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files. This field expects SQL styles format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format.Schema autodetection will also decide TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with thedefault formats. (Preview).
  - If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
  - For Encryption, click Customer-managed key to use aCloud Key Management Service key. If you leave the Google-managed key setting, BigQueryencrypts the data at rest.
6. Click Create table.

SQL

Use theLOAD DATA DDL statement. The following example appends a JSON file to the table mytable:

In the Google Cloud console, go to the BigQuery page.
Go to BigQuery
In the query editor, enter the following statement:
LOAD DATA INTO mydataset.mytable
FROM FILES (
format = 'JSON',
uris = ['gs://bucket/path/file.json']);
Click Run.

For more information about how to run queries, see Run an interactive query.

bq

Supply the schema inline, in a schema definition file, or useschema auto-detect.

Specify the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.

It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.

(Optional) Supply the --location flag and set the value to yourlocation.

Other optional flags include:

--max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the--max_bad_records value.
--ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
--time_zone: (Preview) An optional default time zone that will apply when parsing timestamp values that have no specific time zone in CSV or JSON data.
--date_format: (Preview) An optional custom string that defines how the DATE values are formatted in CSV or JSON data.
--datetime_format: (Preview) An optional custom string that defines how the DATETIME values are formatted in CSV or JSON data.
--time_format: (Preview) An optional custom string that defines how the TIME values are formatted in CSV or JSON data.
--timestamp_format: (Preview) An optional custom string that defines how the TIMESTAMP values are formatted in CSV or JSON data.
--autodetect: When specified, enable schema auto-detection for CSV and JSON data.
--destination_kms_key: The Cloud KMS key for encryption of the table data.

bq --location=LOCATION load
--[no]replace
--source_format=FORMAT
DATASET.TABLE
PATH_TO_SOURCE
SCHEMA

Replace the following:

LOCATION: yourlocation. The --location flag is optional. You can set a default value for the location using the.bigqueryrc file.
FORMAT: NEWLINE_DELIMITED_JSON.
DATASET: an existing dataset.
TABLE: the name of the table into which you're loading data.
PATH_TO_SOURCE: a fully qualifiedCloud Storage URIor a comma-separated list of URIs.Wildcardsare also supported.
SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.json and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.

    bq load \
    --autodetect \
    --replace \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json

The following command loads data from gs://mybucket/mydata.json and appends data to a table named mytable in mydataset. The schema is defined using a JSON schema file — myschema.

    bq load \
    --noreplace \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

API

Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in the location property in the jobReference section of the job resource.
The source URIs property must be fully-qualified, in the formatgs://BUCKET/OBJECT. You can include multiple URIs as a comma-separated list. Thewildcards are also supported.
Specify the data format by setting theconfiguration.load.sourceFormat property to NEWLINE_DELIMITED_JSON.
Specify the write preference by setting theconfiguration.load.writeDisposition property to WRITE_TRUNCATE orWRITE_APPEND.

Go

Before trying this sample, follow the Go setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Java

Node.js

Before trying this sample, follow the Node.js setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

PHP

Before trying this sample, follow the PHP setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Python

To replace the rows in an existing table, set theLoadJobConfig.write_disposition propertyto the string WRITE_TRUNCATE.

Before trying this sample, follow the Python setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Ruby

To replace the rows in an existing table, set the write parameter ofTable.load_job()to "WRITE_TRUNCATE".

Before trying this sample, follow the Ruby setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

Loading hive-partitioned JSON data

BigQuery supports loading hive partitioned JSON data stored on Cloud Storage and populates the hive partitioning columns as columns in the destination BigQuery managed table. For more information, seeLoading externally partitioned data.

Details of loading JSON data

This section describes how BigQuery parses various data types when loading JSON data.

Data types

Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetectionautomatically detects any of these except 0 and 1.

Bytes. Columns with BYTES types must be encoded as Base64.

Date. Columns with DATE types must be in the format YYYY-MM-DD.

Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD HH:MM:SS[.SSSSSS].

Geography. Columns with GEOGRAPHY types must contain strings in one of the following formats:

Well-known text (WKT)
Well-known binary (WKB)
GeoJSON

If you use WKB, the value should be hex encoded.

The following list shows examples of valid data:

WKT: POINT(1 2)
GeoJSON: { "type": "Point", "coordinates": [1, 2] }
Hex encoded WKB: 0101000000feffffffffffef3f0000000000000040

Before loading GEOGRAPHY data, also readLoading geospatial data.

Interval. Columns with INTERVAL types must be inISO 8601 formatPYMDTHMS, where:

P = Designator that indicates that the value represents a duration. You must always include this.
Y = Year
M = Month
D = Day
T = Designator that denotes the time portion of the duration. You must always include this.
H = Hour
M = Minute
S = Second. Seconds can be denoted as a whole value or as a fractional value of up to six digits, at microsecond precision.

You can indicate a negative value by prepending a dash (-).

The following list shows examples of valid data:

P-10000Y0M-3660000DT-87840000H0M0S
P0Y0M0DT0H0M0.000001S
P10000Y0M3660000DT87840000H0M0S

To load INTERVAL data, you must use the bq load command and use the --schemaflag to specify a schema. You can't upload INTERVAL data by using the console.

Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].

Timestamp. BigQuery accepts various timestamp formats. The timestamp must include a date portion and a time portion.

The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.
The timestamp portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).
The date and time must be separated by a space or 'T'.
Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z). For more information, seeTime zones.

For example, any of the following are valid timestamp values:

2018-08-19 12:11
2018-08-19 12:11:35
2018-08-19 12:11:35.22
2018/08/19 12:11
2018-07-05 12:54:00 UTC
2018-08-19 07:11:35.220 -05:00
2018-08-19T12:11:35.220Z

If you provide a schema, BigQuery also accepts Unix epoch time for timestamp values. However, schema autodetection doesn't detect this case, and treats the value as a numeric or string type instead.

Examples of Unix epoch timestamp values:

1534680695
1.534680695e12

Array (repeated field). The value must be a JSON array or null. JSONnull is converted to SQL NULL. The array itself cannot contain nullvalues.

Schema auto-detection

This section describes the behavior ofschema auto-detection when loading JSON files.

JSON nested and repeated fields

BigQuery infers nested and repeated fields in JSON files. If a field value is a JSON object, then BigQuery loads the column as aRECORD type. If a field value is an array, then BigQuery loads the column as a repeated column. For an example of JSON data with nested and repeated data, seeLoading nested and repeated JSON data.

String conversion

If you enable schema auto-detection, then BigQuery converts strings into Boolean, numeric, or date/time types when possible. For example, using the following JSON data, schema auto-detection converts the id field to an INTEGER column:

{ "name":"Alice","id":"12"}
{ "name":"Bob","id":"34"}
{ "name":"Charles","id":"45"}

Encoding types

BigQuery expects JSON data to be UTF-8 encoded. If you have JSON files with other supported encoding types, you should explicitly specify the encoding by using the --encoding flag so that BigQuery converts the data to UTF-8.

BigQuery supports the following encoding types for JSON files:

UTF-8
ISO-8859-1
UTF-16BE (UTF-16 Big Endian)
UTF-16LE (UTF-16 Little Endian)
UTF-32BE (UTF-32 Big Endian)
UTF-32LE (UTF-32 Little Endian)

JSON options

To change how BigQuery parses JSON data, specify additional options in the Google Cloud console, the bq command-line tool, the API, or the client libraries.

JSON option	Console option	bq tool flag	BigQuery API property	Description
Number of bad records allowed	Number of errors allowed	--max_bad_records	maxBadRecords (Java,Python)	(Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is `0`, which requires that all records are valid.
Unknown values	Ignore unknown values	--ignore_unknown_values	ignoreUnknownValues (Java,Python)	(Optional) Indicates whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The `sourceFormat` property determines what BigQuery treats as an extra value: CSV: trailing columns, JSON: named values that don't match any column names.
Encoding	None	-E or --encoding	encoding (Python)	(Optional) The character encoding of the data. The supported values are UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8.
Time Zone	Time Zone	--time_zone	None	(Preview) (Optional) Default time zone that is applied when parsing timestamp values that have no specific time zone. Checkvalid time zone names. If this value is not present, the timestamp values without specific time zone is parsed using default time zone UTC.
Date Format	Date Format	--date_format	None	(Preview) (Optional)Format elements that define how the DATE values are formatted in the input files (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format.Schema autodetection will also decide DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with thedefault formats.
Datetime Format	Datetime Format	--datetime_format	None	(Preview) (Optional)Format elements that define how the DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format.Schema autodetection will also decide DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with thedefault formats.
Time Format	Time Format	--time_format	None	(Preview) (Optional)Format elements that define how the TIME values are formatted in the input files (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format.Schema autodetection will also decide TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with thedefault formats.
Timestamp Format	Timestamp Format	--timestamp_format	None	(Preview) (Optional)Format elements that define how the TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format.Schema autodetection will also decide TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with thedefault formats.

What's next

For information about loading JSON data from a local file, seeLoading data from local files.
For more information about creating, ingesting, and querying JSON data, seeWorking with JSON data in GoogleSQL.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-06-12 UTC.