Introduction to datasets

This page provides an overview of datasets in BigQuery.

Datasets

A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery. To fully qualify a dataset name, use the format projectname.datasetname in GoogleSQL, or the format projectname:datasetname in the bq command-line tool.
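
The two fully qualified formats differ only in the separator. The following Python sketch converts between them. The helper names and the example names (myproject, mydataset) are hypothetical and not part of any Google library; the split happens on the last separator, on the assumption that a dataset name contains only letters, digits, and underscores.

```python
def googlesql_dataset_name(project: str, dataset: str) -> str:
    """Fully qualified name as written in GoogleSQL: project.dataset."""
    return f"{project}.{dataset}"


def bq_cli_dataset_name(project: str, dataset: str) -> str:
    """Fully qualified name as passed to the bq tool: project:dataset."""
    return f"{project}:{dataset}"


def _split(name: str, sep: str) -> tuple[str, str]:
    # Split on the LAST separator: the dataset part is assumed to contain
    # only letters, digits, and underscores, so it never contains sep.
    project, found, dataset = name.rpartition(sep)
    if not found or not project or not dataset:
        raise ValueError(f"expected 'project{sep}dataset', got {name!r}")
    return project, dataset


def googlesql_to_bq_cli(name: str) -> str:
    """Convert 'project.dataset' to 'project:dataset'."""
    return bq_cli_dataset_name(*_split(name, "."))


def bq_cli_to_googlesql(name: str) -> str:
    """Convert 'project:dataset' to 'project.dataset'."""
    return googlesql_dataset_name(*_split(name, ":"))


print(googlesql_to_bq_cli("myproject.mydataset"))  # myproject:mydataset
```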

Location

You specify a location for storing your BigQuery data when you create a dataset. For a list of BigQuery dataset locations, see BigQuery locations. After you create the dataset, its location can't be changed, but you can copy the dataset to a different location, or manually move (recreate) the dataset in a different location.

BigQuery processes queries in the same location as the dataset that contains the tables you're querying. BigQuery stores your data in the selected location in accordance with the Service Specific Terms.
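
As a rough mental model, the following Python sketch treats location as an immutable property set at creation time, represents a manual move as creating a new dataset, and derives where a query is processed from the datasets it reads. The DatasetRef class and both functions are hypothetical, and query_location assumes, as a simplification, that every referenced dataset shares a single location.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical dataset model: location is fixed when it is created."""
    project: str
    dataset_id: str
    location: str  # e.g. "US" or "europe-west1"


def recreate_in(ds: DatasetRef, location: str, dataset_id: str) -> DatasetRef:
    """Model a manual move: a new dataset in the target location.

    Dataset IDs are unique within a project, so either choose a new ID or
    delete the original first. Tables still have to be copied or reloaded.
    """
    return replace(ds, dataset_id=dataset_id, location=location)


def query_location(datasets: list[DatasetRef]) -> str:
    """Location where a query reading these datasets would be processed.

    Simplifying assumption: all referenced datasets share one location.
    """
    locations = {ds.location for ds in datasets}
    if len(locations) != 1:
        raise ValueError(f"expected one location, got {sorted(locations)}")
    return next(iter(locations))


src = DatasetRef("myproject", "sales", "US")
moved = recreate_in(src, "europe-west1", "sales_eu")
print(query_location([src]), query_location([moved]))  # US europe-west1
```

Making the dataclass frozen mirrors the rule that a location can't be edited in place: the only way to get a different location is to build a new object.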

Data retention

Datasets use time travel in conjunction with the fail-safe period to retain deleted and modified data for a short time, in case you need to recover it. For more information, see Data retention with time travel and fail-safe.
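
The following Python sketch computes, for data deleted or modified at a given moment, when the time travel window ends and when the fail-safe period that follows it ends. It assumes a time travel window configurable from 48 to 168 hours in whole days (168 by default) and a fixed 7-day fail-safe period; verify these values against the linked page before relying on them. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FAIL_SAFE = timedelta(days=7)    # fixed; can't be modified
DEFAULT_TIME_TRAVEL_HOURS = 168  # 7 days


def retention_deadlines(changed_at: datetime,
                        time_travel_hours: int = DEFAULT_TIME_TRAVEL_HOURS
                        ) -> tuple[datetime, datetime]:
    """Return (end of time travel, end of fail-safe) for data that was
    deleted or modified at `changed_at`."""
    # Assumed constraint: 2 to 7 days, expressed in hours as whole days.
    if time_travel_hours % 24 or not 48 <= time_travel_hours <= 168:
        raise ValueError("time travel window must be 48-168 hours in whole days")
    time_travel_end = changed_at + timedelta(hours=time_travel_hours)
    # The fail-safe period starts when the time travel window ends.
    return time_travel_end, time_travel_end + FAIL_SAFE


deleted = datetime(2024, 1, 1, tzinfo=timezone.utc)
time_travel_end, fail_safe_end = retention_deadlines(deleted, 48)
```

With a 48-hour window, data deleted at 2024-01-01 00:00 UTC can be recovered with time travel until 2024-01-03 00:00 UTC and remains in fail-safe until 2024-01-10 00:00 UTC.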

Storage billing models

You can be billed for BigQuery data storage in either logical or physical (compressed) bytes, or a combination of both. The storage billing model you choose determines your storage pricing but doesn't affect BigQuery performance. Whichever billing model you choose, your data is stored as physical bytes.

You set the storage billing model at the dataset level. If you don't specify a storage billing model when you create a dataset, it defaults to logical storage billing. You can change a dataset's storage billing model after you create it; however, after you change it, you must wait 14 days before you can change the storage billing model again.

When you change a dataset's billing model, it takes 24 hours for the change to take effect. Any tables or table partitions in long-term storage are not reset to active storage when you change a dataset's billing model. Query performance and query latency are not affected by changing a dataset's billing model.
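
The timing rules for billing model changes can be modeled with a small state tracker. This Python sketch is hypothetical: it doesn't call any BigQuery API, and it encodes only the logical default, the 24-hour delay before a change takes effect, and the 14-day wait between changes. The class name and the "LOGICAL"/"PHYSICAL" labels are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TAKES_EFFECT_AFTER = timedelta(hours=24)
MIN_TIME_BETWEEN_CHANGES = timedelta(days=14)


class StorageBillingModel:
    """Hypothetical tracker of one dataset's storage billing model."""

    def __init__(self, model: str = "LOGICAL"):
        self.model = model  # LOGICAL is the default if none is specified
        self.last_change: Optional[datetime] = None

    def change(self, new_model: str, now: datetime) -> datetime:
        """Request a change; return the time at which billing switches."""
        if new_model not in ("LOGICAL", "PHYSICAL"):
            raise ValueError(f"unknown billing model {new_model!r}")
        if (self.last_change is not None
                and now - self.last_change < MIN_TIME_BETWEEN_CHANGES):
            allowed = self.last_change + MIN_TIME_BETWEEN_CHANGES
            raise RuntimeError(f"next change allowed at {allowed.isoformat()}")
        self.model = new_model
        self.last_change = now
        return now + TAKES_EFFECT_AFTER


ds = StorageBillingModel()
start = datetime(2024, 3, 1, tzinfo=timezone.utc)
print(ds.change("PHYSICAL", start).isoformat())  # 2024-03-02T00:00:00+00:00
```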

Datasets use time travel and fail-safe storage for data retention. Time travel and fail-safe storage are charged separately at active storage rates when you use physical storage billing, but are included in the base rate you are charged when you use logical storage billing. You can modify a dataset's time travel window to balance physical storage costs against data retention. You can't modify the fail-safe window. For more information about dataset data retention, see Data retention with time travel and fail-safe. For more information on forecasting your storage costs, see Forecast storage billing.
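
To show how the two models treat retention storage differently, the following Python sketch computes a monthly storage cost under each. The rates are placeholder inputs, not BigQuery prices (see the pricing page for actual per-GiB rates, which differ between the logical and physical models), and the function names are hypothetical. The only structural difference the sketch encodes is the one stated above: under physical billing, time travel and fail-safe bytes are added at the active storage rate.

```python
def logical_billing_cost(active_gib: float, long_term_gib: float,
                         active_rate: float, long_term_rate: float) -> float:
    """Monthly cost in logical bytes. Time travel and fail-safe storage
    are included in the base rate, so there is no extra term."""
    return active_gib * active_rate + long_term_gib * long_term_rate


def physical_billing_cost(active_gib: float, long_term_gib: float,
                          time_travel_gib: float, fail_safe_gib: float,
                          active_rate: float, long_term_rate: float) -> float:
    """Monthly cost in physical (compressed) bytes. Time travel and
    fail-safe bytes are billed separately, at the active storage rate."""
    retained_gib = time_travel_gib + fail_safe_gib
    return (active_gib + retained_gib) * active_rate + long_term_gib * long_term_rate
```

For example, with placeholder rates of 0.5 per active GiB and 0.25 per long-term GiB, logical_billing_cost(100, 50, 0.5, 0.25) returns 62.5. Under physical billing, a shorter time travel window shrinks time_travel_gib, which is the cost lever described above.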

You can't enroll a dataset in physical storage billing if your organization has any existing legacy flat-rate slot commitments located in the same region as the dataset. This restriction doesn't apply to commitments purchased with a BigQuery edition.
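
The eligibility rule can be written as a single predicate. In this hypothetical Python sketch, the Commitment type and its fields are assumptions for illustration, and regions are compared as exact strings.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    """Hypothetical record of one slot commitment in the organization."""
    region: str
    legacy_flat_rate: bool  # False for commitments purchased with an edition


def can_use_physical_billing(dataset_region: str,
                             org_commitments: list[Commitment]) -> bool:
    """Only legacy flat-rate commitments in the dataset's region block it."""
    return not any(c.legacy_flat_rate and c.region == dataset_region
                   for c in org_commitments)
```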

External datasets

In addition to BigQuery datasets, you can create external datasets, which are links to external data sources:

External datasets are also known as federated datasets; the two terms are used interchangeably.

Once created, an external dataset contains tables from a referenced external data source. Data from these tables isn't copied into BigQuery; instead, it is queried every time the tables are used. For more information, see Spanner federated queries.
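
The difference between copied and federated data can be illustrated with a toy in-memory model. In this Python sketch, NativeTable and ExternalTable are hypothetical classes and the source is just a Python list; it says nothing about how BigQuery actually executes federated queries, only that an external table reflects the source at query time while loaded data does not.

```python
from collections.abc import Callable


class NativeTable:
    """Rows are copied in once, when the data is loaded."""

    def __init__(self, rows: list[dict]):
        self._rows = list(rows)  # copy taken at load time

    def query(self) -> list[dict]:
        return list(self._rows)


class ExternalTable:
    """Nothing is copied; each query reads the external source again."""

    def __init__(self, read_source: Callable[[], list[dict]]):
        self._read_source = read_source

    def query(self) -> list[dict]:
        return self._read_source()


source = [{"id": 1}]
native = NativeTable(source)
external = ExternalTable(lambda: list(source))
source.append({"id": 2})  # the external source changes after "loading"
print(len(native.query()), len(external.query()))  # 1 2
```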

Limitations

BigQuery datasets are subject to the following limitations:

Quotas

For more information on dataset quotas and limits, see Quotas and limits.

Pricing

You are not charged for creating, updating, or deleting a dataset.

For more information on BigQuery pricing, see Pricing.

Security

To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.

What's next