Manage data preparations
This document describes how to manage data preparations in BigQuery, including granting the required Identity and Access Management (IAM) roles and managing metadata in Dataplex Universal Catalog.
Data preparations are BigQuery resources powered by Dataform.
Before you begin
- Ensure that you have enabled the Gemini for Google Cloud API.
- To manage data preparation metadata in Dataplex Universal Catalog, ensure that the Dataplex API is enabled in your Google Cloud project.
Required roles
Users who are preparing the data and the Dataform service accounts that are running the jobs require the permissions granted by the following Identity and Access Management (IAM) roles.
Get user access for data preparation
To get the permissions that you need to prepare data in BigQuery, ask your administrator to grant you the following IAM roles:
- BigQuery Studio User (roles/bigquery.studioUser) on the project
- Gemini for Google Cloud User (roles/cloudaicompanion.user) on the project
- Access the source tables: BigQuery Data Viewer (roles/bigquery.dataViewer) on the table, dataset, or project
For more information about granting roles, see Manage access to projects, folders, and organizations.
For more information about IAM for datasets in BigQuery, see Grant access to a resource.
You might also be able to get these permissions with custom roles or other predefined roles.
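As an illustration, the project-level grants listed above can be expressed as `gcloud projects add-iam-policy-binding` commands. The following Python sketch only builds the command strings and does not run them. The helper name, project ID, and user email are hypothetical placeholders, and the sketch assumes all three roles are granted on the project, although BigQuery Data Viewer can instead be granted on a table or dataset.

```python
from typing import List

# Roles listed in this section for users who prepare data.
USER_ROLES = [
    "roles/bigquery.studioUser",
    "roles/cloudaicompanion.user",
    "roles/bigquery.dataViewer",
]


def build_grant_commands(project_id: str, user_email: str) -> List[str]:
    """Return one gcloud command string per role (hypothetical helper)."""
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member=user:{user_email} --role={role}"
        for role in USER_ROLES
    ]


if __name__ == "__main__":
    # Placeholder project and user, for illustration only.
    for command in build_grant_commands("my-project", "analyst@example.com"):
        print(command)
```

An administrator could review the printed commands before running them in a shell.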
Get access to manage metadata
To get the permissions that you need to manage data preparation metadata in Dataplex Universal Catalog, ensure that you have the required Dataplex Universal Catalog roles and the dataform.repositories.get permission.
Give access to the Dataform service account
To ensure that the Dataform service account has the necessary permissions to execute data preparations in BigQuery, ask your administrator to grant the Dataform service account the following IAM roles:
- Access the source tables: BigQuery Data Viewer (roles/bigquery.dataViewer) on the table, dataset, or project
- Access the destination tables: BigQuery Data Editor (roles/bigquery.dataEditor) on the table, dataset, or project
The Dataform service account might require additional permissions, depending on your data preparation pipeline. For more information, see Grant Dataform required access.
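One way to check the grants described above is to inspect the project's IAM policy, for example the JSON returned by `gcloud projects get-iam-policy PROJECT_ID --format=json`. The following Python sketch assumes the standard IAM policy shape (a `bindings` list whose items have a `role` and a `members` list). The function name and the service account address are illustrative placeholders, and the check covers only project-level bindings, not grants made on individual tables or datasets.

```python
from typing import Dict, Set

# Roles listed in this section for the Dataform service account.
REQUIRED_ROLES = {
    "roles/bigquery.dataViewer",  # read the source tables
    "roles/bigquery.dataEditor",  # write the destination tables
}


def missing_roles(policy: Dict, member: str) -> Set[str]:
    """Return the required roles that `member` does not hold in `policy`."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return REQUIRED_ROLES - granted


if __name__ == "__main__":
    # Placeholder policy and service account, for illustration only.
    member = "serviceAccount:dataform-sa@example-project.iam.gserviceaccount.com"
    policy = {
        "bindings": [
            {"role": "roles/bigquery.dataViewer", "members": [member]},
        ]
    }
    print(missing_roles(policy, member))
```

A broader role can carry the same permissions, so an empty result is sufficient but not necessary for the service account to have access.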
View existing data preparations
To view a list of existing data preparations, follow these steps:
- On the BigQuery page, go to the Explorer pane.
- Expand your project.
- Expand the Data preparations list.
Optimize data preparation by incrementally processing data
To configure the way your prepared data is written into a destination table, follow these steps.
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, select your data preparation.
- In the toolbar of your data preparation, select More > Write mode.
- Select one of the options. For more information, see Write mode.
- Click Save.
Help improve suggestions
You can help improve Gemini suggestions by sharing with Google the prompt data that you submit to features in Preview. To share your prompt data, follow these steps:
- Open the data preparation editor in BigQuery.
- In the data preparation toolbar, click More.
- Select Share data to improve Gemini in BigQuery.
Data sharing settings apply to the entire project and can only be set by a project administrator with the serviceusage.services.enable and serviceusage.services.list IAM permissions. For more information about data use in the Trusted Tester Program, see Gemini for Google Cloud Trusted Tester Program.
Data preparation versions
You can choose to create a data preparation either inside of or outside of a repository. Data preparation versioning is handled differently based on where the data preparation is located.
Data preparation versioning in repositories
Repositories are Git repositories that reside either in BigQuery or with a third-party provider. You can use workspaces in repositories to perform version control on data preparations. For more information, see Use version control with a file.
Data preparation versioning outside of repositories
BigQuery data preparations that aren't in repositories don't support viewing, comparing, or restoring data preparation versions.
For a list of data preparation versions in chronological order, follow these steps:
- On the BigQuery page, go to the Explorer pane.
- Select your data preparation.
- Click Version history.
Download a data preparation
To download a data preparation in a YAML file, follow these steps:
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project and the Data preparations folder. Click the name of the data preparation that you want to download.
- Click Download. The data preparation is saved in the YAML file format, for example, NAME data preparation.dp.yaml.
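If you keep several downloaded data preparations in one folder, the `.dp.yaml` suffix shown in the example file name can be used to find them. The following Python sketch assumes that every downloaded file ends in `.dp.yaml`. The function name is hypothetical, and the directory is wherever your browser saves downloads.

```python
from pathlib import Path
from typing import List


def find_data_preparation_files(directory: str) -> List[Path]:
    """Return files in `directory` whose names end in .dp.yaml, sorted by path."""
    return sorted(
        path for path in Path(directory).glob("*.dp.yaml") if path.is_file()
    )


if __name__ == "__main__":
    # Lists matching files in the current directory, if any.
    for path in find_data_preparation_files("."):
        print(path.name)
```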
Upload a data preparation
To upload a data preparation from a YAML file, follow these steps:
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project.
- Go to the Data preparations folder and click Menu > Upload to Data preparation.
- In the Upload data preparation dialog, select a file to upload, or enter the URL of the data preparation.
- Enter a name for the data preparation.
- Select a data preparation location where resources are managed and stored.
- Click Upload.
Manage metadata in Dataplex Universal Catalog
Dataplex Universal Catalog lets you store and manage metadata for data preparations. Data preparations are available in Dataplex Universal Catalog by default, without additional configuration.
You can use Dataplex Universal Catalog to manage data preparations in all BigQuery locations. Managing data preparations in Dataplex Universal Catalog is subject to Dataplex Universal Catalog quotas and limits and Dataplex Universal Catalog pricing.
Dataplex Universal Catalog automatically retrieves the following metadata from data preparations:
- Data asset name
- Data asset parent
- Data asset location
- Data asset type
- Corresponding Google Cloud project
Dataplex Universal Catalog logs data preparations as entries with the following entry values:
System entry group
The system entry group for data preparations is @dataform. To view details of data preparation entries in Dataplex Universal Catalog, you need to view the dataform system entry group. For instructions about how to view a list of all entries in an entry group, see View details of an entry group in the Dataplex Universal Catalog documentation.
System entry type
The system entry type for data preparations is dataform-code-asset. To view details of data preparations, you need to view the dataform-code-asset system entry type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to DATA_PREPARATION. Then, select the entry for the data preparation that you want to view. For instructions about how to view details of a selected entry type, see View details of an entry type in the Dataplex Universal Catalog documentation. For instructions about how to view details of a selected entry, see View details of an entry in the Dataplex Universal Catalog documentation.
System aspect type
The system aspect type for data preparations is dataform-code-asset. You can provide additional context to data preparations in Dataplex Universal Catalog by annotating data preparation entries with aspects. To do so, view the dataform-code-asset aspect type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to DATA_PREPARATION. For instructions about how to annotate entries with aspects, see Manage aspects and enrich metadata in the Dataplex Universal Catalog documentation.
Type
The type for data preparations is DATA_PREPARATION. This type lets you filter data preparations in the dataform-code-asset system entry type and the dataform-code-asset aspect type by using the aspect:dataplex-types.global.dataform-code-asset.type=DATA_PREPARATION query in an aspect-based filter.
For instructions about how to search for assets, see Search for data assets in Dataplex Universal Catalog in the Dataplex Universal Catalog documentation.
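The aspect-based filter query above follows the pattern `aspect:ASPECT_TYPE.FIELD=VALUE`. As a minimal sketch, the following Python helper assembles that string so it can be pasted into a search box or reused in scripts. The helper name is hypothetical, and only the `type` field with the `DATA_PREPARATION` value comes from this document; any other field or value would be an assumption.

```python
# Aspect type used for data preparations, as given in this section.
ASPECT_TYPE = "dataplex-types.global.dataform-code-asset"


def aspect_filter(field: str, value: str, aspect_type: str = ASPECT_TYPE) -> str:
    """Build an aspect-based filter of the form aspect:TYPE.FIELD=VALUE."""
    return f"aspect:{aspect_type}.{field}={value}"


if __name__ == "__main__":
    print(aspect_filter("type", "DATA_PREPARATION"))
```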
What's next
- Learn more about preparing data in BigQuery.
- Learn how to run data preparations manually or with a schedule.
- Learn how to create data preparations.