Introduction to continuous queries (original) (raw)
This document describes BigQuery continuous queries.
BigQuery continuous queries are SQL statements that run continuously. Continuous queries let you analyze incoming data in BigQuery in real time. You can insert the output rows produced by a continuous query into a BigQuery table or export them to Pub/Sub or Bigtable. Continuous queries can process data that has been written tostandard BigQuery tablesby using one of the following methods:
You can use continuous queries to perform time sensitive tasks, such as creating and immediately acting on insights, applying real time machine learning (ML) inference, and replicating data into other platforms. This lets you use BigQuery as an event-driven data processing engine for your application's decision logic.
The following diagram shows common continuous query workflows:
Use cases
Common use cases where you might want to use continuous queries are as follows:
- Personalized customer interaction services: use generative AI to create tailored messages customized for each customer interaction.
- Anomaly detection: build solutions that let you perform anomaly and threat detection on complex data in real time, so that you can react to issues more quickly.
- Customizable event-driven pipelines: use continuous query integration with Pub/Sub to trigger downstream applications based on incoming data.
- Data enrichment and entity extraction: use continuous queries to perform real time data enrichment and transformation by using SQL functions and ML models.
- Reverse extract-transform-load (ETL): perform real time reverse ETL into other storage systems more suited for low latency application serving. For example, analyzing or enhancing event data that is written to BigQuery, and then streaming it to Bigtable for application serving.
Supported operations
The following operations are supported in continuous queries:
- RunningINSERT statementsto write data from a continuous query into a BigQuery table.
- RunningEXPORT DATA statementsto publishcontinuous query output to Pub/Sub topics. For more information, seeExport data to Pub/Sub.
From a Pub/Sub topic, you can use the data with other services, such as performing streaming analytics by using Dataflow, or using the data in an application integration workflow. - Running
EXPORT DATA
statements to export data from BigQuery to Bigtable tables. For more information, seeExport data to Bigtable. - Calling the following generative AI function:
- ML.GENERATE_TEXT
This function requires you to have aBigQuery ML remote modelover aVertex AI model.
- ML.GENERATE_TEXT
- Calling the following AI functions:
- ML.UNDERSTAND_TEXT
- ML.TRANSLATE
These functions require you to have aBigQuery ML remote modelover a Cloud AI API.
- Normalizing numerical data by using theML.NORMALIZER function.
- Using stateless GoogleSQL functions—for example,conversion functions. In stateless functions, each row is processed independently from other rows in the table.
- Using theAPPENDSchange history function to start continuous query processing from a specific point in time.
The Google Cloud access tokensthat are used when running continuous query jobs have a time to live (TTL) of two days when they are generated by a user account. Therefore, such jobs stop running after two days. The access tokens that are generated by service accounts can run longer, but must still adhere to the maximum query runtime. For more information, seeRun a continuous query by using a service account.
Locations
Continuous queries are supported in the following locations:
Region description | Region name | Details |
---|---|---|
Americas | ||
Iowa | us-central1 | |
Montréal | northamerica-northeast1 | |
Northern Virginia | us-east4 | |
Oregon | us-west1 | |
South Carolina | us-east1 | |
US multi-region | us | |
Asia Pacific | ||
Delhi | asia-south2 | |
Hong Kong | asia-east2 | |
Jakarta | asia-southeast2 | |
Mumbai | asia-south1 | |
Seoul | asia-northeast3 | |
Singapore | asia-southeast1 | |
Sydney | australia-southeast1 | |
Taiwan | asia-east1 | |
Tokyo | asia-northeast1 | |
Europe | ||
Belgium | europe-west1 | |
EU multi-region | eu | |
Frankfurt | europe-west3 | |
London | europe-west2 | |
Netherlands | europe-west4 |
Limitations
Continuous queries are subject to the following limitations:
- BigQuery continuous queries don't maintain the state of ingested data. Common operations that rely on state, such as a
JOIN
, aggregation function, or window function, aren't supported. - You can't use the following SQL capabilities in a continuous query:
- JOIN operations
- Aggregate functions
- Approximate aggregate functions
- The followingquery clauses:
* GROUP BY
* HAVING
* ORDER BY
* LIMIT - The followingquery operators:
* PIVOT
* UNPIVOT
* TABLESAMPLE - Query set operators
- The SELECT DISTINCT statement
- EXISTS or NOT EXISTS subqueries
- Recursive CTEs
- User-defined functions
- Windowing functions
- BigQuery ML functions other than those listed inSupported operations
- Data definition language (DDL) statements
- Data manipulation language (DML) statementsexcept for
INSERT
. - Data control language (DCL) statements
EXPORT DATA
statements that don't target Bigtable or Pub/Sub- Procedural language
- Debugging statements
- Continuous queries don't support processingChange Data Capture (CDC) upsert data.
- Continuous queries don't supportwildcard tables as a data source.
- Continuous queries don't supportexternal tables as a data source.
- Continuous queries don't supportINFORMATION_SCHEMA views as a data source.
- Continuous queries don't supportBigLake tables for Apache Iceberg in BigQuery.
- Continuous queries don't support the following BigQuery security features:
- Whenexporting data to Bigtable, you can only target Bigtable instances that fall within the same Google Cloud regional boundary as the BigQuery dataset that contains the table you are querying. For more information, seeLocation considerations. This restriction doesn't apply to exporting data to Pub/Sub because Pub/Sub is a global resource.
- You can't run a continuous query from adata canvas.
- You can't modify the SQL used in a continuous query while the continuous query job is running. For more information, seeModify the SQL of a continuous query.
- If a continuous query job falls behind in processing incoming data and has an output watermark lag of more than 48 hours, then it fails. You can run the query again and use theAPPENDSchange history function to resume processing from the point in time at which you stopped the previous continuous query job. For more information, seeStart a continuous query from a particular point in time.
- A continuous query configured with a user account can run for up to two days. A continuous query configured with a service account can run for up to 150 days. When the maximum query runtime is reached, the query fails and stops processing incoming data.
- Although continuous queries are built using BigQuery reliability features, occasional temporary issues can occur. Issues might lead to some amount of automatic reprocessing of your continuous query, which could result in duplicate data in the continuous query output. Design your downstream systems to handle such scenarios.
Reservation limitations
- You must create Enterprise edition or Enterprise Plus edition reservations in order to run continuous queries. Continuous queries don't support the on-demand compute billing model.
- When you create a
CONTINUOUS
reservation assignment, the associated reservation is limited to at most 500 slots. You can request an increase to this limit by contactingbq-continuous-queries-feedback@google.com. - A continuous query reservation assignment doesn't shareidle slots, even if the reservation is configured to do so.
- You can't create a reservation assignment that uses a differentjob type in the same reservation as a continuous query reservation assignment.
- You can't configure continuous query concurrency. BigQuery automatically determines the number of continuous queries that can run concurrently, based on available reservation assignments that use the
CONTINUOUS
job type. - When running multiple continuous queries using the same reservation, individual jobs might not split available resources fairly, as defined byBigQuery fairness.
Slots autoscaling
Continuous queries can useslot autoscaling to dynamically scale allocated capacity to accommodate your workload. As your continuous queries workload increases or decreases, BigQuery dynamically adjusts your slots.
After a continuous query starts running, it actively listens for incoming data, which consumes slot resources. While a reservation with a running continuous query does not scale down to zero slots, an idle continuous query that is primarily listening for incoming data is expected to consume a minimal amount of slots, typically around 1 slot.
Pricing
Continuous queries useBigQuery capacity compute pricing, which is measured in slots. To run continuous queries, you must have areservation that uses theEnterprise or Enterprise Plus edition, and a reservation assignmentthat uses the CONTINUOUS
job type.
Usage of other BigQuery resources, such as data ingestion and storage, are charged at the rates shown inBigQuery pricing.
Usage of other services that receive continuous query results or that are called during continuous query processing are charged at the rates published for those services. For the pricing of other Google Cloud services used by continuous queries, see the following topics: