Billing settings for services

This page describes billing settings and assumes the default Cloud Run autoscaling behavior. See Billing behavior using manual scaling for additional considerations if you use manual scaling.

There are two billing settings in Cloud Run services:

Request-based billing (default): Cloud Run instances are only charged when they process requests, when they start, and when they shut down. See instance lifecycle for more details. This setting was previously called "CPU only allocated during request processing".
Instance-based billing: Cloud Run instances are charged for the entire lifecycle of instances, even when there are no incoming requests. Instance-based billing can be useful for running short-lived background tasks and other asynchronous processing tasks. This setting was previously called "CPU always allocated".

If you choose request-based billing, you are charged per request and only when the instance processes a request. If you choose instance-based billing, you are charged for the entire lifecycle of the instance. See the Cloud Run pricing tables for details.

Recommender automatically analyzes the traffic received by your Cloud Run service over the past month and recommends switching from request-based billing to instance-based billing if that is cheaper.

CPU allocation impact

Selecting a billing setting impacts how CPU is allocated.

With request-based billing, CPU is only allocated during request processing.
With instance-based billing, CPU is allocated for the entire container instance lifecycle.

How to choose the appropriate billing setting

Choosing the appropriate billing setting for your use case depends on several factors, such as traffic patterns, background execution, and cost, each of which is described in the following sections.

Traffic patterns considerations

Request-based billing is recommended when incoming traffic is sporadic, bursty, or spiky.
Instance-based billing is recommended when incoming traffic is steady or slowly varying.

Background execution considerations

Selecting instance-based billing allocates CPU even outside of request processing, letting you execute short-lived background tasks and other asynchronous processing work after returning responses. For example:

Using monitoring agents such as OpenTelemetry that may assume they can run in the background.
Using Go goroutines, Node.js async, Java threads, and Kotlin coroutines.
Using application frameworks that rely on built-in scheduling/background functionalities.
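
As a rough illustration of doing work after a response is returned, the following Python sketch hands a task to a thread and returns immediately. The names handle_request and record_audit_entry are hypothetical, and the sketch assumes instance-based billing, so that CPU stays allocated while the thread finishes.

```python
import threading


def record_audit_entry(payload, log):
    """Hypothetical background task that runs after the response is returned."""
    log.append(f"processed:{payload}")


def handle_request(payload, log):
    """Return a response immediately and finish the rest in the background."""
    worker = threading.Thread(target=record_audit_entry, args=(payload, log))
    worker.start()
    # The caller gets the response now; the thread keeps running afterwards.
    return "202 Accepted", worker
```

With request-based billing, the same thread could be starved of CPU once the response is sent, which is why this pattern is listed under instance-based billing.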

Idle instances, including those kept warm using minimum instances, can be shut down at any time. If you need to finish outstanding tasks before the container is terminated, you can trap SIGTERM, which gives an instance a 10-second grace period before it is stopped.
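
A minimal sketch of trapping SIGTERM in Python follows. It assumes the outstanding work can be expressed as a list of callables; the names shutdown_requested, install_sigterm_handler, and drain are hypothetical, and the 8-second default budget is an assumption chosen to stay under the 10-second grace period mentioned above.

```python
import signal
import threading
import time

# Set when the process receives SIGTERM.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; keep the handler itself small."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def drain(pending, budget_seconds=8.0):
    """Run pending callables until the time budget runs out.

    Returns the tasks that were not started, so the caller can log or
    persist them before the instance is stopped.
    """
    deadline = time.monotonic() + budget_seconds
    while pending and time.monotonic() < deadline:
        task = pending.pop(0)
        task()
    return pending
```

A main loop could check shutdown_requested.is_set() between units of work and call drain once it becomes true. The budget is only checked between tasks, so a single long task can still overrun the grace period.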

Consider using Cloud Tasks for executing asynchronous tasks. Cloud Tasks automatically retries failed tasks and supports running times up to 30 minutes.

Cost considerations

If you are using request-based billing, instance-based billing can be more economical if:

Your Cloud Run service is processing a high number of concurrent requests at a fairly steady rate.
You don't see a lot of "idle" instances when looking at the instance count metric.

You can use the pricing calculator to estimate cost differences.
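
The comparison behind these considerations can be sketched as simple arithmetic. The following Python sketch is illustrative only: every rate is a caller-supplied placeholder, not a real Cloud Run price, and the function names are hypothetical. It assumes request-based billing charges for time spent processing requests plus a per-request fee, and instance-based billing charges for the whole instance lifetime.

```python
def request_based_cost(active_seconds, requests, rate_per_active_second, fee_per_request):
    """Cost when only request-processing time and requests are charged."""
    return active_seconds * rate_per_active_second + requests * fee_per_request


def instance_based_cost(lifetime_seconds, rate_per_lifetime_second):
    """Cost when the whole instance lifetime is charged."""
    return lifetime_seconds * rate_per_lifetime_second


def cheaper_setting(active_seconds, lifetime_seconds, requests,
                    rate_per_active_second, fee_per_request,
                    rate_per_lifetime_second):
    """Return the name of the cheaper billing setting for the given usage."""
    by_request = request_based_cost(
        active_seconds, requests, rate_per_active_second, fee_per_request)
    by_instance = instance_based_cost(lifetime_seconds, rate_per_lifetime_second)
    return "instance-based" if by_instance < by_request else "request-based"
```

With placeholder rates, a service that is busy for most of its lifetime tends to come out cheaper on instance-based billing, and a mostly idle one on request-based billing. Use the pricing calculator for real figures.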

Autoscaling considerations

By default, Cloud Run autoscales the number of container instances.

For a service set to request-based billing, Cloud Run autoscales the number of instances based on CPU utilization only during request processing.

For a service set to instance-based billing, Cloud Run autoscales the number of instances based on CPU utilization for the entire lifecycle of the container instance, except when scaling to and from zero, where it only uses requests.

See manual scaling for additional considerations if you use manual scaling instead of the Cloud Run autoscaling feature.

Instance-based billing considerations

Even if the billing setting is set to instance-based billing, Cloud Run autoscaling is still in effect, and may terminate instances if they aren't needed to handle incoming traffic or current CPU utilization outside of requests. An instance will never stay idle for more than 15 minutes after processing a request unless it is kept active using minimum instances.

Combining instance-based billing with a number of minimum instances results in a number of instances up and running with full access to CPU resources, enabling background processing use cases. When using this pattern, Cloud Run applies instance autoscaling even if a service is using CPU outside of any requests.

If you use healthcheck probes, you must use instance-based billing for every probe. See container healthcheck probes for billing details.

Required roles

To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

Cloud Run Developer (roles/run.developer) on the Cloud Run service
Service Account User (roles/iam.serviceAccountUser) on the service identity

If you are deploying a service or function from source code, you must also have additional roles granted to you on your project and Cloud Build service account.

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Set and update billing

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

If you select instance-based billing, you must specify at least 512 MiB of memory.

You can change the billing setting using the Google Cloud console, the gcloud CLI, or a YAML file when you create a new service or deploy a new revision:

Console

In the Google Cloud console, go to the Cloud Run Services page:
Go to Cloud Run
Click Deploy container to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.
If you are configuring a new service, fill out the initial service settings page.
Select a billing setting under Billing. Select request-based billing for your instances to be charged only during request processing. Select instance-based billing for your instances to be charged for the entire lifetime of instances.
Click Create or Deploy.

gcloud

You can update the billing setting. To set instance-based billing for a given service:

gcloud run services update SERVICE --no-cpu-throttling

Replace SERVICE with the name of your service.

To set request-based billing:

gcloud run services update SERVICE --cpu-throttling
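
If you script these updates, the mapping from billing setting to flag can be kept in one place. The following Python sketch uses a hypothetical helper, billing_update_command, that only builds the argument list for the update commands shown above; it does not run gcloud itself.

```python
# Flags taken from the gcloud commands shown above.
FLAGS = {
    "instance-based": "--no-cpu-throttling",
    "request-based": "--cpu-throttling",
}


def billing_update_command(service, setting):
    """Build the gcloud argument list that sets the given billing setting."""
    if setting not in FLAGS:
        raise ValueError(f"unknown billing setting: {setting!r}")
    return ["gcloud", "run", "services", "update", service, FLAGS[setting]]
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where the gcloud CLI is installed and authenticated.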

You can also set your billing setting during deployment. To set your billing setting to instance-based billing:

gcloud run deploy --image IMAGE_URL --no-cpu-throttling

To set your billing setting to request-based billing:

gcloud run deploy --image IMAGE_URL --cpu-throttling

Replace IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.

YAML

If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the run.googleapis.com/cpu-throttling annotation:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: 'BOOLEAN'
      name: REVISION
Replace the following:
- SERVICE: the name of your Cloud Run service.
- BOOLEAN: true to set request-based billing, or false to set instance-based billing.
- REVISION: a new revision name, or delete this line (if present). If you supply a new revision name, it must meet the following criteria:
  * Starts with SERVICE-
  * Contains only lowercase letters, numbers and -
  * Does not end with a -
  * Does not exceed 63 characters
Create or update the service using the following command:
gcloud run services replace service.yaml
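
The revision name criteria listed in this section can be checked locally before running the replace command. The following Python sketch uses a hypothetical helper, is_valid_revision_name, that tests only the four criteria stated above; Cloud Run may apply further validation that is not covered here.

```python
import re


def is_valid_revision_name(revision, service):
    """Check a revision name against the four criteria listed above."""
    if not revision.startswith(service + "-"):
        return False  # must start with SERVICE-
    if len(revision) > 63:
        return False  # must not exceed 63 characters
    if revision.endswith("-"):
        return False  # must not end with a hyphen
    # Only lowercase letters, numbers, and hyphens are allowed.
    return re.fullmatch(r"[a-z0-9-]+", revision) is not None
```

For example, is_valid_revision_name("hello-v2", "hello") passes all four checks, while "hello-V2" fails because of the uppercase letter.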

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:

To view the current billing settings for your Cloud Run service:

Console

In the Google Cloud console, go to the Cloud Run Services page:
Go to Cloud Run
Click the service to open the Service details page.
Click the Revisions tab.
In the details panel, the Billing setting is listed under the General tab.

gcloud

Run the following command to view the billing configuration:
gcloud run services describe SERVICE --format=yaml
In the YAML output, find the run.googleapis.com/cpu-throttling annotation. A value of false indicates instance-based billing. A value of true, or a missing annotation, indicates request-based billing.
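
The same rule can be applied in a script. The following Python sketch assumes the describe output has already been parsed into a dictionary and that the annotation sits under spec.template.metadata.annotations, as in the YAML example on this page. The helper name billing_setting is hypothetical.

```python
ANNOTATION = "run.googleapis.com/cpu-throttling"


def billing_setting(service):
    """Return the billing setting implied by a parsed service description."""
    annotations = (
        service.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("annotations", {})
    )
    value = annotations.get(ANNOTATION)
    # Only an explicit false means instance-based billing; a missing
    # annotation or any other value is treated as request-based billing.
    if value is not None and str(value).lower() == "false":
        return "instance-based"
    return "request-based"
```

The value is compared as a string because the annotation is written as a quoted value in the YAML example.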