Compute Engine instances provisioning models

When you create a Compute Engine instance, you can specify the method, called the provisioning model, that Compute Engine uses to obtain your requested resources. Each provisioning model determines the availability, lifespan, and pricing of your compute instances.

This document explains the different provisioning models that you can specify when you create compute instances. By understanding these models, you can choose the best option for your workload.

Available provisioning models

When you create a compute instance, you can specify one of the following provisioning models. If you don't specify a provisioning model, then Compute Engine uses the standard provisioning model by default.

Standard
Spot
Flex-start
Reservation-bound
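The following Python sketch shows one way to express a provisioning-model choice as the scheduling section of an instance request body, falling back to the standard model when none is given, as described above. It assumes the Compute Engine REST field names provisioningModel, instanceTerminationAction, and maxRunDuration, and the enum spellings STANDARD, SPOT, FLEX_START, and RESERVATION_BOUND. The helper name build_scheduling is hypothetical, so check the field names and values against the current API reference before relying on them.

```python
"""Sketch: build the `scheduling` block of a Compute Engine instance request.

Assumes the REST field names provisioningModel, instanceTerminationAction,
and maxRunDuration, and the enum spellings below. build_scheduling is a
hypothetical helper, not part of any Google library.
"""
from typing import Optional

# Assumed enum spellings for the four provisioning models in this document.
PROVISIONING_MODELS = {"STANDARD", "SPOT", "FLEX_START", "RESERVATION_BOUND"}
TERMINATION_ACTIONS = {"STOP", "DELETE"}


def build_scheduling(
    provisioning_model: Optional[str] = None,
    termination_action: Optional[str] = None,
    max_run_duration_s: Optional[int] = None,
) -> dict:
    """Return a dict shaped like the instance resource's `scheduling` field."""
    # Mirror the documented default: no model means the standard model.
    model = provisioning_model or "STANDARD"
    if model not in PROVISIONING_MODELS:
        raise ValueError(f"unknown provisioning model: {model}")
    scheduling = {"provisioningModel": model}
    if termination_action is not None:
        if termination_action not in TERMINATION_ACTIONS:
            raise ValueError(f"unknown termination action: {termination_action}")
        # What Compute Engine does to the instance when it is stopped for you.
        scheduling["instanceTerminationAction"] = termination_action
    if max_run_duration_s is not None:
        # int64 Duration fields are serialized as strings in JSON.
        scheduling["maxRunDuration"] = {"seconds": str(max_run_duration_s)}
    return scheduling


# A Spot VM that Compute Engine stops, rather than deletes, on preemption.
print(build_scheduling("SPOT", termination_action="STOP"))
```

Setting the model explicitly, even to STANDARD, makes the request self-describing, at the cost of one extra field the API would otherwise infer.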

The following table helps you compare the use cases and pricing for each provisioning model:

Category	Standard	Spot	Flex-start	Reservation-bound
Summary	Based on resource availability, you can immediately create compute instances. You can control when to stop or delete compute instances.	Based on resource availability, you can immediately create compute instances. You can control when to stop or delete compute instances. However, you also allow Compute Engine to stop or delete compute instances at any time to reclaim capacity.	Based on resource availability, you can create compute instances within a specified waiting time. For a standalone Flex-start VM, you can specify a waiting time of up to two hours. For a MIG, Compute Engine keeps trying to create compute instances until resources become available or you cancel your creation request. You can control when to stop or delete compute instances. However, you can't suspend or recreate them. Compute instances run for a minimum of 10 minutes and up to a maximum of seven days. When the compute instances reach the end of their run duration, Compute Engine stops or deletes them based on their termination action.	You can request to reserve capacity at a future date for creating compute instances. If Google Cloud approves your request, then Compute Engine creates a reservation. At the start of the reservation period, you can consume the reservation by creating compute instances that match the reservation. During the approved reservation period, you can stop, restart, delete, and recreate compute instances to consume the reservation as needed. When the reservation period ends, Compute Engine deletes the reservation, and stops or deletes any compute instances that consume the reservation based on their termination action.
Use cases	Ideal for workloads that require stability and continuous operation, such as web servers, databases, enterprise applications, and development and testing.	Ideal for workloads that can tolerate interruptions, such as batch processing, high performance computing (HPC), continuous integration and continuous deployment (CI/CD), data analytics, media encoding, and online inference.	Ideal for workloads that require stability and need to run for no more than seven days, such as small model pre-training, model fine-tuning, HPC simulation, and batch inference.	Ideal for workloads that require stability and a specific run time. Workloads that last up to 90 days: model pre-training jobs, model fine-tuning jobs, HPC simulation workloads, and short-term expected increases in inference workloads. Workloads longer than 90 days: training and inference workloads.
Resource allocation	Best-effort. Compute Engine physically places resources close to each other on a best-effort basis. To control placement, you can optionally use placement policies.	Best-effort. Compute Engine physically places resources close to each other on a best-effort basis. To control placement, you can optionally use placement policies.	Dense on a best-effort basis. Compute Engine makes best-effort attempts to densely place resources close together. To control placement for your Flex-start VMs, you can optionally use compact placement policies for standalone Flex-start VMs, or workload policies for Flex-start VMs in a MIG.	Dense. Compute Engine physically places resources on tightly coupled hosts connected by a high-speed network fabric to minimize network latency.
Pricing	You incur standard pricing for compute instances; see VM instance pricing. Your charges depend on the method that you use to create compute instances: if you immediately create compute instances, then you pay as you go (PAYG); if you create compute instances by using an on-demand reservation or an auto-created reservation for a future reservation, then you're charged for as long as the reservation exists. For more information, see reservations billing.	You get discounts of up to 91% for many machine types, GPUs, TPUs, and Local SSD disks. For more information, see the pricing page. You pay as you go (PAYG).	Based on the machine series that your compute instances use, you get a discount as follows: for the A4, A3, and A2 machine series, a 53% discount for vCPUs, memory, and GPUs; for the H4D machine series, a 25% discount for vCPUs and memory. Other supported machine series aren't eligible for discounts. For more information, see Dynamic Workload Scheduler (DWS) pricing. You pay as you go (PAYG).	Your charges depend on how you reserve capacity for creating compute instances. If you reserve capacity in AI Hypercomputer, then you incur charges based on accelerator-optimized VMs pricing. If you reserve resources for a year or longer, then you must purchase and attach a resource-based commitment to your reserved resources. If you reserve capacity by using future reservations in calendar mode, then you incur charges based on Dynamic Workload Scheduler (DWS) pricing. You're charged for the reservation period. For more information, see reservations billing.
Quota	When you create a compute instance, standard quota is consumed.	When you create a Spot VM, preemptible quota is consumed. If your project lacks preemptible quota, then standard quota is consumed. Google Cloud Free Tier credits don't apply to Spot VMs.	When you create a Flex-start VM, preemptible quota is consumed. If your project lacks preemptible quota, then standard quota is consumed.	Quota consumption varies based on the method that you use to reserve resources. Future reservations with AI Hypercomputer or HPC clusters: Google manages the quota for your reserved resources, so you don't need to manually request quota; at the start time of your approved future reservation, Google automatically increases quota if your project lacks it. Future reservations in calendar mode: reserving H4D machine types consumes CPU quota, and reserving GPU or TPU machine types doesn't require quota.
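The numeric Flex-start rules in the table (a run duration between 10 minutes and seven days, a waiting time of up to two hours for a standalone VM, and the per-series discounts) can be checked before you submit a request. The following Python sketch encodes them. The helper names are hypothetical, and the on-demand rate is a placeholder that you supply, not a published price.

```python
"""Sketch: check a Flex-start request against the limits in the comparison
table and estimate its discounted hourly rate.

Limits and discount percentages are copied from the table; the helper names
and the on-demand rate you pass in are hypothetical placeholders.
"""
from datetime import timedelta

MIN_RUN = timedelta(minutes=10)
MAX_RUN = timedelta(days=7)
# Only standalone Flex-start VMs have a waiting-time cap; a MIG keeps
# retrying until resources become available or you cancel the request.
MAX_STANDALONE_WAIT = timedelta(hours=2)

# Flex-start discount by machine series, in percent. Series not listed here
# (for example G2 or TPU v5p) get no discount.
FLEX_START_DISCOUNT_PCT = {"A4": 53, "A3": 53, "A2": 53, "H4D": 25}


def check_flex_start(run_duration: timedelta, wait: timedelta, standalone: bool) -> list[str]:
    """Return a list of rule violations; an empty list means the request fits."""
    problems = []
    if not MIN_RUN <= run_duration <= MAX_RUN:
        problems.append("run duration must be between 10 minutes and 7 days")
    if standalone and wait > MAX_STANDALONE_WAIT:
        problems.append("standalone waiting time can't exceed 2 hours")
    return problems


def flex_start_hourly(series: str, on_demand_rate: float) -> float:
    """Apply the table's discount to a placeholder on-demand rate.

    on_demand_rate should cover only the discounted components: vCPUs,
    memory, and GPUs for A4/A3/A2; vCPUs and memory for H4D.
    """
    pct = FLEX_START_DISCOUNT_PCT.get(series, 0)
    return on_demand_rate * (100 - pct) / 100


print(check_flex_start(timedelta(days=8), timedelta(hours=3), standalone=True))
print(flex_start_hourly("A3", 10.0))
```

For example, flex_start_hourly("A3", 10.0) applies the 53% discount to a placeholder rate of 10.0 and returns about 4.7, while the first call reports both an over-long run duration and an over-long standalone waiting time.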

Compute instance availability and lifespan

The following table shows compute instance availability and lifespan for each provisioning model:

Category	Standard	Spot	Flex-start	Reservation-bound
Creation prerequisites	No creation prerequisites.	No creation prerequisites.	No creation prerequisites.	To create compute instances, you must first reserve capacity by using one of the following methods: to reserve capacity for long-running workloads, use future reservations with AI Hypercomputer or future reservations with HPC clusters; to reserve capacity for workloads that run for up to 90 days, use future reservations in calendar mode. At your chosen delivery date and time, Compute Engine provisions your requested capacity. Then, you can consume the capacity by creating compute instances.
Supported machine series	You can use any machine series, except A4X Max, A4X, A4, and A3 Ultra.	You can use any machine series, except A4X and any bare metal instances (machine types whose name contains -metal). Spot VMs for TPU7x are restricted by an allowlist (see note 1).	You can only use the following machine series: the A4, A3, A2, G4, and G2 machine series; TPU7x (see note 1), TPU v6e, and TPU v5p; N1 virtual machine (VM) instances with GPUs attached; and the H4D machine series.	The machine series that you can use depend on how you reserve capacity. If you reserve capacity in AI Hypercomputer, then you can only use A4X Max, A4X, A4, A3 Ultra, A3 Mega with 8 GPUs, A3 High with 8 GPUs, and A3 Edge. If you create a future reservation in calendar mode, then you can only use the following series: GPUs: A4, A3 Ultra, A3 Mega with 8 GPUs, and A3 High with 8 GPUs; CPUs: H4D machine series; TPUs: TPU7x (see note 1), TPU v6e, and TPU v5p.
Compute instance availability	You can create compute instances at any time, as long as your requested resources are available.	You can create compute instances at any time, as long as your requested resources are available. To help reduce your chances of encountering resource availability errors, you can view the availability of Spot VMs before you create them.	You can create compute instances in one of the following ways: create a standalone compute instance; create a MIG that individually creates compute instances as capacity becomes available; or create a MIG that creates compute instances all at once. Compute Engine uses DWS to schedule the provisioning of your requested capacity based on resource availability. DWS helps you obtain high-demand resources like GPUs.	You can only create compute instances after reserving capacity for a future date. On your requested date, Compute Engine delivers your requested capacity, which you can then use to create compute instances. If you reserve resources by using future reservations in calendar mode, then Compute Engine uses DWS to provision your requested capacity. DWS helps you obtain high-demand resources like GPUs.
Capacity assurance	Based on the creation method. Capacity assurance varies based on the method that you use to create compute instances as follows: If you immediately create compute instances, then Compute Engine makes best-effort attempts to provision your requested capacity. If you create compute instances by consuming an on-demand reservation or an auto-created reservation for a future reservation, then you have very high assurance that Compute Engine provisions your requested capacity if the reservation has reserved capacity available.	Best-effort. When you create Spot VMs, Compute Engine makes best-effort attempts to provision your requested capacity.	Best-effort. When you create a MIG resize request, Compute Engine makes best-effort attempts to schedule the provisioning of your requested capacity.	Very high. If Google Cloud approves your reservation request, then you have very high assurance that Compute Engine provisions your reserved capacity at your chosen delivery date and time. You have exclusive access to your reserved capacity for the reservation period.
Compute instance lifespan	You can control when to stop or delete a compute instance. However, if the machine type that the compute instance uses doesn't support live migration, then Compute Engine stops the compute instance during host maintenance events.	You can control when to stop or delete a compute instance, except in the following cases: Compute Engine stops or deletes the compute instance to reclaim capacity, a process called preemption; or the machine type that the compute instance uses doesn't support live migration, in which case Compute Engine stops the compute instance during host maintenance events.	Before a compute instance reaches the end of its run duration, you can do the following: stop the compute instance at any time, if you created it as a standalone Flex-start VM or individually in a MIG as capacity became available (you can't stop Flex-start VMs that were created by a MIG resize request); or delete the compute instance at any time. When a compute instance reaches the end of its run duration, Compute Engine stops or deletes it based on its termination action.	You can control when to stop or delete a compute instance, except in the following cases: Compute Engine stops the compute instance during host maintenance events; or the automatically created reservation that provisions your requested capacity reaches the end of its committed reservation period. At that time, Compute Engine deletes the reservation, and stops or deletes any compute instances that consume the reservation based on the termination action specified in their configuration.

1 Spot VMs, Flex-start VMs, and future reservations in calendar mode for TPU7x are restricted by an allowlist. To request access, contact your account team or the sales team.
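To make the Supported machine series row easier to query, the following Python sketch copies its lists into sets and looks up a series for a given model, also flagging the TPU7x allowlist from note 1. The model keys (including the two reservation-bound keys, which stand for the two ways of reserving capacity) and the helper name are hypothetical. Series are matched as exact labels from the table, so the sketch doesn't parse real machine-type names or know, for example, that A3 Ultra belongs to the A3 series.

```python
"""Sketch: look up whether the comparison table lists a machine series for a
provisioning model, and whether note 1's allowlist applies.

Series names are plain labels copied from the table and matched exactly.
The model keys and check_series are hypothetical, for illustration only.
"""

STANDARD_EXCLUDED = {"A4X Max", "A4X", "A4", "A3 Ultra"}
SPOT_EXCLUDED = {"A4X"}  # bare metal (-metal) machine types are also excluded
FLEX_START_ONLY = {
    "A4", "A3", "A2", "G4", "G2",
    "TPU7x", "TPU v6e", "TPU v5p",
    "N1 with GPUs", "H4D",
}
# Reservation-bound: the allowed series depend on how you reserve capacity.
AI_HYPERCOMPUTER_ONLY = {
    "A4X Max", "A4X", "A4", "A3 Ultra",
    "A3 Mega with 8 GPUs", "A3 High with 8 GPUs", "A3 Edge",
}
CALENDAR_MODE_ONLY = {
    "A4", "A3 Ultra", "A3 Mega with 8 GPUs", "A3 High with 8 GPUs",
    "H4D", "TPU7x", "TPU v6e", "TPU v5p",
}
# Note 1: these model/series pairs also require allowlist access.
ALLOWLIST_ONLY = {
    ("spot", "TPU7x"),
    ("flex-start", "TPU7x"),
    ("reservation-calendar", "TPU7x"),
}


def check_series(model: str, series: str, bare_metal: bool = False) -> tuple[bool, bool]:
    """Return (listed_in_table, needs_allowlist) for a model/series pair."""
    if model == "standard":
        listed = series not in STANDARD_EXCLUDED
    elif model == "spot":
        listed = series not in SPOT_EXCLUDED and not bare_metal
    elif model == "flex-start":
        listed = series in FLEX_START_ONLY
    elif model == "reservation-ai-hypercomputer":
        listed = series in AI_HYPERCOMPUTER_ONLY
    elif model == "reservation-calendar":
        listed = series in CALENDAR_MODE_ONLY
    else:
        raise ValueError(f"unknown model key: {model}")
    # The allowlist only matters for series the table lists in the first place.
    return listed, listed and (model, series) in ALLOWLIST_ONLY


print(check_series("spot", "TPU7x"))
print(check_series("reservation-ai-hypercomputer", "H4D"))
```

Standard and Spot are written as exclusion lists because the table describes them as "any machine series, except ...", while Flex-start and both reservation-bound methods are allow lists.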

What's next

Read an overview of creating Compute Engine instances.
Learn more about Spot VMs.
Learn more about Flex-start VMs.
Learn more about compute instances that use the reservation-bound provisioning model.