Create a Flex-start VM
This document explains how to create a Flex-start virtual machine (VM) instance. Flex-start VMs run for up to seven days and help you acquire high-demand resources like GPUs at a discounted price. These features make Flex-start VMs a cost-effective solution for running short-duration workloads, such as model fine-tuning and batch inference.
To learn more about the key characteristics of Flex-start VMs, including the requirements and limitations that apply when you create them, see About Flex-start VMs.
Before you begin
- To learn more about using an accelerator-optimized machine type (except A4X Max or A4X) in a VM, see Overview of creating an instance with attached GPUs.
- If you haven't already, set up authentication. Authentication verifies your identity for access to Google Cloud services and APIs. To run code or samples from a local development environment, authenticate to Compute Engine by using the method that matches how you plan to use the samples on this page (console, gcloud, or REST).
Required roles
To get the permissions that you need to create a standalone Flex-start VM, ask your administrator to grant you the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create a standalone Flex-start VM. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create a standalone Flex-start VM:
- compute.instances.create on the project
- To use a custom image to create the VM: compute.images.useReadOnly on the image
- To use a snapshot to create the VM: compute.snapshots.useReadOnly on the snapshot
- To use an instance template to create the VM: compute.instanceTemplates.useReadOnly on the instance template
- To specify a subnet for your VM: compute.subnetworks.use on the project or on the chosen subnet
- To specify a static IP address for the VM: compute.addresses.use on the project
- To assign an external IP address to the VM when using a VPC network: compute.subnetworks.useExternalIp on the project or on the chosen subnet
- To assign a legacy network to the VM: compute.networks.use on the project
- To assign an external IP address to the VM when using a legacy network: compute.networks.useExternalIp on the project
- To set VM instance metadata for the VM: compute.instances.setMetadata on the project
- To set tags for the VM: compute.instances.setTags on the VM
- To set labels for the VM: compute.instances.setLabels on the VM
- To set a service account for the VM to use: compute.instances.setServiceAccount on the VM
- To create a new disk for the VM: compute.disks.create on the project
- To attach an existing disk in read-only or read-write mode: compute.disks.use on the disk
- To attach an existing disk in read-only mode: compute.disks.useReadOnly on the disk
You might also be able to get these permissions with custom roles or other predefined roles.
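As a rough aid for requesting access, the permission list above can be expressed as a lookup table. The following is a minimal sketch: the feature keys (such as "custom_image") and the function name are hypothetical labels chosen for this example, while the permission strings are copied from the list above.

```python
# Permission that every standalone Flex-start VM creation needs.
BASE_PERMISSION = "compute.instances.create"

# Hypothetical feature keys mapped to the permissions listed above.
OPTIONAL_PERMISSIONS = {
    "custom_image": "compute.images.useReadOnly",
    "snapshot": "compute.snapshots.useReadOnly",
    "instance_template": "compute.instanceTemplates.useReadOnly",
    "subnet": "compute.subnetworks.use",
    "static_ip": "compute.addresses.use",
    "external_ip_vpc": "compute.subnetworks.useExternalIp",
    "legacy_network": "compute.networks.use",
    "external_ip_legacy": "compute.networks.useExternalIp",
    "metadata": "compute.instances.setMetadata",
    "tags": "compute.instances.setTags",
    "labels": "compute.instances.setLabels",
    "service_account": "compute.instances.setServiceAccount",
    "new_disk": "compute.disks.create",
    "attach_disk": "compute.disks.use",
    "attach_disk_read_only": "compute.disks.useReadOnly",
}


def required_permissions(features):
    """Return the sorted permissions needed for the given feature keys."""
    unknown = set(features) - set(OPTIONAL_PERMISSIONS)
    if unknown:
        raise ValueError(f"unknown feature keys: {sorted(unknown)}")
    needed = {BASE_PERMISSION}
    needed.update(OPTIONAL_PERMISSIONS[feature] for feature in features)
    return sorted(needed)


if __name__ == "__main__":
    # Example: a VM with labels and a newly created disk.
    for permission in required_permissions(["labels", "new_disk"]):
        print(permission)
```

The table does not record the resource on which each permission must be granted (project, subnet, disk, and so on); see the list above for that detail.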
If you want to create a Flex-start VM that uses a machine type other than A2, G4, G2, or N1, then see the following:
To create an A2 or G2 Flex-start VM that specifies a compact placement policy for minimal network latency, use the gcloud CLI or REST API. Otherwise, select one of the following options:
Console
- In the Google Cloud console, go to the Create an instance page.
  Go to Create an instance
- In the Machine configuration pane, complete the following steps:
  - In the Name field, enter a name for the Flex-start VM.
  - Specify the Region and Zone where you want to create your VM. To review the regions and zones where the machine type that you want to use is available, see Available regions and zones.
  - Based on the workload that you want to run, specify a machine type as follows:
    - To specify an accelerator-optimized machine type, do the following:
      1. Click the GPUs tab.
      2. In the GPU type list, select a GPU type, except NVIDIA GB200 192GB.
      3. In the Number of GPUs list, select the number of GPUs to attach to your VM.
      4. Optional: If your GPU model supports NVIDIA RTX Virtual Workstations (vWS) for graphics workloads, and you plan to run graphics-intensive workloads, select Enable Virtual Workstation (NVIDIA GRID).
    - To specify an H4D machine type, do the following:
      1. Click the Compute optimized tab.
      2. In the Series column, select H4D.
- In the navigation menu, click Advanced. In the Advanced pane that appears, complete the following steps:
  - In the Provisioning model section, in the VM provisioning model list, select Flex-start.
  - In the Enter number of hours field, enter the maximum amount of time that you want the VM to run. The value must be between 0.01 (0.01 hours, or 36 seconds) and 168 (168 hours, or seven days).
  - Select the Set a wait time for VM creation checkbox. Then, based on the zonal requirements for your workload, specify one of the following durations to help increase the chances that your VM creation request succeeds:
    - If your workload requires you to create the VM in a specific zone, then specify a duration between 90 seconds and 2 hours. Longer durations give you higher chances of obtaining resources.
    - If the VM can run in any zone within the region, then specify a duration of 0 seconds or clear the Set a wait time for VM creation checkbox. This action specifies that Compute Engine only allocates resources if they are immediately available. If the VM creation request fails because resources are unavailable, then retry the request in a different zone.
  - In the On VM termination field, select whether to stop or delete the Flex-start VM at the end of its run duration:
    - To delete the VM, select Delete.
    - To stop the VM, select Stop.
- To create the Flex-start VM, click Create.
gcloud
To create an A2, G4, G2, or N1 Flex-start VM, use the gcloud compute instances create command with the following flags:
- The --request-valid-for-duration flag
- The --provisioning-model=FLEX_START flag
- The --instance-termination-action flag
- The --max-run-duration flag
- The --maintenance-policy=TERMINATE flag
- The --reservation-affinity=none flag
To create a Flex-start VM, run the following command:
gcloud compute instances create VM_NAME \
--machine-type=MACHINE_TYPE \
--zone=ZONE \
--request-valid-for-duration=VALID_FOR_DURATION \
--provisioning-model=FLEX_START \
--instance-termination-action=TERMINATION_ACTION \
--max-run-duration=RUN_DURATION \
--maintenance-policy=TERMINATE \
--reservation-affinity=none
Replace the following:
- VM_NAME: the name of your new VM.
- MACHINE_TYPE: the machine type to use for the Flex-start VM. If you specify a G4, G2, or N1 machine type, then consider the following:
  - For G4 or G2 machine types, you can optionally specify an NVIDIA RTX Virtual Workstation (vWS) to use for graphics-intensive workloads. To do so, include the --accelerator flag in the command as follows:
    --accelerator=count=VWS_ACCELERATOR_COUNT,type=VWS_ACCELERATOR_TYPE
    Replace the following:
    - VWS_ACCELERATOR_COUNT: the number of NVIDIA RTX vWS accelerators that your workload requires. This number must match the number of GPUs that are attached in the G4 or G2 machine type that you want to use.
    - VWS_ACCELERATOR_TYPE: the type of NVIDIA RTX vWS accelerator to use. Specify one of the following values:
      - For G4 machine types: nvidia-rtx-pro-6000-vws
      - For G2 machine types: nvidia-l4-vws
  - For N1 machine types, you must specify the number and type of GPUs to attach to your VM. Otherwise, creating the VM fails. To attach GPUs to an N1 VM, include the --accelerator flag in the command as follows:
    --accelerator=count=NUMBER_OF_ACCELERATORS,type=ACCELERATOR_TYPE
    Replace the following:
    - NUMBER_OF_ACCELERATORS: the number of GPUs to attach to your N1 VM.
    - ACCELERATOR_TYPE: a supported GPU model for N1 VMs.
- ZONE: the zone where you want to create the VM. To verify that your specified machine type is available in the zone where you want to create the VM, see Available regions and zones.
- VALID_FOR_DURATION: the maximum time to wait for provisioning your requested resources. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, or s, respectively. For example, specify 30m for 30 minutes or 1h2m3s for one hour, two minutes, and three seconds. Based on the zonal requirements for your workload, specify one of the following durations to help increase the chances that your VM creation request succeeds:
  - If your workload requires you to create the VM in a specific zone, then specify a duration between 90 seconds (90s) and two hours (2h). Longer durations give you higher chances of obtaining resources.
  - If the VM can run in any zone within the region, then specify a duration of zero seconds (0s). This value specifies that Compute Engine only allocates resources if they are immediately available. If the creation request fails because resources are unavailable, then retry the request in a different zone.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of its run duration. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RUN_DURATION: the maximum time that the VM runs before Compute Engine automatically stops or deletes it. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, or s, respectively. The value must be between 10 minutes and seven days.
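The duration rules above are easy to get wrong by hand, so the following minimal sketch formats a number of seconds into the d/h/m/s notation and checks it against the documented ranges. The function names are hypothetical. Two assumptions are made: the recommended wait-time range (0 seconds, or 90 seconds to two hours) is treated as a hard limit, and compound values that include days (for example, 1d2h) are assumed to be accepted in the same way as the documented 1h2m3s example.

```python
def format_duration(total_seconds):
    """Format whole seconds as a d/h/m/s string, for example '1h2m3s'."""
    if total_seconds < 0:
        raise ValueError("duration must not be negative")
    if total_seconds == 0:
        return "0s"
    parts = []
    remaining = total_seconds
    for unit, size in (("d", 86400), ("h", 3600), ("m", 60), ("s", 1)):
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count}{unit}")
    return "".join(parts)


def check_valid_for(seconds):
    """Accept 0 (any zone) or 90 seconds to 2 hours (specific zone)."""
    if seconds == 0 or 90 <= seconds <= 7200:
        return seconds
    raise ValueError("wait time must be 0, or between 90 and 7200 seconds")


def check_run_duration(seconds):
    """Accept run durations between 10 minutes and seven days."""
    if 600 <= seconds <= 604800:
        return seconds
    raise ValueError("run duration must be between 600 and 604800 seconds")


if __name__ == "__main__":
    print(format_duration(check_valid_for(1800)))       # wait up to 30 minutes
    print(format_duration(check_run_duration(604800)))  # run for seven days
```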
If you want to apply a compact placement policy to an A2 or G2 Flex-start VM, then include the --resource-policies flag in the command:
gcloud compute instances create VM_NAME \
--machine-type=MACHINE_TYPE \
--zone=ZONE \
--request-valid-for-duration=VALID_FOR_DURATION \
--provisioning-model=FLEX_START \
--instance-termination-action=TERMINATION_ACTION \
--max-run-duration=RUN_DURATION \
--maintenance-policy=TERMINATE \
--reservation-affinity=none \
--resource-policies=POLICY_NAME
Replace POLICY_NAME with the name of an existing compact placement policy. You can only create your Flex-start VM in the same region as the placement policy.
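If you generate these commands from a script, one option is to assemble the argument list in code rather than by string concatenation. The following is a minimal sketch that uses only the flags shown above; the function name and the example values (such as g2-standard-4 and us-central1-a) are illustrative assumptions, and the sketch does not run the command.

```python
import shlex


def build_create_command(vm_name, machine_type, zone, valid_for,
                         termination_action, run_duration, policy_name=None):
    """Return the gcloud argument list for creating a Flex-start VM."""
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be STOP or DELETE")
    args = [
        "gcloud", "compute", "instances", "create", vm_name,
        f"--machine-type={machine_type}",
        f"--zone={zone}",
        f"--request-valid-for-duration={valid_for}",
        "--provisioning-model=FLEX_START",
        f"--instance-termination-action={termination_action}",
        f"--max-run-duration={run_duration}",
        "--maintenance-policy=TERMINATE",
        "--reservation-affinity=none",
    ]
    # The compact placement policy is optional (A2 or G2 machine types).
    if policy_name:
        args.append(f"--resource-policies={policy_name}")
    return args


if __name__ == "__main__":
    command = build_create_command(
        "example-vm", "g2-standard-4", "us-central1-a",
        valid_for="30m", termination_action="DELETE", run_duration="2h",
    )
    # Print a shell-quoted command line for review before running it.
    print(shlex.join(command))
```

Returning a list keeps each flag as a separate argument, so the result can be passed to subprocess.run without shell quoting concerns.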
REST
To create an A2, G4, G2, or N1 Flex-start VM, make a POST request to the instances.insert method. In the request body, include the following fields:
- The params.requestValidForDuration field.
- The scheduling.provisioningModel field set to FLEX_START.
- The scheduling.instanceTerminationAction field.
- The scheduling.maxRunDuration field.
- The scheduling.onHostMaintenance field set to TERMINATE.
- The reservationAffinity.consumeReservationType field set to NO_RESERVATION.
To create a Flex-start VM, make a POST request as follows:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
  "name": "VM_NAME",
  "machineType": "zones/ZONE/machineTypes/MACHINE_TYPE",
  "disks": [
    {
      "initializeParams": {
        "sourceImage": "projects/IMAGE_PROJECT/global/images/IMAGE"
      },
      "boot": true
    }
  ],
  "networkInterfaces": [
    {
      "network": "global/networks/default"
    }
  ],
  "params": {
    "requestValidForDuration": {
      "seconds": VALID_FOR_DURATION
    }
  },
  "scheduling": {
    "provisioningModel": "FLEX_START",
    "instanceTerminationAction": "TERMINATION_ACTION",
    "maxRunDuration": {
      "seconds": RUN_DURATION
    },
    "onHostMaintenance": "TERMINATE"
  },
  "reservationAffinity": {
    "consumeReservationType": "NO_RESERVATION"
  }
}
Replace the following:
- PROJECT_ID: the ID of the project in which to create the VM.
- ZONE: the zone where you want to create the VM. To verify that a machine type is available in the zone where you want to create the VM, see Available regions and zones.
- VM_NAME: the name of your new VM.
- MACHINE_TYPE: the machine type to use for the Flex-start VM. If you specify a G4, G2, or N1 machine type, then consider the following:
  - For G4 or G2 machine types, you can optionally specify an NVIDIA RTX Virtual Workstation (vWS) to use for graphics-intensive workloads. To do so, include the guestAccelerators field in the request body as follows:
    "guestAccelerators": [ { "acceleratorCount": VWS_ACCELERATOR_COUNT, "acceleratorType": "projects/PROJECT_ID/zones/ZONE/acceleratorTypes/VWS_ACCELERATOR_TYPE" } ]
    Replace the following:
    - VWS_ACCELERATOR_COUNT: the number of NVIDIA RTX vWS accelerators that your workload requires. This number must match the number of GPUs that are attached in the G4 or G2 machine type that you want to use.
    - VWS_ACCELERATOR_TYPE: the type of NVIDIA RTX vWS accelerator to use. Specify one of the following values:
      - For G2 machine types: nvidia-l4-vws
      - For G4 machine types: nvidia-rtx-pro-6000-vws
  - For N1 machine types, you must specify the number and type of GPUs to attach to your VM. Otherwise, creating the VM fails. To attach GPUs to an N1 VM, include the guestAccelerators field in the request body as follows:
    "guestAccelerators": [ { "acceleratorCount": ACCELERATOR_COUNT, "acceleratorType": "projects/PROJECT_ID/zones/ZONE/acceleratorTypes/ACCELERATOR_TYPE" } ]
    Replace the following:
    - ACCELERATOR_COUNT: the number of GPUs to attach to your N1 VM.
    - ACCELERATOR_TYPE: a supported GPU model for N1 VMs.
- IMAGE_PROJECT: the image project that contains the image, for example, debian-cloud. For more information about the supported image projects, see Public images.
- IMAGE: specify one of the following:
  - A specific version of the OS image, for example, debian-12-bookworm-v20240617.
  - An image family, which must be formatted as family/IMAGE_FAMILY. This value specifies to use the most recent, non-deprecated OS image. For example, if you specify family/debian-12, the latest version in the Debian 12 image family is used. For more information about using image families, see Image families best practices.
- VALID_FOR_DURATION: the maximum time in seconds to wait for the VM to be provisioned. Based on the zonal requirements for your workload, specify one of the following durations to help increase the chances that your VM creation request succeeds:
  - If your workload requires you to create the VM in a specific zone, then specify a duration between 90 seconds (90) and two hours (7200). Longer durations give you higher chances of obtaining resources.
  - If the VM can run in any zone within the region, then specify a duration of zero seconds (0). This value specifies that Compute Engine only allocates resources if they are immediately available. If the creation request fails because resources aren't available, then retry the request in a different zone.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of its run duration. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RUN_DURATION: the maximum time, in seconds, that the VM runs before Compute Engine automatically stops or deletes it. The value must be between 600 (600 seconds, or 10 minutes) and 604800 (604,800 seconds, or seven days).
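To avoid hand-editing the JSON, the request body can be built programmatically. The following is a minimal sketch that mirrors the field layout of the request shown above; the function name and the example values are illustrative assumptions, and the sketch only prints the body rather than sending an authenticated request.

```python
import json


def build_insert_body(vm_name, zone, machine_type, image_project, image,
                      valid_for_seconds, termination_action,
                      run_duration_seconds):
    """Return the instances.insert request body for a Flex-start VM."""
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be STOP or DELETE")
    return {
        "name": vm_name,
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "disks": [
            {
                "initializeParams": {
                    "sourceImage":
                        f"projects/{image_project}/global/images/{image}"
                },
                "boot": True,
            }
        ],
        "networkInterfaces": [{"network": "global/networks/default"}],
        "params": {
            "requestValidForDuration": {"seconds": valid_for_seconds}
        },
        "scheduling": {
            "provisioningModel": "FLEX_START",
            "instanceTerminationAction": termination_action,
            "maxRunDuration": {"seconds": run_duration_seconds},
            "onHostMaintenance": "TERMINATE",
        },
        "reservationAffinity": {"consumeReservationType": "NO_RESERVATION"},
    }


if __name__ == "__main__":
    body = build_insert_body(
        "example-vm", "us-central1-a", "g2-standard-4",
        "debian-cloud", "family/debian-12",
        valid_for_seconds=7200, termination_action="DELETE",
        run_duration_seconds=3600,
    )
    print(json.dumps(body, indent=2))
```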
If you want to apply a compact placement policy to an A2 or G2 Flex-start VM, then include the resourcePolicies field in the request:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
  "name": "VM_NAME",
  "machineType": "zones/ZONE/machineTypes/MACHINE_TYPE",
  "disks": [
    {
      "initializeParams": {
        "sourceImage": "projects/IMAGE_PROJECT/global/images/IMAGE"
      },
      "boot": true
    }
  ],
  "networkInterfaces": [
    {
      "network": "global/networks/default"
    }
  ],
  "params": {
    "requestValidForDuration": {
      "seconds": VALID_FOR_DURATION
    }
  },
  "scheduling": {
    "provisioningModel": "FLEX_START",
    "instanceTerminationAction": "TERMINATION_ACTION",
    "maxRunDuration": {
      "seconds": RUN_DURATION
    },
    "onHostMaintenance": "TERMINATE"
  },
  "reservationAffinity": {
    "consumeReservationType": "NO_RESERVATION"
  },
  "resourcePolicies": [
    "projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
  ]
}
Replace the following:
- REGION: the region where the compact placement policy exists. You can only create your Flex-start VM in the same region as the placement policy.
- POLICY_NAME: the name of the compact placement policy.
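Because the VM must be in the same region as the placement policy, it can help to derive the policy path from the VM's zone. The following is a minimal sketch with hypothetical function names; it assumes that a zone name is the region name followed by a hyphen and a zone suffix (for example, us-central1-a belongs to us-central1).

```python
def zone_to_region(zone):
    """Derive the region from a zone name by dropping the final suffix."""
    region, separator, suffix = zone.rpartition("-")
    if not separator or not region or not suffix:
        raise ValueError(f"unexpected zone name: {zone!r}")
    return region


def resource_policy_path(project_id, zone, policy_name):
    """Build the resourcePolicies entry for a VM created in the given zone."""
    region = zone_to_region(zone)
    return (f"projects/{project_id}/regions/{region}"
            f"/resourcePolicies/{policy_name}")


if __name__ == "__main__":
    # Example values only; replace them with your own project and policy.
    print(resource_policy_path("example-project", "us-central1-a",
                               "example-policy"))
```

Deriving the region from the zone keeps the two values from drifting apart when you change the zone in a script.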