Create a MIG with a multi-host Cloud TPU slice

This document describes how to create a managed instance group (MIG) with a multi-host TPU slice.

Prerequisites

Complete the following prerequisites:

  1. Create a project for your TPUs as described in Set up a project for TPUs.
  2. Determine your TPU requirements as described in Plan your resources.

Create a MIG with multi-host TPU slices

  1. Create an instance template.
  2. Create a workload policy.
  3. Create the MIG.
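The three steps run in order, and the MIG command refers to the instance template and the workload policy by URL. The following is a minimal sketch of that flow, not an official script. Every default value in it (my-project, us-central1, my-tpu-template, and so on) is a hypothetical placeholder, and it only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch of the three steps in order. Every default below is a hypothetical
# placeholder; override them with environment variables.
# With DRY_RUN=1 (the default), the commands are printed, not run.
set -euo pipefail

PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
ZONE="${ZONE:-us-central1-a}"
TEMPLATE_NAME="${TEMPLATE_NAME:-my-tpu-template}"
POLICY_NAME="${POLICY_NAME:-my-tpu-policy}"
MIG_NAME="${MIG_NAME:-my-tpu-mig}"
MACHINE_TYPE="${MACHINE_TYPE:-MACHINE_TYPE}"
IMAGE_FAMILY="${IMAGE_FAMILY:-IMAGE_FAMILY}"
IMAGE_PROJECT="${IMAGE_PROJECT:-IMAGE_PROJECT}"
TOPOLOGY="${TOPOLOGY:-4x4}"
MIG_SIZE="${MIG_SIZE:-MIG_SIZE}"
DRY_RUN="${DRY_RUN:-1}"

# Prints the command in dry-run mode; otherwise runs it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: create the instance template (on-demand consumption option).
run gcloud compute instance-templates create "$TEMPLATE_NAME" \
  --project="$PROJECT_ID" \
  --machine-type="$MACHINE_TYPE" \
  --maintenance-policy=TERMINATE \
  --image-family="$IMAGE_FAMILY" \
  --image-project="$IMAGE_PROJECT"

# Step 2: create the workload policy in the region of the MIG.
run gcloud compute resource-policies create workload-policy "$POLICY_NAME" \
  --project="$PROJECT_ID" \
  --type=high-throughput \
  --accelerator-topology="$TOPOLOGY" \
  --region="$REGION"

# Step 3: create a zonal MIG that references both resources by URL.
# These follow the selfLink formats of a global instance template and a
# regional resource policy.
TEMPLATE_URL="https://www.googleapis.com/compute/v1/projects/${PROJECT_ID}/global/instanceTemplates/${TEMPLATE_NAME}"
POLICY_URL="https://www.googleapis.com/compute/v1/projects/${PROJECT_ID}/regions/${REGION}/resourcePolicies/${POLICY_NAME}"
run gcloud compute instance-groups managed create "$MIG_NAME" \
  --project="$PROJECT_ID" \
  --size="$MIG_SIZE" \
  --target-size-policy-mode=bulk \
  --template="$TEMPLATE_URL" \
  --zone="$ZONE" \
  --default-action-on-vm-failure=do-nothing \
  --workload-policy="$POLICY_URL"
```

Printing first and running only on request makes it easy to review the exact flags before any resources are created.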

Create an instance template

The command to create an instance template depends on the consumption option you use: on-demand, Spot, reservation-bound, or flex-start. For more information about consumption options, see Plan your TPU resources.

Create an instance template for an on-demand TPU VM

The following command creates an instance template using the on-demand consumption option:

gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
  --machine-type=MACHINE_TYPE \
  --maintenance-policy=TERMINATE \
  --image-family=IMAGE_FAMILY \
  --image-project=IMAGE_PROJECT

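Replace the following placeholders:

  - INSTANCE_TEMPLATE_NAME: a name for the instance template.
  - MACHINE_TYPE: the TPU machine type for each VM in the slice.
  - IMAGE_FAMILY: the image family for the boot disk of each VM.
  - IMAGE_PROJECT: the project that contains IMAGE_FAMILY.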

Create an instance template for a TPU Spot VM

The following command creates an instance template using the Spot consumption option:

gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
  --machine-type=MACHINE_TYPE \
  --maintenance-policy=TERMINATE \
  --instance-termination-action=STOP \
  --provisioning-model=SPOT \
  --image-family=IMAGE_FAMILY \
  --image-project=IMAGE_PROJECT

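Replace the following placeholders:

  - INSTANCE_TEMPLATE_NAME: a name for the instance template.
  - MACHINE_TYPE: the TPU machine type for each VM in the slice.
  - IMAGE_FAMILY: the image family for the boot disk of each VM.
  - IMAGE_PROJECT: the project that contains IMAGE_FAMILY.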

Create an instance template for a TPU reservation-bound VM

The following command creates an instance template using the reservation-bound consumption option:

gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
  --machine-type=MACHINE_TYPE \
  --maintenance-policy=TERMINATE \
  --instance-termination-action=DELETE \
  --reservation-affinity=specific \
  --provisioning-model=reservation-bound \
  --reservation=RESERVATION_NAME \
  --image-family=IMAGE_FAMILY \
  --image-project=IMAGE_PROJECT

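Replace the following placeholders:

  - INSTANCE_TEMPLATE_NAME: a name for the instance template.
  - MACHINE_TYPE: the TPU machine type for each VM in the slice.
  - RESERVATION_NAME: the name of the reservation that the VMs consume.
  - IMAGE_FAMILY: the image family for the boot disk of each VM.
  - IMAGE_PROJECT: the project that contains IMAGE_FAMILY.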

Create an instance template for a TPU Flex-start VM

The following command creates an instance template using the flex-start consumption option:

gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
  --machine-type=MACHINE_TYPE \
  --maintenance-policy=TERMINATE \
  --instance-termination-action=DELETE \
  --provisioning-model=FLEX_START \
  --max-run-duration=DURATION \
  --image-family=IMAGE_FAMILY \
  --image-project=IMAGE_PROJECT

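Replace the following placeholders:

  - INSTANCE_TEMPLATE_NAME: a name for the instance template.
  - MACHINE_TYPE: the TPU machine type for each VM in the slice.
  - DURATION: the maximum time that each VM can run before Compute Engine deletes it, for example 2h.
  - IMAGE_FAMILY: the image family for the boot disk of each VM.
  - IMAGE_PROJECT: the project that contains IMAGE_FAMILY.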
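The four instance-template commands share the same base command, including --maintenance-policy=TERMINATE, and differ only in their scheduling flags. The hypothetical helper below, a sketch that is not part of gcloud, prints the option-specific flags exactly as they appear in the commands above, which makes the differences easy to compare or to reuse in a script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gcloud): prints the consumption-specific
# flags used by the instance-template commands in this section.
# Usage: consumption_flags OPTION [RESERVATION_NAME | DURATION]
consumption_flags() {
  local option="$1"
  case "$option" in
    on-demand)
      # On-demand needs no flags beyond the shared base command.
      ;;
    spot)
      echo "--instance-termination-action=STOP --provisioning-model=SPOT"
      ;;
    reservation-bound)
      # The second argument replaces the RESERVATION_NAME placeholder.
      echo "--instance-termination-action=DELETE --reservation-affinity=specific --provisioning-model=reservation-bound --reservation=${2:-RESERVATION_NAME}"
      ;;
    flex-start)
      # The second argument replaces the DURATION placeholder.
      echo "--instance-termination-action=DELETE --provisioning-model=FLEX_START --max-run-duration=${2:-DURATION}"
      ;;
    *)
      echo "unknown consumption option: $option" >&2
      return 1
      ;;
  esac
}
```

For example, passing $(consumption_flags spot) unquoted, so that the shell splits it into separate flags, to gcloud compute instance-templates create after --maintenance-policy=TERMINATE reproduces the Spot command.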

Create a workload policy

You must create a workload policy that sets the accelerator topology of the slice (for example, 4x4, 8x8, or 4x4x4) by using the --accelerator-topology flag. The accelerator topology configures the MIG to treat its instances as a single, interconnected slice.

The following command creates a workload policy:

gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
  --type=high-throughput \
  --accelerator-topology=TOPOLOGY \
  --region=REGION

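Replace the following placeholders:

  - WORKLOAD_POLICY_NAME: a name for the workload policy.
  - TOPOLOGY: the accelerator topology of the TPU slice, for example 4x4.
  - REGION: the region in which to create the workload policy. Use the region of your MIG; for a zonal MIG, this is the region that contains its zone.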
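The topology also determines how many VMs the slice needs: multiply the topology dimensions to get the total number of chips, then divide by the number of TPU chips attached to each VM. The following is a sketch of that arithmetic, assuming every VM in the slice has the same number of chips. The function name vms_for_topology is hypothetical, and the chips-per-VM value depends on your machine type, so the caller supplies it.

```shell
#!/usr/bin/env bash
# Sketch: compute how many VMs a topology such as 4x4 or 4x4x4 needs.
# chips_per_vm depends on MACHINE_TYPE and is supplied by the caller.
vms_for_topology() {
  local topology="$1" chips_per_vm="$2"
  local chips=1 dim
  if ! [[ "$chips_per_vm" =~ ^[1-9][0-9]*$ ]]; then
    echo "chips per VM must be a positive integer" >&2
    return 1
  fi
  # Split the topology on "x" and multiply the dimensions together.
  local IFS=x
  for dim in $topology; do
    if ! [[ "$dim" =~ ^[1-9][0-9]*$ ]]; then
      echo "invalid topology: $topology" >&2
      return 1
    fi
    chips=$(( chips * dim ))
  done
  # A valid multi-host slice divides evenly into VMs.
  if (( chips % chips_per_vm != 0 )); then
    echo "topology $topology has $chips chips, not a multiple of $chips_per_vm" >&2
    return 1
  fi
  echo "$(( chips / chips_per_vm ))"
}
```

For example, with 4 chips per VM, a 4x4 topology has 16 chips, so vms_for_topology 4x4 4 prints 4; a 4x4x4 topology has 64 chips and needs 16 VMs.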

Create a MIG

Create a zonal or a regional MIG by using the gcloud compute instance-groups managed create command.

To create a zonal MIG:

gcloud compute instance-groups managed create MIG_NAME \
  --size=MIG_SIZE \
  --target-size-policy-mode=bulk \
  --template=INSTANCE_TEMPLATE_URL \
  --zone=ZONE \
  --default-action-on-vm-failure=do-nothing \
  --workload-policy=WORKLOAD_POLICY_URL

To create a regional MIG:

gcloud compute instance-groups managed create MIG_NAME \
  --size=MIG_SIZE \
  --target-size-policy-mode=bulk \
  --template=INSTANCE_TEMPLATE_URL \
  --region=REGION \
  --default-action-on-vm-failure=do-nothing \
  --workload-policy=WORKLOAD_POLICY_URL \
  --target-distribution-shape=any-single-zone \
  --instance-redistribution-type=none

The regional command sets --target-distribution-shape=any-single-zone so that all VMs in the slice are created in a single zone.

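Replace the following placeholders:

  - MIG_NAME: a name for the MIG.
  - MIG_SIZE: the number of VMs in the MIG. For a multi-host slice, this is the number of TPU VMs that make up the slice topology.
  - INSTANCE_TEMPLATE_URL: the URL of the instance template.
  - ZONE: for a zonal MIG, the zone in which to create the MIG.
  - REGION: for a regional MIG, the region in which to create the MIG.
  - WORKLOAD_POLICY_URL: the URL of the workload policy.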
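Compute Engine creates the VMs after the create command returns. As a sketch, you can wait until the MIG has no actions in progress and then list its VMs by using the gcloud compute instance-groups managed wait-until and list-instances commands. The wrapper name wait_for_slice and its 600-second default timeout are hypothetical; the wrapper takes --zone or --region as its second argument so that it works for either MIG type.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: wait until a MIG is stable, then list its VMs.
# Usage: wait_for_slice MIG_NAME --zone=ZONE [TIMEOUT_SECONDS]
#    or: wait_for_slice MIG_NAME --region=REGION [TIMEOUT_SECONDS]
wait_for_slice() {
  local mig_name="$1" location_flag="$2" timeout="${3:-600}"
  case "$location_flag" in
    --zone=*|--region=*) ;;
    *)
      echo "second argument must be --zone=ZONE or --region=REGION" >&2
      return 1
      ;;
  esac
  # Blocks until the MIG has no actions in progress, or until the timeout.
  gcloud compute instance-groups managed wait-until "$mig_name" \
    --stable "$location_flag" --timeout="$timeout" || return 1
  # Lists each VM in the MIG with its status and current action.
  gcloud compute instance-groups managed list-instances "$mig_name" "$location_flag"
}
```

Because the MIG uses --default-action-on-vm-failure=do-nothing, a failed VM is not repaired automatically, so the listed status is the place to check whether every VM in the slice came up.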

Create VMs with custom names in a MIG

You can create VMs in a MIG with a custom name for each VM. This is useful for debugging and for ensuring that instances are created in a specific order.

MIGs that contain a multi-host TPU slice use the bulk mode of target size policy. When creating VMs with custom names in such a MIG, the following applies:

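Create VMs with custom names by using one of the following REST API methods:

  - For a zonal MIG: instanceGroupManagers.createInstances
  - For a regional MIG: regionInstanceGroupManagers.createInstances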

Replace the following placeholders:
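A createInstances request lists one entry per VM in the instances array of its JSON body, each with a name field. The sketch below builds that body for a hypothetical naming scheme (BASE-0, BASE-1, and so on) and sends it with curl to the zonal instanceGroupManagers.createInstances endpoint. The function names and the naming scheme are assumptions; for a regional MIG, you would call regionInstanceGroupManagers.createInstances instead, and the bulk-mode requirements for your MIG still apply.

```shell
#!/usr/bin/env bash
# Sketch: create VMs with custom names in a zonal MIG through the
# Compute Engine REST API. Function names and naming scheme are hypothetical.

# Prints a createInstances request body with COUNT names: BASE-0 ... BASE-(COUNT-1).
build_create_instances_body() {
  local base="$1" count="$2" i sep=""
  printf '{"instances": ['
  for (( i = 0; i < count; i++ )); do
    printf '%s{"name": "%s-%d"}' "$sep" "$base" "$i"
    sep=", "
  done
  printf ']}\n'
}

# Sends the request body to the zonal createInstances endpoint.
# Arguments: PROJECT_ID ZONE MIG_NAME BODY (placeholders supplied by the caller).
create_named_instances() {
  local project="$1" zone="$2" mig="$3" body="$4"
  curl -sS -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$body" \
    "https://compute.googleapis.com/compute/v1/projects/${project}/zones/${zone}/instanceGroupManagers/${mig}/createInstances"
}
```

For example, body="$(build_create_instances_body my-tpu-vm 4)" followed by create_named_instances PROJECT_ID ZONE MIG_NAME "$body" requests four VMs named my-tpu-vm-0 through my-tpu-vm-3.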

What's next