Create an A3 Ultra or A4 instance


This document describes how to create instances with attached GPUs from the A3 Ultra or A4 machine series. To learn more about creating instances with attached GPUs, see Overview of creating an instance with attached GPUs.

Before you begin

Required roles

To get the permissions that you need to create instances, ask your administrator to grant you the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create instances. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create instances:

You might also be able to get these permissions with custom roles or other predefined roles.
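
As a rough illustration of how an administrator might grant this role from the command line, the following sketch builds a gcloud projects add-iam-policy-binding command and prints it instead of running it. The project ID example-project and the principal user:alex@example.com are hypothetical values, not values taken from this document.

```shell
#!/bin/bash
# Sketch: print (not run) a command that grants the Compute Instance Admin (v1) role.
# PROJECT_ID and MEMBER are hypothetical example values; replace them with your own.
PROJECT_ID="example-project"
MEMBER="user:alex@example.com"
ROLE="roles/compute.instanceAdmin.v1"

# Build the command as text so that it can be reviewed before anyone runs it.
grant_role_command() {
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
    "$1" "$2" "$3"
}

grant_role_command "$PROJECT_ID" "$MEMBER" "$ROLE"
```

Printing the command first keeps the sketch free of side effects; an administrator can copy the printed line and run it after checking the values.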

A3 Ultra or A4 instances are available through the following creation options, each of which has a different creation procedure, resource availability, and pricing. Choose the option that best fits your workload.

Create an A3 Ultra or A4 instance using Spot VMs

To create an A3 Ultra or A4 instance using Spot VMs, complete the steps in the following sections:

  1. Create VPC networks.
  2. Create the Spot VM.
  3. Prepare a Spot VM with attached GPUs for use.

Create VPC networks

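For the A4 or A3 Ultra machine types, you must create three VPC networks for the following network interfaces:

  * Two standard VPC networks for the gVNIC network interfaces.
  * One VPC network with the RoCE network profile for the CX-7 NICs. This network has eight subnets, one for each CX-7 NIC.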

For more information about NIC arrangement, see Review network bandwidth and NIC arrangement.

Set up the networks either manually by following the instruction guides or automatically by using the provided script.

Instruction guides

To create the networks, you can use the following instructions:

For these VPC networks, we recommend setting the maximum transmission unit (MTU) to a larger value. For the A4 or A3 Ultra machine types, the recommended MTU is 8896 bytes. To review the recommended MTU settings for other GPU machine types, see MTU settings for GPU machine types.
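
To make the MTU recommendation concrete, here is a minimal sketch that compares an MTU value against the recommended 8896 bytes. The function name check_mtu is hypothetical, and the example uses a literal value rather than querying a real network.

```shell
#!/bin/bash
# Sketch: compare a network's MTU with the 8896-byte value recommended for A4 and A3 Ultra.
RECOMMENDED_MTU=8896

# check_mtu MTU: prints "ok" when the MTU matches the recommendation;
# otherwise prints the mismatch and returns a non-zero status.
check_mtu() {
  if [ "$1" -eq "$RECOMMENDED_MTU" ]; then
    echo "ok"
  else
    echo "mtu is $1, recommended is $RECOMMENDED_MTU"
    return 1
  fi
}

# Example with a literal value. For an existing network, the MTU could be read with:
#   gcloud compute networks describe NETWORK_NAME --format="value(mtu)"
check_mtu 8896
```

NETWORK_NAME in the comment is a placeholder for one of the networks that you created.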

Script

To create the networks, you can use the following script.

#!/bin/bash

# Create standard VPCs (network and subnets) for the gVNICs
for N in $(seq 0 1); do
  gcloud compute networks create GVNIC_NAME_PREFIX-net-$N \
    --subnet-mode=custom \
    --mtu=8896

  gcloud compute networks subnets create GVNIC_NAME_PREFIX-sub-$N \
    --network=GVNIC_NAME_PREFIX-net-$N \
    --region=REGION \
    --range=10.$N.0.0/16

  gcloud compute firewall-rules create GVNIC_NAME_PREFIX-internal-$N \
    --network=GVNIC_NAME_PREFIX-net-$N \
    --action=ALLOW \
    --rules=tcp:0-65535,udp:0-65535,icmp \
    --source-ranges=10.0.0.0/8
done

# Create SSH firewall rules
gcloud compute firewall-rules create GVNIC_NAME_PREFIX-ssh \
  --network=GVNIC_NAME_PREFIX-net-0 \
  --action=ALLOW \
  --rules=tcp:22 \
  --source-ranges=IP_RANGE

# Assumes that an external IP is only created for vNIC 0
gcloud compute firewall-rules create GVNIC_NAME_PREFIX-allow-ping-net-0 \
  --network=GVNIC_NAME_PREFIX-net-0 \
  --action=ALLOW \
  --rules=icmp \
  --source-ranges=IP_RANGE

# List and make sure network profiles exist
gcloud compute network-profiles list

# Create network for CX-7
gcloud compute networks create RDMA_NAME_PREFIX-mrdma \
  --network-profile=ZONE-vpc-roce \
  --subnet-mode=custom \
  --mtu=8896

# Create subnets.
for N in $(seq 0 7); do
  gcloud compute networks subnets create RDMA_NAME_PREFIX-mrdma-sub-$N \
    --network=RDMA_NAME_PREFIX-mrdma \
    --region=REGION \
    --range=10.$((N+2)).0.0/16  # offset to avoid overlap with gVNICs
done

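Replace the following:

  * GVNIC_NAME_PREFIX: the name prefix to use for the standard VPC networks, subnets, and firewall rules for the gVNICs.
  * RDMA_NAME_PREFIX: the name prefix to use for the VPC network and subnets for the CX-7 NICs.
  * ZONE: the zone that identifies the RoCE network profile (ZONE-vpc-roce). The gcloud compute network-profiles list command in the script shows the profiles that exist.
  * REGION: the region where you want to create the subnets.
  * IP_RANGE: the source IP range to allow in the SSH and ping firewall rules.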
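
The address plan used by the script can be reviewed before any network is created. The following sketch prints the subnet ranges that the two loops would produce, without calling gcloud. The function names and the gvnic-sub and mrdma-sub labels are hypothetical shorthand for this illustration.

```shell
#!/bin/bash
# Sketch: list the subnet ranges that the network script would create,
# without calling gcloud, so the address plan can be checked for overlaps.

# gVNIC subnets use 10.N.0.0/16 for N = 0..1.
gvnic_range() {
  echo "10.$1.0.0/16"
}

# RDMA subnets use 10.(N+2).0.0/16 for N = 0..7, offset past the gVNIC ranges.
rdma_range() {
  echo "10.$(($1 + 2)).0.0/16"
}

for N in 0 1; do
  echo "gvnic-sub-$N $(gvnic_range "$N")"
done
for N in 0 1 2 3 4 5 6 7; do
  echo "mrdma-sub-$N $(rdma_range "$N")"
done
```

The ten ranges run from 10.0.0.0/16 through 10.9.0.0/16, all inside the 10.0.0.0/8 source range that the internal firewall rules allow.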

Create the Spot VM

To create the Spot VM, use one of the following methods:

Console

  1. In the Google Cloud console, go to the Create an instance page.
    Go to Create an instance
    The Create an instance screen appears and displays the Machine configuration pane.
  2. In the Machine configuration pane, complete the following steps:
    1. Specify a Name for your instance. See Resource naming convention.
    2. Select the Region and Zone where you want to create the instance. See the list of available GPU regions and zones.
    3. Click the GPUs tab, and then complete the following steps:
      1. In the GPU type list, select your GPU type.
        * For A4 instances, select NVIDIA B200
        * For A3 Ultra instances, select NVIDIA H200 141GB
      2. In the Number of GPUs list, select 8.
  3. In the navigation menu, click OS and storage. In the OS and storage pane that appears, complete the following steps:
    1. Click Change. The Boot disk configuration pane opens.
    2. On the Public images tab, select a recommended image. For a list of recommended images, see Operating systems.
    3. To confirm your boot disk options, click Select.
  4. To create a multi-NIC instance, complete the following steps. Otherwise, to create a single-NIC instance, skip these steps.
    1. In the navigation menu, click Networking. In the Networking pane that appears, go to the Network interfaces section and complete the following steps:
      1. Delete the default network interface. To delete the interface, click Delete.
      2. Click Add a network interface. Use this option to add the gVNIC and RDMA networks that you created in the previous section. When you add the networks, remember the following:
        * Specify your host networks in the Network and Subnetwork lists, and set the Network interface card list to gVNIC.
        * Specify your GPU networks in the Network and Subnetwork lists, and set the Network interface card list to MRDMA for these networks.
  5. In the navigation menu, click Advanced. In the Advanced pane that appears, complete the following steps:
    1. In the Provisioning model section, select Spot in the VM provisioning model list.
    2. Optional: To specify the action to take when Compute Engine preempts the instance (stop (default) or delete), complete the following steps:
      1. Expand the VM provisioning model advanced settings section.
      2. In the On VM termination list, select an option.
  6. To create and start the instance, click Create.

gcloud

To create the VM, use the gcloud compute instances create command.

gcloud compute instances create VM_NAME \
  --machine-type=MACHINE_TYPE \
  --image-family=IMAGE_FAMILY \
  --image-project=IMAGE_PROJECT \
  --zone=ZONE \
  --boot-disk-type=hyperdisk-balanced \
  --boot-disk-size=DISK_SIZE \
  --scopes=cloud-platform \
  --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-0,subnet=GVNIC_NAME_PREFIX-sub-0 \
  --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-1,subnet=GVNIC_NAME_PREFIX-sub-1,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-0,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-1,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-2,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-3,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-4,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-5,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-6,no-address \
  --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-7,no-address \
  --provisioning-model=SPOT \
  --instance-termination-action=TERMINATION_ACTION

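Replace the following:

  * VM_NAME: the name of the instance.
  * MACHINE_TYPE: the A3 Ultra or A4 machine type to use for the instance.
  * IMAGE_FAMILY and IMAGE_PROJECT: the image family for the boot disk and the project that contains it.
  * ZONE: the zone in which to create the instance.
  * DISK_SIZE: the size of the boot disk.
  * GVNIC_NAME_PREFIX and RDMA_NAME_PREFIX: the name prefixes that you used when you created the VPC networks and subnets.
  * TERMINATION_ACTION: the action to take when Compute Engine preempts the instance, either STOP (default) or DELETE.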
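
The eight MRDMA interfaces in the preceding command differ only in the subnet index, so the flags can be generated rather than typed out. The following is a minimal sketch under that assumption; the function name mrdma_flags is hypothetical, and RDMA_NAME_PREFIX is kept as the same placeholder that the command uses.

```shell
#!/bin/bash
# Sketch: generate the eight MRDMA --network-interface flags with a loop.
# RDMA_NAME_PREFIX is left as the placeholder used in the gcloud command.
RDMA_NAME_PREFIX="RDMA_NAME_PREFIX"

# mrdma_flags PREFIX: prints one flag per line for subnets 0..7.
mrdma_flags() {
  for N in 0 1 2 3 4 5 6 7; do
    echo "--network-interface=nic-type=MRDMA,network=$1-mrdma,subnet=$1-mrdma-sub-$N,no-address"
  done
}

mrdma_flags "$RDMA_NAME_PREFIX"
```

The printed lines match the MRDMA flags in the command, so they could be pasted in or passed through command substitution once the prefix is set to a real value.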

REST

To create the VM, make a POST request to the instances.insert method.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
  "machineType": "projects/PROJECT_ID/zones/ZONE/machineTypes/MACHINE_TYPE",
  "name": "VM_NAME",
  "disks": [
    {
      "boot": true,
      "initializeParams": {
        "diskSizeGb": "DISK_SIZE",
        "diskType": "hyperdisk-balanced",
        "sourceImage": "projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
      },
      "mode": "READ_WRITE",
      "type": "PERSISTENT"
    }
  ],
  "networkInterfaces": [
    {
      "accessConfigs": [
        {
          "name": "external-nat",
          "type": "ONE_TO_ONE_NAT"
        }
      ],
      "network": "projects/NETWORK_PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-0",
      "nicType": "GVNIC",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-0"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-1",
      "nicType": "GVNIC",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-1"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-0"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-1"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-2"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-3"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-4"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-5"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-6"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
      "nicType": "MRDMA",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-7"
    }
  ],
  "scheduling": {
    "provisioningModel": "SPOT",
    "instanceTerminationAction": "TERMINATION_ACTION"
  }
}

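Replace the following:

  * PROJECT_ID: the ID of the project in which to create the instance.
  * ZONE: the zone in which to create the instance.
  * VM_NAME: the name of the instance.
  * MACHINE_TYPE: the A3 Ultra or A4 machine type to use for the instance.
  * DISK_SIZE: the size of the boot disk, in GB.
  * IMAGE_PROJECT and IMAGE_FAMILY: the project that contains the boot disk image and the image family to use.
  * NETWORK_PROJECT_ID: the ID of the project that contains the VPC networks.
  * REGION: the region of the subnets.
  * GVNIC_NAME_PREFIX and RDMA_NAME_PREFIX: the name prefixes that you used when you created the VPC networks and subnets.
  * TERMINATION_ACTION: the action to take when Compute Engine preempts the instance, either STOP (default) or DELETE.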
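
One way to send this request is with curl and an access token from the gcloud CLI. The following sketch only builds the request URL and prints a curl command for review; it does not send anything. The project ID example-project, the zone example-zone-a, and the file name request.json (assumed to hold the JSON request body) are hypothetical.

```shell
#!/bin/bash
# Sketch: build the instances.insert URL and print a curl command for it.
# PROJECT_ID, ZONE, and request.json are hypothetical example values.
PROJECT_ID="example-project"
ZONE="example-zone-a"

# insert_url PROJECT ZONE: prints the instances.insert endpoint.
insert_url() {
  echo "https://compute.googleapis.com/compute/v1/projects/$1/zones/$2/instances"
}

# Print the command instead of running it. Single quotes keep
# $(gcloud auth print-access-token) unexpanded in the printed text.
printf '%s\n' 'curl -X POST \'
printf '%s\n' '  -H "Authorization: Bearer $(gcloud auth print-access-token)" \'
printf '%s\n' '  -H "Content-Type: application/json" \'
printf '%s\n' '  -d @request.json \'
printf '  "%s"\n' "$(insert_url "$PROJECT_ID" "$ZONE")"
```

Running the printed command requires an authenticated gcloud CLI and a request.json file with the placeholders replaced.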

Prepare a Spot VM with attached GPUs for use

To prepare a Spot VM with attached GPUs for use, complete the following steps:

  1. To use its attached GPUs, the instance requires GPU drivers. Unless you specified an image that already includes the required GPU drivers, follow the steps in Install GPU drivers.
  2. To prepare a Spot VM for use, complete the following steps:
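
After the drivers are installed, a quick check that all GPUs are visible can catch setup problems early. The following sketch counts GPU entries in text shaped like the output of nvidia-smi -L and compares the count with the eight GPUs selected earlier. The function names and the sample lines are hypothetical; the sample stands in for real driver output.

```shell
#!/bin/bash
# Sketch: check that the expected number of GPUs is visible after driver installation.
EXPECTED_GPUS=8

# count_gpus: reads `nvidia-smi -L` style text on stdin and prints the
# number of lines that start with "GPU ".
count_gpus() {
  grep -c '^GPU ' || true
}

# sample_output: illustrative text shaped like `nvidia-smi -L` output.
# The UUID values are made up for this example.
sample_output() (
  N=0
  while [ "$N" -lt 8 ]; do
    echo "GPU $N: NVIDIA H200 (UUID: GPU-example-$N)"
    N=$((N + 1))
  done
)

FOUND=$(sample_output | count_gpus)
if [ "$FOUND" -eq "$EXPECTED_GPUS" ]; then
  echo "found $FOUND GPUs"
else
  echo "expected $EXPECTED_GPUS GPUs, found $FOUND"
fi
```

On an instance with the drivers installed, the same count could be taken from real output with nvidia-smi -L | count_gpus.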

What's next