Review compute instance and cluster configurations

This document describes the configurations in AI Hypercomputer to consider before you create Compute Engine instances and clusters. Reviewing the available configurations helps you optimize performance for your workloads and minimize downtime and performance issues.

Configuration factors for compute instance and cluster creation

Before you create compute instances and clusters to run your workloads, consider which configuration to use:

The provisioning model
The cluster deployment tools
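
If you use the reservation-bound provisioning model, then you must also consider the following factors:

The reservation block deployment type
The reservation operational mode
The maintenance scheduling type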

Provisioning models

Based on the consumption option that you choose for creating compute instances or clusters, you can use one of the following provisioning models to obtain the resources for those instances:

Reservation-bound: you can reserve resources at a discounted price for a future date and duration. At the start of your reservation period, you can use the reserved resources to create instances or clusters. You have exclusive access to your reserved resources for the reservation period.
Flex-start: you can request discounted resources for up to seven days. Compute Engine makes best-effort attempts to schedule the provisioning of your requested resources as soon as they're available. You have exclusive access to your obtained resources for your requested period.
Spot: based on availability, you can immediately obtain deeply discounted resources. However, Compute Engine might stop or delete instances at any time to reclaim capacity.

Reservation-bound provisioning model

The reservation-bound provisioning model links your created compute instances to the capacity that you previously reserved. When you reserve capacity, Compute Engine creates an empty reservation. Then, at the reservation start time, the following occurs:

Compute Engine adds your reserved number of instances to the reservation. You have exclusive access to the reserved capacity until the reservation end time.
Google Cloud charges you for the reserved capacity until the end of your reservation period, whether you use the capacity or not.

You can then use the reserved resources to create instances without additional charges. You only pay for resources that aren't included in the reservation, such as disks or IP addresses.
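To make the billing rule concrete, the following Python sketch computes the charge for a reservation and the portion of that charge that pays for unused capacity. The instance count, duration, and hourly price are hypothetical numbers chosen for illustration; they are not actual Google Cloud prices, and the function names are not part of any Google Cloud library.

```python
def reservation_charge(reserved: int, hours: float, hourly_price: float) -> float:
    """Charge for the whole reservation, regardless of how much of it you use."""
    return reserved * hours * hourly_price


def idle_charge(reserved: int, used: int, hours: float, hourly_price: float) -> float:
    """Portion of the charge that pays for reserved capacity you left unused."""
    if not 0 <= used <= reserved:
        raise ValueError("used must be between 0 and reserved")
    return (reserved - used) * hours * hourly_price


# Hypothetical example: 4 reserved instances for 48 hours at 10.0 per instance-hour.
total = reservation_charge(reserved=4, hours=48, hourly_price=10.0)
idle = idle_charge(reserved=4, used=3, hours=48, hourly_price=10.0)
print(f"total charge: {total}, charge for unused capacity: {idle}")
```

The total stays the same whether you create three instances or four; only the share that goes unused changes.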

To specify the reservation-bound provisioning model when you create compute instances or MIGs, do the following:

In the Google Cloud console, in the Provisioning model list, select Reservation-bound.
In the Google Cloud CLI, include the --provisioning-model=RESERVATION_BOUND flag in the command.
In the Compute Engine API, include the "provisioningModel": "RESERVATION_BOUND" field in the request body.

For more information about setting these parameters when you create instances or MIGs after you reserve capacity, see Compute instance and cluster creation overview. If you use Cluster Toolkit to deploy your clusters, then the cluster blueprint sets the provisioning model for you.

Flex-start provisioning model

The flex-start provisioning model lets you create standalone Flex-start VMs or add Flex-start VMs to a managed instance group (MIG) when your requested capacity is available. When you add Flex-start VMs to a MIG by using resize requests, the MIG creates the instances all at once. This approach helps you avoid unnecessary charges for partial capacity that Compute Engine might deliver while you wait for the full capacity needed to start your workload. The flex-start provisioning model provisions resources from a secure capacity pool, which helps to increase your chances of obtaining high-demand resources like GPUs.

To specify the flex-start provisioning model when creating a standalone instance or an instance template for a MIG, do the following:

In the Google Cloud console, in the Provisioning model list, select Flex-start.
In the gcloud CLI, include the --provisioning-model=FLEX_START flag in the command.
In the Compute Engine API, include the "provisioningModel": "FLEX_START" field in the request body.
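Because flex-start requests are limited to seven days, it can help to validate the requested duration before you send a request. The following Python sketch builds a scheduling fragment for a request body. It is an illustration only: the helper name is hypothetical, and the sketch assumes that the run duration is passed in a "maxRunDuration" field as a whole number of seconds alongside "provisioningModel". Check the Compute Engine API reference before relying on that shape.

```python
from datetime import timedelta

# "Up to seven days", as stated in this document.
MAX_FLEX_START_DURATION = timedelta(days=7)


def flex_start_scheduling(run_duration: timedelta) -> dict:
    """Build a scheduling fragment for a flex-start request (hypothetical helper)."""
    if run_duration <= timedelta(0):
        raise ValueError("run duration must be positive")
    if run_duration > MAX_FLEX_START_DURATION:
        raise ValueError("flex-start requests are limited to seven days")
    return {
        "provisioningModel": "FLEX_START",
        # Assumption: the duration is expressed as a whole number of seconds.
        "maxRunDuration": {"seconds": int(run_duration.total_seconds())},
    }


if __name__ == "__main__":
    print(flex_start_scheduling(timedelta(days=2)))
```

Rejecting an over-long duration locally avoids waiting on a request that can't be fulfilled.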

For more information about creating instances or clusters that use the flex-start provisioning model, see the following documents:

Create a standalone instance
Create MIGs with resize requests
Create Slurm clusters:
- Create a fully managed Slurm cluster
- Create a self-managed Slurm cluster
Create GKE clusters:
- Create a cluster with the default configuration
- Create a custom cluster

Spot provisioning model

The spot provisioning model lets you create deeply discounted compute instances based on availability. However, Compute Engine might stop or delete the created instances at any time to reclaim capacity. This process is called preemption.

To specify the spot provisioning model when you create instances or MIGs, do the following:

In the Google Cloud console, in the Provisioning model list, select Spot.
In the gcloud CLI, include the --provisioning-model=SPOT flag in the command.
In the Compute Engine API, include the "provisioningModel": "SPOT" field in the request body.

For more information about setting these parameters when you create instances or MIGs, see Compute instance and cluster creation overview.
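The flag and field values from the preceding sections can be collected in one place. The following Python sketch maps each provisioning model to its gcloud CLI flag and to a request-body fragment. The flag and enum values come from this document; the helper names are hypothetical, and nesting the field under "scheduling" is an assumption about the request body layout that you should confirm in the Compute Engine API reference.

```python
import json

# Enum values as quoted in this document.
PROVISIONING_MODELS = {
    "reservation-bound": "RESERVATION_BOUND",
    "flex-start": "FLEX_START",
    "spot": "SPOT",
}


def gcloud_flag(model: str) -> str:
    """Return the gcloud CLI flag for a provisioning model name."""
    return f"--provisioning-model={PROVISIONING_MODELS[model]}"


def api_fragment(model: str) -> dict:
    """Return a request-body fragment for a provisioning model name.

    Assumption: the field is nested under "scheduling" in the request body.
    """
    return {"scheduling": {"provisioningModel": PROVISIONING_MODELS[model]}}


if __name__ == "__main__":
    for name in PROVISIONING_MODELS:
        print(gcloud_flag(name))
        print(json.dumps(api_fragment(name)))
```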

Cluster deployment tools

Cluster Toolkit is an open source deployment tool that is recommended for creating GPU-accelerated clusters. Cluster Toolkit can deploy either Google Kubernetes Engine (GKE) or Slurm clusters.

Alternatively, you can choose to provision your groups of compute instances by using one of the following methods, and then incorporate your own workload scheduler as needed:

Reservation block deployment types

If you use the reservation-bound provisioning model when creating A4X Max, A4X, A4, A3 Ultra, A3 Mega, or A3 High (8 GPUs) compute instances or clusters, then the machines that you receive are automatically deployed within blocks of densely allocated hosts. This deployment offers the following benefits:

Non-blocking networking for consistent high-bandwidth, low-latency instance connectivity by using dynamic machine learning (ML) network fabric from Google.
Access to network topology that provides a hierarchical view of the relative proximity among instances. This feature is useful for advanced job scheduling use cases.
Fine-grained, topology-aware placement when you use orchestrators.
Fine-grained control over maintenance schedules, which helps you plan job scheduling and minimize downtime.
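As an illustration of how a scheduler might use the hierarchical topology view, the following Python sketch ranks instances by how close they are to a target instance. The three-level location (block, sub-block, host), the instance names, and the identifiers are all hypothetical sample data; the sketch doesn't call any Google Cloud API and makes no claim about the format in which Compute Engine reports topology.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical topology position, ordered from coarsest to finest level."""
    block: str
    sub_block: str
    host: str


def shared_levels(a: Location, b: Location) -> int:
    """Count how many leading topology levels two instances share (0 to 3)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def closest_peers(target: str, locations: dict[str, Location]) -> list[str]:
    """Order the other instances from nearest to farthest from the target."""
    others = [name for name in locations if name != target]
    return sorted(
        others,
        key=lambda name: (-shared_levels(locations[target], locations[name]), name),
    )


# Hypothetical sample data.
SAMPLE = {
    "vm-a": Location("block-1", "sub-1", "host-1"),
    "vm-b": Location("block-1", "sub-1", "host-2"),
    "vm-c": Location("block-1", "sub-2", "host-3"),
    "vm-d": Location("block-2", "sub-9", "host-4"),
}

if __name__ == "__main__":
    print(closest_peers("vm-a", SAMPLE))
```

A topology-aware scheduler could use a ranking like this to place tightly communicating tasks on instances in the same sub-block before spilling over to other blocks.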

Reservation operational mode

If you use the reservation-bound provisioning model, then the machine type that you reserve determines the reservation operational mode for your reserved capacity. Each mode defines how host errors or faulty host reports are handled, as well as your level of visibility and control over the reservation's infrastructure.

Each reservation operational mode defines the following:

Who manages recovery: you or Google Cloud.
What capacity you use for recovery: only your reserved capacity, or capacity within or outside your reservations.
Your level of placement control: whether you can view specific reservation sub-blocks and start maintenance for them before the planned time.

When you reserve capacity to create compute instances or clusters, you must choose one of the following reservation operational modes: managed mode or all capacity mode.

Managed mode

In managed mode, Google Cloud automatically manages the maintenance and recovery process of your compute instances after host errors or faulty host reports. This approach is ideal when your workload requires high stability, and you prefer an automated process to minimize downtimes.

The managed mode has the following features:

Only use reserved capacity for recovery: Compute Engine only uses your reserved capacity to restart instances. If there's no available capacity in your reservations, then Compute Engine only restarts instances after you obtain more capacity.
Automated instance restarts: Google Cloud handles the entire recovery process for an instance. When host maintenance is required, Compute Engine automatically migrates your instances to other available machines within your reservation and restarts the instances.
Block management and visibility: you can view the topology, health, and maintenance status of individual reservations and reservation blocks. You can also receive maintenance notifications, and optionally start maintenance before the scheduled maintenance time, for these resources.
Potential API rate limits: calls to the report faulty host API may be rate-limited per reservation.

All capacity mode

In all capacity mode, you are responsible for managing the recovery process for your compute instances. You must manually start maintenance after host errors or faulty host reports. Unlike in managed mode, you can also view and start maintenance for your reservation sub-blocks. These features give you full, granular control over the maintenance and recovery process for your instances.

The all capacity mode has the following features:

Use reserved and unreserved capacity for recovery: you can use your reserved resources, as well as any resources that are available outside of your reservation, to help you migrate and restart an instance when its host fails.
Manual instance restarts: you're responsible for the recovery process of an instance. When host maintenance is required because of a host error or faulty host report, Compute Engine stops your instance. You can only restart the instance after maintenance completes.
Block and sub-block management and visibility: you can view the topology, health, and maintenance status of individual reservations, reservation blocks, and reservation sub-blocks. You can also receive maintenance notifications, and optionally start maintenance before the scheduled maintenance time, for these resources.
No API rate limits: there are no rate limits when you make calls to the report faulty host API.
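The differences between the two modes can be summarized as a small lookup table. The following Python sketch encodes the features listed in the preceding sections and reports which ones differ. The class and field names are hypothetical labels for this illustration; they aren't Compute Engine API fields.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OperationalMode:
    """Features of a reservation operational mode, as described in this document."""
    recovery_managed_by: str
    recovery_capacity: str
    sub_block_visibility: bool
    faulty_host_reports_may_be_rate_limited: bool


MODES = {
    "managed": OperationalMode(
        recovery_managed_by="Google Cloud",
        recovery_capacity="reserved only",
        sub_block_visibility=False,
        faulty_host_reports_may_be_rate_limited=True,
    ),
    "all capacity": OperationalMode(
        recovery_managed_by="you",
        recovery_capacity="reserved and unreserved",
        sub_block_visibility=True,
        faulty_host_reports_may_be_rate_limited=False,
    ),
}


def differing_features(first: str, second: str) -> dict:
    """Return each feature whose value differs between two modes."""
    a, b = asdict(MODES[first]), asdict(MODES[second])
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


if __name__ == "__main__":
    for feature, values in differing_features("managed", "all capacity").items():
        print(f"{feature}: {values[0]} vs. {values[1]}")
```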

Maintenance scheduling types

If you use the reservation-bound provisioning model, then Cluster Director provides options for scheduling host maintenance for the running compute instances in your cluster. When you reserve capacity, you can specify whether to group instances and have synchronized maintenance scheduling (grouped), or the instances can be loosely coupled and have independent maintenance scheduling (independent).

Grouped maintenance scheduling

The grouped maintenance scheduling type helps ensure that, no matter when Compute Engine provisions a compute instance, all instances running the same workload have the same planned maintenance frequency. This tightly coupled maintenance lets you optimize your job's performance by giving you complete control over your used and unused capacity.

The grouped maintenance scheduling type is useful in the following cases:

Your environment uses a job scheduler, such as Slurm or GKE.
You want to run training or other highly parallelized computing workloads.

Independent maintenance scheduling

The independent maintenance scheduling type gives instances different maintenance schedules. This configuration is ideal if you want to run inference or limited-scale training where workloads run more efficiently when they have separate maintenance schedules.
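The guidance for the two scheduling types can be expressed as a simple rule of thumb. The following Python sketch is one possible reading of that guidance, not an official selection algorithm; the function name and the workload labels are hypothetical.

```python
# Hypothetical workload labels that this document associates with each type.
GROUPED_WORKLOADS = {"training", "highly parallelized"}
INDEPENDENT_WORKLOADS = {"inference", "limited-scale training"}


def suggest_maintenance_scheduling(uses_job_scheduler: bool, workload: str) -> str:
    """Suggest "grouped" or "independent" maintenance scheduling."""
    if workload in INDEPENDENT_WORKLOADS and not uses_job_scheduler:
        return "independent"
    if uses_job_scheduler or workload in GROUPED_WORKLOADS:
        return "grouped"
    raise ValueError(f"no guidance for workload: {workload!r}")


if __name__ == "__main__":
    print(suggest_maintenance_scheduling(True, "training"))
    print(suggest_maintenance_scheduling(False, "inference"))
```

The sketch treats a job scheduler such as Slurm or GKE as a signal for grouped scheduling, which mirrors the cases listed for the grouped type.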

What's next?

Reserve capacity