Kubernetes environment setup for Neuron — AWS Neuron Documentation

This document is relevant for: Inf1, Inf2, Trn1, Trn2

Kubernetes environment setup for Neuron

Introduction

Customers who use Kubernetes can conveniently integrate Inf1/Trn1 instances into their workflows. This tutorial walks through deploying the Neuron device plugin daemonset and allocating Neuron cores or devices to application pods.

Please refer to the EKS instructions to create a cluster. Once the cluster is ACTIVE, add nodes to the cluster. We recommend using a node template for Neuron nodes. The following example demonstrates how to add Neuron nodes using a node template; it adds managed nodes using the eksctl tool. For more details, please refer to the EKS User Guide.
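If you are creating the cluster with eksctl, a minimal sketch of creating a cluster without a node group looks like the following (the cluster name, region, and version are placeholders; adjust them for your environment):

eksctl create cluster --name my-trn1-cluster --region us-west-2 --version 1.28 --without-nodegroup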

As a first step, create a script to capture the parameters for the node template:

#!/bin/bash

CLUSTER_NAME=$1
CLUSTER_SG=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].ResourcesVpcConfig.ClusterSecurityGroupId")
VPC_ID=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].ResourcesVpcConfig.VpcId")

cat <<EOF > cfn_params.json
[
    {
        "ParameterKey": "ClusterName",
        "ParameterValue": "$CLUSTER_NAME"
    },
    {
        "ParameterKey": "ClusterControlPlaneSecurityGroup",
        "ParameterValue": "$CLUSTER_SG"
    },
    {
        "ParameterKey": "VpcId",
        "ParameterValue": "$VPC_ID"
    }
]
EOF
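Assuming the script above is saved as create_cfn_params.sh (the filename is an example, not specified in this document), it takes the cluster name as its only argument:

chmod +x create_cfn_params.sh
./create_cfn_params.sh my-trn1-cluster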

These parameters include the name of the cluster, the security group that the nodes use to connect to the control plane, and the VPC ID. Next, get the node group template from the tutorial below:

wget https://raw.githubusercontent.com/aws-neuron/aws-neuron-eks-samples/master/dp_bert_hf_pretrain/cfn/eks_trn1_ng_stack.yaml

This template file has a few important config settings, such as the following:

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -ex
config_dir=/opt/aws/neuron
config_file=${config_dir}/logical_nc_config
[ -d "$config_dir" ] || mkdir -p "$config_dir"
[ -f "$config_file" ] || touch "$config_file"
if ! grep -q "^NEURON_LOGICAL_NC_CONFIG=1$" "$config_file" 2>/dev/null; then
    printf "NEURON_LOGICAL_NC_CONFIG=1" >> "$config_file"
fi
--==BOUNDARY==--

Finally, run the following command to create the CloudFormation stack:

aws cloudformation create-stack \
--stack-name eks-trn1-ng-stack \
--template-body file://eks_trn1_ng_stack.yaml \
--parameters file://cfn_params.json \
--capabilities CAPABILITY_IAM

The above command creates a stack named eks-trn1-ng-stack, which will be visible in the CloudFormation console. Please wait for stack creation to complete before proceeding to the next step.
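If you prefer to wait from the command line rather than the console, the standard AWS CLI wait command can be used:

aws cloudformation wait stack-create-complete \
--stack-name eks-trn1-ng-stack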

Now we are ready to add the nodes. The example demonstrates creating node groups using the eksctl tool.

Please run the following command to determine the AZ names (it expects the region in $REGION_CODE and the desired zone ID in $1):

aws ec2 describe-availability-zones \
--region $REGION_CODE \
--filters "Name=zone-id,Values=$1" \
--query "AvailabilityZones[].ZoneName" \
--output text
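For example (the region and zone IDs below are placeholders; substitute the zone IDs you plan to use), resolving two zone IDs to zone names looks like this:

aws ec2 describe-availability-zones \
--region us-west-2 \
--filters "Name=zone-id,Values=usw2-az1,usw2-az2" \
--query "AvailabilityZones[].ZoneName" \
--output text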

Next, create a script named create_ng_yaml.sh to generate the node group yaml. The arguments to the script are the region, the two AZs, the cluster name, and the name of the CloudFormation stack created earlier (eks-trn1-ng-stack in this example):

#!/bin/bash

REGION_CODE=$1
EKSAZ1=$2
EKSAZ2=$3
CLUSTER_NAME=$4
STACKNAME=$5

LT_ID_TRN1=$(aws cloudformation describe-stacks --stack-name $STACKNAME \
--query "Stacks[0].Outputs[?OutputKey=='LaunchTemplateIdTrn1'].OutputValue" \
--output text)

cat <<EOF > trn1_nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: $CLUSTER_NAME
  region: $REGION_CODE
  version: "1.28"

iam:
  withOIDC: true

availabilityZones: ["$EKSAZ1","$EKSAZ2"]

managedNodeGroups:
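As an illustration (the argument values here match the sample output shown below), the script could be invoked as:

chmod +x create_ng_yaml.sh
./create_ng_yaml.sh us-west-2 us-west-2d us-west-2c nemo2 eks-trn1-ng-stack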

Running the script should produce a yaml similar to the following:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: nemo2
  region: us-west-2
  version: "1.25"

iam:
  withOIDC: true

availabilityZones: ["us-west-2d","us-west-2c"]

managedNodeGroups:

The example shows Kubernetes version 1.25; please update the version as needed. This yaml can now be used with eksctl:

eksctl create nodegroup -f trn1_nodegroup.yaml

This will add the nodes to the cluster. Please wait for the nodes to become 'Ready', which can be verified using the kubectl get nodes command.
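For example, assuming your kubeconfig points at the cluster, the node status can be checked with:

kubectl get nodes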

If you are running a distributed training or inference job, you will need EFA resources. Please install the EFA device plugin using the instructions in the EFA device plugin repository.

Next, we will install the Neuron Device Plugin.

The Neuron device plugin exposes Neuron cores and devices to Kubernetes as resources. aws.amazon.com/neuroncore and aws.amazon.com/neuron are the resources that the Neuron device plugin registers with Kubernetes. aws.amazon.com/neuroncore is used for allocating Neuron cores to a container, while aws.amazon.com/neuron is used for allocating Neuron devices to a container. When the resource name 'neuron' is used, all the cores belonging to the device are allocated to the container.
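As a minimal sketch (the pod name and container image below are placeholders, not values from this document), a pod requesting a single NeuronCore sets the resource in its container limits:

apiVersion: v1
kind: Pod
metadata:
  name: neuron-test-pod              # placeholder name
spec:
  containers:
    - name: app
      image: my-neuron-app:latest    # placeholder image
      resources:
        limits:
          aws.amazon.com/neuroncore: 1   # use aws.amazon.com/neuron instead to allocate whole devices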

The Neuron scheduler extension is required for scheduling pods that require more than one Neuron core or device resource. For a graphical depiction of how the Neuron scheduler extension works, see Neuron Scheduler Extension Flow Diagram. The Neuron scheduler extension finds sets of directly connected devices with minimal communication latency when scheduling containers. On Inf1 and Inf2 instance types where Neuron devices are connected through a ring topology, the scheduler finds sets of contiguous devices. For example, for a container requesting 3 Neuron devices the scheduler might assign Neuron devices 0,1,2 to the container if they are available but never devices 0,2,4 because those devices are not directly connected. On Trn1.32xlarge and Trn1n.32xlarge instance types where devices are connected through a 2D torus topology, the Neuron scheduler enforces additional constraints that containers request 1, 4, 8, or all 16 devices. If your container requires a different number of devices, such as 2 or 5, we recommend that you use an Inf2 instance instead of Trn1 to benefit from more advanced topology.

The Neuron scheduler extension applies different rules when finding devices to allocate to a container on Inf1 and Inf2 instances than on Trn1. These rules ensure that when users request a specific number of resources, Neuron delivers consistent and high performance regardless of which cores and devices are assigned to the container.

On Inf1 and Inf2, Neuron devices are connected through a ring topology. There are no restrictions on the number of devices requested as long as it does not exceed the number of devices on the node. When the user requests N devices, the scheduler finds a node where N contiguous devices are available; it will never allocate non-contiguous devices to the same container. The figure below shows examples of device sets on an Inf2.48xlarge node which could be assigned to a container given a request for 2 devices.

[Figure: eks-inf2-device-set — example device sets on an Inf2.48xlarge node for a 2-device request]

Devices on Trn1.32xlarge and Trn1n.32xlarge nodes are connected via a 2D torus topology. On Trn1 nodes, containers can request 1, 4, 8, or all 16 devices. If you request an invalid number of devices, such as 7, your pod will not be scheduled and you will receive a warning:

Instance type trn1.32xlarge does not support requests for device: 7. Please request a different number of devices.

When requesting 4 devices, your container will be allocated one of the following sets of devices if they are available.

[Figure: eks-trn1-device-set4 — valid 4-device sets on a Trn1.32xlarge node]

When requesting 8 devices, your container will be allocated one of the following sets of devices if they are available.

[Figure: eks-trn1-device-set8 — valid 8-device sets on a Trn1.32xlarge node]

For all instance types, requesting one or all Neuron cores or devices is valid.

Multiple Scheduler Approach

In cluster environments where there is no access to the default scheduler, the Neuron scheduler extension can be used with another scheduler. A new scheduler is added (alongside the default scheduler), and the pods that need to run Neuron workloads use this new scheduler, to which the Neuron scheduler extension is added. EKS does not yet natively support the Neuron scheduler extension, so in an EKS environment this is the only way to add it. A pod targeting the new scheduler requests Neuron resources in its container spec, for example:

   cpu: "4"  
   memory: 4Gi  
   aws.amazon.com/neuroncore: 9  
   requests:  
   cpu: "1"  
   memory: 1Gi
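A fuller sketch of this approach (the scheduler name my-scheduler, pod name, and container image are placeholders, not values defined in this document) sets schedulerName on the pod so that the new scheduler, with the Neuron scheduler extension attached, places it:

apiVersion: v1
kind: Pod
metadata:
  name: neuron-workload           # placeholder name
spec:
  schedulerName: my-scheduler     # placeholder: the new scheduler configured with the Neuron scheduler extension
  containers:
    - name: app
      image: my-neuron-app:latest # placeholder image
      resources:
        limits:
          cpu: "4"
          memory: 4Gi
          aws.amazon.com/neuroncore: 9
        requests:
          cpu: "1"
          memory: 1Gi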

Default Scheduler Approach
