Train your model on EC2 (AWS Neuron Documentation)

This document is relevant for: Inf2, Trn1, Trn2

Train your model on EC2#

Description#

Neuron developer flow on EC2

You can use a single Trn1 instance as a development environment to compile and train Neuron models. In this developer flow, you provision an EC2 Trn1 instance using a Deep Learning AMI (DLAMI) and execute both steps of the development flow (compilation and training) on the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and runtimes required to complete the flow. Development happens through Jupyter notebooks or over a secure shell (SSH) connection in a terminal. Follow the steps below to set up your environment.

Setup Environment#

1. Launch a Trn1 Instance#

Note

If you encounter connectivity issues while loading a model on a Trn1 instance running Ubuntu, the likely cause is an Ubuntu limitation with multiple network interfaces. To solve this problem, follow the steps mentioned here.

We strongly recommend launching instances from a DLAMI, because DLAMIs already include the required fix.

2. Set up a development environment#

Enable PyTorch-Neuron#

PyTorch 1.11.0

Ubuntu 20 AMI

Configure Linux for Neuron repository updates

. /etc/os-release

sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
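Before writing the repository file, it can help to preview the exact entry that will land in it. The following is a minimal sketch, assuming a hypothetical helper named `neuron_apt_source_line` (it is not part of the Neuron tooling). It only prints the entry and changes nothing on the system.

```shell
# Hypothetical helper: print the apt source entry for a given Ubuntu
# codename, i.e. the VERSION_CODENAME value defined in /etc/os-release.
neuron_apt_source_line() {
    codename="$1"
    if [ -z "$codename" ]; then
        echo "usage: neuron_apt_source_line <codename>" >&2
        return 1
    fi
    printf 'deb https://apt.repos.neuron.amazonaws.com %s main\n' "$codename"
}

# Preview the entry for Ubuntu 20.04, whose codename is "focal".
neuron_apt_source_line focal
```

If the printed line has an empty codename, `/etc/os-release` was probably not sourced in the current shell.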

Update OS packages

sudo apt-get update -y

Install git

sudo apt-get install git -y

Install OS headers

sudo apt-get install linux-headers-$(uname -r) -y

Remove preinstalled packages and install Neuron Driver and Runtime

sudo apt-get remove aws-neuron-dkms -y
sudo apt-get remove aws-neuronx-dkms -y
sudo apt-get remove aws-neuronx-oci-hook -y
sudo apt-get remove aws-neuronx-runtime-lib -y
sudo apt-get remove aws-neuronx-collectives -y
sudo apt-get install aws-neuronx-dkms=2.* -y
sudo apt-get install aws-neuronx-oci-hook=2.* -y
sudo apt-get install aws-neuronx-runtime-lib=2.* -y
sudo apt-get install aws-neuronx-collectives=2.* -y

Install EFA Driver (only required for multi-instance training)

curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz
wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key
cat aws-efa-installer.key | gpg --fingerprint
wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig

tar -xvf aws-efa-installer-latest.tar.gz
cd aws-efa-installer && sudo bash efa_installer.sh --yes
cd
sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer
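When the EFA commands are pasted into a script that does not use `set -e`, the extraction step runs even if `gpg --verify` reports a bad signature. One way to gate extraction on the signature check is sketched below. The function name `extract_if_verified` is hypothetical, and the sketch assumes the signature file sits next to the archive with a `.sig` suffix, as in the download commands.

```shell
# Hypothetical wrapper: extract the archive only when gpg accepts its
# detached signature (assumed to be stored as "<archive>.sig").
extract_if_verified() {
    archive="$1"
    if gpg --verify "$archive.sig" "$archive"; then
        tar -xf "$archive"
    else
        echo "signature check failed for $archive; not extracting" >&2
        return 1
    fi
}

# Example:
#   extract_if_verified aws-efa-installer-latest.tar.gz
```

The wrapper returns a non-zero status on failure, so a calling script can stop before running the installer.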

Remove preinstalled packages and install Neuron Tools

sudo apt-get remove aws-neuron-tools -y
sudo apt-get remove aws-neuronx-tools -y
sudo apt-get install aws-neuronx-tools=2.* -y

export PATH=/opt/aws/neuron/bin:$PATH
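The `export` above affects only the current shell, and repeating it (for example, from a startup file that is sourced more than once) adds duplicate entries. A small sketch of an idempotent variant follows; the helper name `prepend_path_once` is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not
# already present, so repeated calls leave PATH unchanged.
prepend_path_once() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path_once /opt/aws/neuron/bin
prepend_path_once /opt/aws/neuron/bin   # second call is a no-op
```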

Install Python venv and activate Python virtual environment to install Neuron pip packages.

sudo apt install python3.8-venv
python3.8 -m venv aws_neuron_venv_pytorch
source aws_neuron_venv_pytorch/bin/activate
pip install -U pip
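The `pip` commands that follow install into whichever Python environment is active, so it is worth confirming that activation took effect. The sketch below relies on the `VIRTUAL_ENV` variable that the `activate` script sets; the helper name `venv_is_active` is hypothetical.

```shell
# Hypothetical helper: succeed only when the active virtual environment
# lives in a directory with the given name.
venv_is_active() {
    [ -n "${VIRTUAL_ENV:-}" ] && [ "${VIRTUAL_ENV##*/}" = "$1" ]
}

if venv_is_active aws_neuron_venv_pytorch; then
    echo "virtual environment is active"
else
    echo "run: source aws_neuron_venv_pytorch/bin/activate" >&2
fi
```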

Install wget, awscli

pip install wget
pip install awscli

Install packages from repos

python -m pip config set global.extra-index-url "https://pip.repos.neuron.amazonaws.com"

Install Python packages - Transformers package is needed for BERT

python -m pip install torch-neuronx=="1.11.0.1.*" "neuronx-cc==2.*"
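A pip version pin that ends in `.*`, such as `2.*`, accepts any release whose version starts with that prefix. After installation, the version reported by `python -m pip show <package>` can be compared against the pin. The sketch below is a simplified illustration of prefix matching, not a full implementation of pip's version rules; `matches_pin` is a hypothetical helper and the version strings in the example are made up.

```shell
# Hypothetical helper: check a version string against an exact pin or a
# "prefix.*" pin. Simplified: compares text only, no normalization.
matches_pin() {
    version="$1"
    pin="$2"
    case "$pin" in
        *.\*)
            prefix=${pin%.\*}        # strip the trailing ".*"
            case "$version" in
                "$prefix" | "$prefix".*) return 0 ;;
                *) return 1 ;;
            esac
            ;;
        *)
            [ "$version" = "$pin" ]  # exact pin: plain string comparison
            ;;
    esac
}

# Illustrative version strings only.
matches_pin 2.4.1 '2.*' && echo "pin satisfied"
matches_pin 3.0.0 '2.*' || echo "pin not satisfied"
```

Requiring a dot after the prefix keeps a pin such as `1.1.*` from accepting `1.10`.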

Amazon Linux 2 AMI

Configure Linux for Neuron repository updates

sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

Install OS headers

sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y

Update OS packages

sudo yum update -y

Install git

sudo yum install git -y

Remove preinstalled packages and install Neuron Driver and Runtime

sudo yum remove aws-neuron-dkms -y
sudo yum remove aws-neuronx-dkms -y
sudo yum remove aws-neuronx-oci-hook -y
sudo yum remove aws-neuronx-runtime-lib -y
sudo yum remove aws-neuronx-collectives -y
sudo yum install aws-neuronx-dkms-2.* -y
sudo yum install aws-neuronx-oci-hook-2.* -y
sudo yum install aws-neuronx-runtime-lib-2.* -y
sudo yum install aws-neuronx-collectives-2.* -y

Install EFA Driver (only required for multi-instance training)

curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz
wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key
cat aws-efa-installer.key | gpg --fingerprint
wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig
tar -xvf aws-efa-installer-latest.tar.gz
cd aws-efa-installer && sudo bash efa_installer.sh --yes
cd
sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer

Remove preinstalled packages and install Neuron Tools

sudo yum remove aws-neuron-tools -y
sudo yum remove aws-neuronx-tools -y
sudo yum install aws-neuronx-tools-2.* -y

export PATH=/opt/aws/neuron/bin:$PATH

Install Python venv and activate Python virtual environment to install Neuron pip packages.

python3.7 -m venv aws_neuron_venv_pytorch
source aws_neuron_venv_pytorch/bin/activate
python -m pip install -U pip

Install wget, awscli

pip install wget
pip install awscli

Install packages from repos

python -m pip config set global.extra-index-url "https://pip.repos.neuron.amazonaws.com"

Install Python packages - Transformers package is needed for BERT

python -m pip install torch-neuronx=="1.11.0.1.*" "neuronx-cc==2.*"

3. Set up Jupyter notebook#

To develop from a Jupyter notebook, see Jupyter Notebook QuickStart.

You can also run a Jupyter notebook as a script. First enable the ML framework Conda or Python environment of your choice, then see Running Jupyter Notebook as script for instructions.
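One common way to run a notebook as a script is to convert it with `jupyter nbconvert --to script` and then run the resulting file. The following is a minimal sketch under that assumption; the linked instructions may describe a different procedure, and both `notebook_to_script` and `my_notebook.ipynb` are hypothetical names.

```shell
# Hypothetical helper: convert a notebook to a plain script with nbconvert.
# For a Python notebook, my_notebook.ipynb yields my_notebook.py next to it.
notebook_to_script() {
    if ! command -v jupyter >/dev/null 2>&1; then
        echo "jupyter not found; activate the environment that provides it" >&2
        return 1
    fi
    jupyter nbconvert --to script "$1"
}

# Example (hypothetical notebook name):
#   notebook_to_script my_notebook.ipynb && python my_notebook.py
```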