Install with support for C++11 ABI — AWS Neuron Documentation (original) (raw)

This document is relevant for: Inf2, Trn1, Trn2, Trn3

Install with support for C++11 ABI#

Warning

The intended user of this guide is using a custom built version oftorch and torch-xla or compiling a non-python application which must be built using the C++11 ABI.

Most applications do not require this specialized distribution.

For regular installation instructions see: Fresh install

The standard torch-neuronx packages (which are normally installed according to the Fresh install guide) are compiled with the pre-C++11 ABI and linked against the pre-C++11 libtorch. These compilation options ensure that the torch-neuronx ABI matches the _publicly_released version of the torch and torch-xla packages that are installed from the default PyPI index.

To support applications with specific ABI requirements, Neuron distributes packages which are linked against the C++11 version oflibtorch. These torch-neuronx packages are built using the-D_GLIBCXX_USE_CXX11_ABI=1 compilation flag.

Note

The libneuronxla packages are already built with both pre-C++11 ABI and C++11 ABI symbols so the same PIP package can be used for C++11 ABI applications.

The only difference between these packages and the standard packages is the torch plugin library contained within the package. This is thelibtorchneuron.so library located in the torch_neuronx/lib/ package directory. All other libraries and python files within the packages are identical. This means that these C++11-compatible packages are drop-in replacements in environments that are incompatible with the standard releases oftorch-neuronx. The behavior is identical whether compiling models, executing inferences or running training.

Installation#

All versions of the library are available to download from the following pip index:

https://pip.repos.neuron.amazonaws.com/cxx11

To install a wheel, it is recommended to use the --no-deps flag since versions of torch and torch-xla compiled using the C++11 ABI are not distributed on this index.

pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuronx --no-deps

Specific versions of torch-neuronx with C++11 ABI support can be installed just like standard versions of torch-neuronx.

pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 "torch-neuronx==2.5." --no-deps pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 "torch-neuronx==2.6." --no-deps

Important

This pip index does not include a distribution of torch and torch-xla compiled with the new C++11 ABI. The intent of this index is only to provide Neuron SDK wheels. See Building torch and torch-xla with C++11 ABI.

The version of torch and torch-xla that are distributed on the default PyPI index is compiled with the old pre-C++11 ABI.

If a C++11 torch-neuronx package is installed with dependencies using the default PyPI index, then the installed version of torch and torch-xla will be using the pre-C++11 ABI and torch-neuronx will be using the C++11 ABI. This ABI mismatch will lead to undefined symbol errors in both Python usage and at link time for non-Python applications.

Building `torch` and `torch-xla` with C++11 ABI#

The instructions for building torch from source are at pytorch/pytorch

The instructions for building torch-xla from source are at pytorch/xla

The following are simplified instructions (subject to change):

Setting the build environment:

sudo apt install cmake pip install yapf==0.30.0 wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64 sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel

Build torch (CPU only) and torch-xla wheels for version 2.5:

git clone --recursive https://github.com/pytorch/pytorch --branch v2.5.1 cd pytorch/ git clone --recursive https://github.com/pytorch/xla.git --branch v2.5.1 _GLIBCXX_USE_CXX11_ABI=1 python setup.py bdist_wheel

pip wheel will be present in ./dist

cd xla/ CXX_ABI=1 python setup.py bdist_wheel

pip wheel will be present in ./dist

Build torch (CPU only) and torch-xla wheels for version 2.6:

git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0 cd pytorch/ git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron _GLIBCXX_USE_CXX11_ABI=1 python setup.py bdist_wheel

pip wheel will be present in ./dist

cd xla/ CXX_ABI=1 python setup.py bdist_wheel

pip wheel will be present in ./dist

FAQ#

When should I use a C++11 torch-neuronx wheel?#

Distributions compiled with the new C++11 ABI should only be used in the following cases:

You have built your own version of torch and torch-xla which uses the new C++11 ABI and need a corresponding version of torch-neuronx that is compatible.
You are compiling an application against a libtorchwhich uses the C++11 ABI and would like to includelibtorchneuron.so as well. Torch distributes these C++11 libtorchlibraries with a libtorch-cxx11 prefix.

Example:

https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.5.1%2Bcpu.zip

How can I know which ABI torch-neuronx is using?#

Packages which use the pre-C++11 ABI have no local identifier and use the following version scheme:

Packages which use the C++11 ABI have a +cxx11 local identifier and use following version scheme:

.+cxx11

This allows the ABI to be validated in the by inspecting the local identifier (or version suffix).

Example:

2.5.1.2.4.0+cxx11 2.6.1.2.4.0+cxx11

How can I know which ABI torch is using?#

The torch python package provides an API at the that allows you to check if the underlying libtorch was compiled with the C++11 ABI:

import torch torch.compiled_with_cxx11_abi() # True/False

Currently torch-neuronx does not have an equivalent API. If the C++11 ABI was used, it will be visible in the version string (See How can I know which ABI torch-neuronx is using?).

Troubleshooting#

What Python errors could I see if I mix ABI versions?#

Using a version of torch compiled with the C++11 ABI will trigger an error in the python interpreter when importing a version of torch-neuronx using the old (pre-C++11) ABI from the standard index. This will manifest as an error when the import torch_neuronx statement is executed.

Traceback (most recent call last): File "/python3.9/site-packages/torch_neuron/init.py", line 64, in _register_extension() File "/python3.9/site-packages/torch_neuron/init.py", line 60, in _register_extension torch.ops.load_library(neuron_op_filename) File "/python3.9/site-packages/torch/_ops.py", line 110, in load_library ctypes.CDLL(path) File "/python3.9/ctypes/init.py", line 364, in init self._handle = _dlopen(self._name, mode) OSError: /python3.9/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6

Similarly, when using the standard pre-C++11 versions of torch/torch-xla with the C++11 version of torch-neuronx, an error would also occur at import.

Traceback (most recent call last): File "/python3.9/site-packages/torch_neuron/init.py", line 79, in _register_extension() File "/python3.9/site-packages/torch_neuron/init.py", line 75, in _register_extension torch.ops.load_library(neuron_op_filename) File "/python3.9/site-packages/torch/_ops.py", line 110, in load_library ctypes.CDLL(path) File "/python3.9/ctypes/init.py", line 364, in init self._handle = _dlopen(self._name, mode) OSError: /python3.9/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

In either of these cases, the remedy is to ensure that the ABI of the torch and torch-xladistribution matches the ABI of the torch-neuronx distribution.

What compiler/linking errors could I see if I mix ABI versions?#

If you link an application which uses the old (pre-C++11) ABIlibtorchneuron.so with a C++11 version of torch, this will trigger a link error.

libtorchneuron.so: undefined reference to torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&)' libtorchneuron.so: undefined reference to c10::Error::Error(c10::SourceLocation, std::string)' libtorchneuron.so: undefined reference to c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)' libtorchneuron.so: undefined reference to c10::ClassType::getMethod(std::string const&) const' libtorchneuron.so: undefined reference to c10::ivalue::ConstantString::create(std::string)' libtorchneuron.so: undefined reference to c10::DeviceTypeName(c10::DeviceType, bool)' libtorchneuron.so: undefined reference to torch::jit::parseSchema(std::string const&)' libtorchneuron.so: undefined reference to unsigned short caffe2::TypeMeta::_typeMetaDatastd::string()' libtorchneuron.so: undefined reference to c10::Warning::warn(c10::SourceLocation const&, std::string const&, bool)' libtorchneuron.so: undefined reference to torch::jit::parseSchemaOrName(std::string const&)' libtorchneuron.so: undefined reference to c10::Symbol::fromQualString(std::string const&)' libtorchneuron.so: undefined reference to c10::Error::Error(std::string, std::string, void const*)' libtorchneuron.so: undefined reference to c10::detail::infer_schema::make_function_schema(std::string&&, std::string&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)' libtorchneuron.so: undefined reference to c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)' libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString(c10::FunctionSchema const&)'

Similarly, an error will also occur in the opposite scenario where the C++11 libtorchneuron.so library is used with the pre-C++11 libtorch:

libtorchneuron.so: undefined reference to c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)' libtorchneuron.so: undefined reference to torch::jit::parseSchemaOrName(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)' libtorchneuron.so: undefined reference to c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)' libtorchneuron.so: undefined reference to c10::Error::Error(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, void const*)' libtorchneuron.so: undefined reference to torch::jit::canonicalSchemaString[abi:cxx11](c10::FunctionSchema const&)' libtorchneuron.so: undefined reference to torch::detail::class_base::class_base(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::type_info const&, std::type_info const&)' libtorchneuron.so: undefined reference to c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' libtorchneuron.so: undefined reference to c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)' libtorchneuron.so: undefined reference to c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)' libtorchneuron.so: undefined reference to torch::jit::parseSchema(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)' libtorchneuron.so: undefined reference to c10::DeviceTypeName[abi:cxx11](c10::DeviceType, bool)' libtorchneuron.so: undefined reference to c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)' libtorchneuron.so: undefined reference to unsigned short caffe2::TypeMeta::_typeMetaData<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >()' libtorchneuron.so: undefined reference to c10::ClassType::getMethod(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) const' libtorchneuron.so: undefined reference to `c10::Warning::warn(c10::SourceLocation const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool)'

In either of these cases, the remedy is to ensure that the ABI of thelibtorch distribution matches the ABI of the libtorchneuron.sodistribution.

The torch and torch-xla ABI must match the torch-neuron ABI or an undefined symbol error will occur.

This document is relevant for: Inf2, Trn1, Trn2, Trn3

Install with support for C++11 ABI — AWS Neuron Documentation (original) (raw)