This document is relevant for: Inf1, Inf2, Trn1, Trn2
Neuron Glossary
| Term | Description |
|---|---|
| Inferentia | AWS first-generation accelerated machine learning chip, supporting inference only |
| Trainium/Inferentia2 | AWS second-generation accelerated machine learning chip, supporting training and inference |
| Trainium2 | AWS third-generation accelerated machine learning chip, supporting training and inference |
| Neuron Device | An accelerated machine learning chip (e.g. Inferentia or Trainium) |
| Term | Description |
|---|---|
| Inf1 | Inferentia-powered accelerated-compute EC2 instance |
| Trn1 | Trainium-powered accelerated-compute EC2 instance |
| Inf2 | Inferentia2-powered accelerated-compute EC2 instance |
| Trn2 | Trainium2-powered accelerated-compute EC2 instance |
| Term | Description |
|---|---|
| NeuronCore | The machine learning compute cores within Inferentia/Trainium |
| NeuronCore-v1 | NeuronCore within Inferentia |
| NeuronCore-v2 | NeuronCore within Trainium1/Inferentia2 |
| NeuronCore-v3 | NeuronCore within Trainium2 |
| Tensor Engine | 2D systolic array within the NeuronCore, used for matrix computations |
| Scalar Engine | An engine within each NeuronCore that accelerates element-wise operations (e.g. GELU, ReLU, reciprocal) |
| Vector Engine | An engine within each NeuronCore that accelerates spatial operations (e.g. LayerNorm, TopK, pooling) |
| GPSIMD Engine | Embedded general-purpose SIMD cores within each NeuronCore, used to accelerate custom operators |
| Sync Engine | The SP engine, integrated inside the NeuronCore; used for synchronization and DMA triggering |
| Collective Communication Engine | Dedicated engine for collective communication, allowing computation and communication to overlap |
| High Bandwidth Memory | High-bandwidth memory used as device memory for NeuronCore-v2 and beyond |
| State Buffer | The main software-managed on-chip memory in NeuronCore-v1 and beyond |
| Partial Sum Buffer | A second software-managed on-chip memory in NeuronCore-v1 and beyond, with near-memory accumulation support for Tensor Engine output data |
| NeuronLink | Interconnect between NeuronCores |
| NeuronLink-v1 | Interconnect between NeuronCores in the Inferentia device |
| NeuronLink-v2 | Interconnect between NeuronCores in the Trainium1/Inferentia2 device |
| NeuronLink-v3 | Interconnect between NeuronCores in the Trainium2 device |
| Term | Description |
|---|---|
| Neuron Kernel Interface | A bare-metal language and compiler for directly programming Neuron devices, available on AWS Trainium/Inferentia2 and later devices |
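To give a feel for what NKI programming looks like, below is a minimal sketch of an element-wise addition kernel modeled on the `nki.jit` decorator and `nki.language` API. The tensor shapes and tiling are simplifying assumptions; consult the NKI documentation for authoritative usage and tile-size constraints.

```python
# Minimal NKI sketch (assumes the neuronxcc.nki package and its
# nki.language API; shapes must fit a single on-chip tile).
from neuronxcc import nki
import neuronxcc.nki.language as nl


@nki.jit
def tensor_add_kernel(a_input, b_input):
    """Element-wise add of two device (HBM) tensors."""
    # Allocate the output tensor in device memory (HBM).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    # Load both operands from HBM into the on-chip State Buffer (SBUF).
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    # The addition executes on-chip (e.g. on the Vector Engine);
    # store the result tile back to HBM.
    nl.store(c_output, a_tile + b_tile)
    return c_output
```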
| Abbreviation | Description |
|---|---|
| NxD Core | NeuronX Distributed Core library |
| NxD Training | NeuronX Distributed Training library |
| NxD Inference | NeuronX Distributed Inference library |
| NC | Neuron Core |
| NeuronCore | Neuron Core |
| ND | Neuron Device |
| NeuronDevice | Neuron Device |
| TensorE | Tensor Engine |
| ScalarE | Scalar Engine |
| VectorE | Vector Engine |
| GpSimdE | GpSimd Engine |
| CCE | Collective Communication Engine |
| HBM | High Bandwidth Memory |
| SBUF | State Buffer |
| PSUM | Partial Sum Buffer |
| FP32 | Float32 |
| TF32 | TensorFloat32 |
| FP16 | Float16 |
| BF16 | Bfloat16 |
| cFP8 | Configurable Float8 |
| RNE | Round to Nearest Even |
| SR | Stochastic Rounding |
| NKI | Neuron Kernel Interface |
| CustomOps | Custom Operators |
| RT | Neuron Runtime |
| DP | Data Parallel |
| DPr | Data Parallel degree |
| TP | Tensor Parallel |
| TPr | Tensor Parallel degree |
| PP | Pipeline Parallel |
| PPr | Pipeline Parallel degree |
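The parallelism degrees compose multiplicatively: a distributed job's world size is the product of its data-, tensor-, and pipeline-parallel degrees (DPr × TPr × PPr). A small sketch with hypothetical degree values illustrates the bookkeeping; real values come from your training configuration.

```python
# Hypothetical degrees for illustration only.
TPr = 8   # tensor-parallel degree: ranks that split individual layers
PPr = 4   # pipeline-parallel degree: ranks that split the layer stack
DPr = 2   # data-parallel degree: model replicas that split the global batch

# Total number of workers (e.g. NeuronCores) the job occupies.
world_size = DPr * TPr * PPr
assert world_size == 64

# Conversely, given a world size and TPr/PPr, the data-parallel
# degree is whatever is left over.
derived_DPr = world_size // (TPr * PPr)
assert derived_DPr == DPr
```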