# Neuron Glossary

This document is relevant for: Inf1, Inf2, Trn1, Trn2

## Terms

### Neuron Devices (accelerated machine learning chips)

| Term | Description |
|------|-------------|
| Inferentia | AWS first-generation accelerated machine learning chip, supporting inference only |
| Trainium/Inferentia2 | AWS second-generation accelerated machine learning chip, supporting both training and inference |
| Trainium2 | AWS third-generation accelerated machine learning chip, supporting both training and inference |
| Neuron Device | An accelerated machine learning chip (e.g. Inferentia or Trainium) |

### Neuron-powered Instances

| Term | Description |
|------|-------------|
| Inf1 | Inferentia-powered accelerated-compute EC2 instance |
| Trn1 | Trainium-powered accelerated-compute EC2 instance |
| Inf2 | Inferentia2-powered accelerated-compute EC2 instance |
| Trn2 | Trainium2-powered accelerated-compute EC2 instance |

### NeuronCore terms

| Term | Description |
|------|-------------|
| NeuronCore | The machine learning compute cores within Inferentia/Trainium devices |
| NeuronCore-v1 | NeuronCore within Inferentia |
| NeuronCore-v2 | NeuronCore within Trainium/Inferentia2 |
| NeuronCore-v3 | NeuronCore within Trainium2 |
| Tensor Engine | 2D systolic array (within the NeuronCore), used for matrix computations |
| Scalar Engine | A scalar engine within each NeuronCore, which accelerates element-wise operations (e.g. GELU, ReLU, reciprocal) |
| Vector Engine | A vector engine within each NeuronCore, which accelerates spatial operations (e.g. layerNorm, TopK, pooling) |
| GPSIMD Engine | Embedded general-purpose SIMD cores within each NeuronCore, used to accelerate custom operators |
| Sync Engine | The synchronization (SP) engine integrated inside each NeuronCore, used for synchronization and DMA triggering |
| Collective Communication Engine | Dedicated engine for collective communication, which allows computation and communication to overlap |
| High Bandwidth Memory | High Bandwidth Memory (HBM), used as device memory for NeuronCore-v2 and beyond |
| State Buffer | The main software-managed on-chip memory in NeuronCore-v1 and beyond |
| Partial Sum Buffer | A second software-managed on-chip memory in NeuronCore-v1 and beyond, with near-memory accumulation support for Tensor Engine output data |
| NeuronLink | Interconnect between NeuronCores |
| NeuronLink-v1 | Interconnect between NeuronCores in Inferentia devices |
| NeuronLink-v2 | Interconnect between NeuronCores in Trainium/Inferentia2 devices |
| NeuronLink-v3 | Interconnect between NeuronCores in Trainium2 devices |

### Neuron SDK terms

| Term | Description |
|------|-------------|
| Neuron Kernel Interface | A bare-metal language and compiler for directly programming Neuron devices, available on Trainium/Inferentia2 and later devices |
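To make "bare-metal" concrete, the sketch below follows the minimal element-wise-add pattern from the NKI getting-started material. Treat it as a sketch only: the module paths (`neuronxcc.nki`, `neuronxcc.nki.language`) and call signatures should be verified against the NKI release you are using.

```python
import numpy as np
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def nki_tensor_add(a_input, b_input):
    """Element-wise add of two HBM tensors, computed on a single NeuronCore."""
    # Allocate the kernel output in device memory (HBM).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    # nl.load moves a tile from HBM into the on-chip State Buffer (SBUF);
    # tiles are limited to 128 partitions, matching the NeuronCore layout.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    # The element-wise add runs on the NeuronCore's on-chip compute engines.
    c_tile = a_tile + b_tile
    # nl.store moves the result tile from SBUF back out to HBM.
    nl.store(c_output, value=c_tile)
    return c_output

a = np.ones((128, 512), dtype=np.float32)
b = np.ones((128, 512), dtype=np.float32)
print(nki_tensor_add(a, b))  # runs on a Neuron device; prints a tensor of 2.0s
```

The load/compute/store split mirrors the memory hierarchy in the table above: kernels stage data from HBM into SBUF, compute on tiles, and write results back.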

### Abbreviations

| Abbreviation | Description |
|--------------|-------------|
| NxD Core | NeuronX Distributed Core library |
| NxD Training | NeuronX Distributed Training library |
| NxD Inference | NeuronX Distributed Inference library |
| NC, NeuronCore | NeuronCore |
| ND, NeuronDevice | Neuron Device |
| TensorE | Tensor Engine |
| ScalarE | Scalar Engine |
| VectorE | Vector Engine |
| GpSimdE | GPSIMD Engine |
| CCE | Collective Communication Engine |
| HBM | High Bandwidth Memory |
| SBUF | State Buffer |
| PSUM | Partial Sum Buffer |
| FP32 | Float32 |
| TF32 | TensorFloat32 |
| FP16 | Float16 |
| BF16 | Bfloat16 |
| cFP8 | Configurable Float8 |
| RNE | Round to Nearest Even |
| SR | Stochastic Rounding |
| NKI | Neuron Kernel Interface |
| CustomOps | Custom Operators |
| RT | Neuron Runtime |
| DP | Data Parallel |
| DPr | Data Parallel degree |
| TP | Tensor Parallel |
| TPr | Tensor Parallel degree |
| PP | Pipeline Parallel |
| PPr | Pipeline Parallel degree |
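A note on the last six rows: the data, tensor, and pipeline parallel degrees multiply to the total number of workers in a distributed job, so fixing the world size and two of the degrees determines the third. A minimal sketch (the helper name and the example numbers are illustrative, not part of the Neuron SDK):

```python
def data_parallel_degree(world_size: int, tp_degree: int, pp_degree: int) -> int:
    """DPr = world_size / (TPr * PPr): the three degrees partition all workers."""
    assert world_size % (tp_degree * pp_degree) == 0, "TPr * PPr must divide world size"
    return world_size // (tp_degree * pp_degree)

# e.g. 64 NeuronCores with tensor-parallel degree 8 and pipeline-parallel
# degree 4 leave a data-parallel degree of 2:
print(data_parallel_degree(64, tp_degree=8, pp_degree=4))  # -> 2
```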

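Similarly, RNE and SR name the two rounding modes used when downcasting, e.g. FP32 to BF16. The NumPy sketch below emulates BF16 truncation in software to show the difference in arithmetic; it ignores NaN/Inf handling and is not how the hardware implements it:

```python
import numpy as np

def fp32_to_bf16_rne(x: np.ndarray) -> np.ndarray:
    """Round FP32 down to BF16 precision with round-to-nearest-even."""
    bits = x.astype(np.float32).view(np.uint32)
    lsb = (bits >> 16) & 1                    # low bit of the kept mantissa
    out = (bits + 0x7FFF + lsb) & 0xFFFF0000  # add rounding bias, drop low 16 bits
    return out.view(np.float32)

def fp32_to_bf16_sr(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Round FP32 down to BF16 precision with stochastic rounding: add uniform
    noise below the rounding point, then truncate. Rounds up with probability
    equal to the discarded fraction."""
    bits = x.astype(np.float32).view(np.uint32)
    noise = rng.integers(0, 1 << 16, size=bits.shape, dtype=np.uint32)
    out = (bits + noise) & 0xFFFF0000
    return out.view(np.float32)

rng = np.random.default_rng(0)
x = np.full(100_000, 1.0 + 2**-8, dtype=np.float32)  # exactly between two BF16 values
print(fp32_to_bf16_rne(x)[0])          # always 1.0 (ties go to the even mantissa)
print(fp32_to_bf16_sr(x, rng).mean())  # ~1.00390625: unbiased on average
```

Stochastic rounding keeps the rounded values unbiased in expectation, which is why it is useful when accumulating many small updates (e.g. gradients) in low precision during training.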