TensorFlow 2.x (tensorflow-neuronx) Auto Multicore Replication (Beta) — AWS Neuron Documentation (original) (raw)

This document is relevant for: Inf2, Trn1

TensorFlow 2.x (`tensorflow-neuronx`) Auto Multicore Replication (Beta)#

The Neuron auto multicore replication Python API modifies TensorFlow 2.x models traced by `tensorflow_neuronx.trace` so that they can be automatically replicated across multiple NeuronCores.

Table of contents

TensorFlow 2.x (tensorflow-neuron TF2.x) Auto Multicore Replication Python API (Beta)
TensorFlow Neuron TF2.x (tensorflow-neuronx TF2.x) Auto Multicore Replication CLI (Beta)

TensorFlow 2.x (tensorflow-neuron TF2.x) Auto Multicore Replication Python API (Beta)#

Method#

`tensorflow.neuron.auto_multicore` on models traced by `tensorflow_neuronx.trace`

Description#

Converts an existing AWS-Neuron-optimized keras.Model and returns an auto-replication tagged AWS-Multicore-Neuron-optimized keras.Model that can execute on AWS Machine Learning Accelerators. Like the traced model, the returned keras.Model will support inference only. Attributes or variables held by the original function or keras.Model will be dropped.

The auto model replication feature in TensorFlow-Neuron lets you create a model once and have it replicated across NeuronCores automatically. The desired number of cores must be at least 1 and can be less than the total number of NeuronCores available on a trn1 or inf2 instance. This reduces framework memory usage because you are not manually loading the same model multiple times. Calls to the returned model are dispatched to the cores in round-robin fashion.
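The round-robin dispatch described above can be pictured with a small framework-free sketch. This is only a conceptual illustration, not how the Neuron runtime is implemented: `RoundRobinReplicas`, `make_replica`, and `available_cores` are hypothetical names, plain Python callables stand in for per-core model replicas, and the upper bound on `num_cores` is an assumption read from the paragraph above.

```python
import itertools


class RoundRobinReplicas:
    """Conceptual stand-in for a model replicated across several cores."""

    def __init__(self, make_replica, num_cores, available_cores):
        # Assumed constraint: at least 1 core, at most what the instance has.
        if not 1 <= num_cores <= available_cores:
            raise ValueError(
                f"num_cores must be between 1 and {available_cores}, got {num_cores}"
            )
        # One replica per core; make_replica(core_id) is a hypothetical factory.
        self.replicas = [make_replica(core_id) for core_id in range(num_cores)]
        self._next_replica = itertools.cycle(self.replicas)

    def __call__(self, *args, **kwargs):
        # Route each call to the next replica in turn.
        return next(self._next_replica)(*args, **kwargs)


def make_replica(core_id):
    # Stand-in "model": reports which core handled the input.
    return lambda x: (core_id, x)


model = RoundRobinReplicas(make_replica, num_cores=2, available_cores=2)
calls = [model(i) for i in range(4)]
print(calls)  # [(0, 0), (1, 1), (0, 2), (1, 3)]
```

Only one set of replicas is created up front, which is the property the paragraph credits for the reduced framework memory usage compared with loading the model several times by hand.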

The returned keras.Model can be exported as SavedModel and served using TensorFlow Serving. Please see tensorflow-serving for more information about exporting to saved model and serving using TensorFlow Serving.

Note that automatic replication only works on models compiled with a pipeline size of 1, that is, with --neuroncore-pipeline-cores=1. If auto replication is not enabled, the model will by default replicate on up to 4 cores.

See Neuron Compiler CLI Reference Guide (neuronx-cc) for more information about compiler options.

Arguments#

func: The keras.Model or function to be traced.
example_inputs: A tf.Tensor or a tuple/list/dict of tf.Tensor objects for tracing the function. When example_inputs is a tf.Tensor or a list of tf.Tensor objects, func is expected to have the calling signature func(example_inputs). Otherwise, inference on func is expected to be done by calling func(*example_inputs) when example_inputs is a tuple, or func(**example_inputs) when example_inputs is a dict. The case where func accepts mixed positional and keyword arguments is currently unsupported.
num_cores: The number of cores across which the model will be automatically replicated.
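The calling convention for example_inputs described above can be summarized in a few lines. The sketch below is illustrative only: `call_with_example_inputs` is a hypothetical helper, not part of the Neuron API, and plain numbers stand in for tf.Tensor objects so that it runs without TensorFlow installed.

```python
def call_with_example_inputs(func, example_inputs):
    """Call func the way the argument description says it will be called."""
    if isinstance(example_inputs, tuple):
        return func(*example_inputs)   # tuple -> positional arguments
    if isinstance(example_inputs, dict):
        return func(**example_inputs)  # dict -> keyword arguments
    return func(example_inputs)        # single tensor or list -> one argument


def add(a, b):
    return a + b


def total(values):
    return sum(values)


results = (
    call_with_example_inputs(add, (1, 2)),            # add(1, 2)
    call_with_example_inputs(add, {"a": 1, "b": 2}),  # add(a=1, b=2)
    call_with_example_inputs(total, [1, 2, 3]),       # total([1, 2, 3])
)
print(results)  # (3, 3, 6)
```

Note that a list is passed through as a single argument while a tuple is unpacked, so the choice of container changes how func is invoked.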

Returns#

An AWS-Multicore-Neuron-optimized keras.Model.

Example Python API Usage for TF2.x traced models:#

    import tensorflow as tf
    import tensorflow.neuron as tfn
    import tensorflow_neuronx as tfnx

    input0 = tf.keras.layers.Input(3)
    dense0 = tf.keras.layers.Dense(3)(input0)
    inputs = [input0]
    outputs = [dense0]
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    input0_tensor = tf.random.uniform([1, 3])
    model_neuron = tfnx.trace(model, input0_tensor)

    # a trn1.2xlarge has 2 neuron cores
    num_cores = 2
    multicore_model = tfn.auto_multicore(model_neuron, input0_tensor, num_cores=num_cores)
    multicore_model(input0_tensor)

Example Python API Usage for TF2.x saved models:#

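    import tensorflow as tf
    import tensorflow.neuron as tfn
    from tensorflow.python import saved_model

    # model_dir must point to a SavedModel exported from a tfnx.trace model
    input0_tensor = tf.random.uniform([1, 3])
    num_cores = 4
    reload_model = saved_model.load(model_dir)
    multicore_model = tfn.auto_multicore(reload_model, input0_tensor, num_cores=num_cores)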

TensorFlow Neuron TF2.x (tensorflow-neuronx TF2.x) Auto Multicore Replication CLI (Beta)#

The Neuron auto multicore replication CLI modifies traced TensorFlow 2.x SavedModels so that they can be automatically replicated across multiple cores. Because the CLI operates directly on TensorFlow SavedModels, TensorFlow Serving can be supported without significant modifications to the code.

Method#

tf-neuron-auto-multicore MODEL_DIR --num_cores NUM_CORES --new_model_dir NEW_MODEL_DIR

Arguments#

MODEL_DIR: The directory of a saved AWS-Neuron-optimized keras.Model.
NUM_CORES: The number of cores across which the model will be automatically replicated.
NEW_MODEL_DIR: The directory where the AWS-Multicore-Neuron-optimized keras.Model will be saved.

Example CLI Usage for TensorFlow Serving saved models:#

tf-neuron-auto-multicore ./resnet --num_cores 8 --new_model_dir ./modified_resnet
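When the conversion is part of a Python deployment script, the same command line can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not something shipped with Neuron; it only builds the argument list in the form documented above and leaves execution to the caller.

```python
import shlex


def build_auto_multicore_command(model_dir, num_cores, new_model_dir):
    """Return the argv list for the tf-neuron-auto-multicore CLI."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return [
        "tf-neuron-auto-multicore",
        str(model_dir),
        "--num_cores", str(num_cores),
        "--new_model_dir", str(new_model_dir),
    ]


command = build_auto_multicore_command("./resnet", 8, "./modified_resnet")
print(shlex.join(command))

# To run it (requires the Neuron tools to be installed and on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a model directory contains spaces.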
