This document is relevant for: Inf1, Inf2, Trn1, Trn2
NeuronPerf FAQ
Table of contents
- When should I use NeuronPerf?
- When should I not use NeuronPerf?
- Which frameworks does NeuronPerf support?
- Which Neuron instance types does NeuronPerf support?
- Is NeuronPerf Open Source?
- What is the secret to obtaining the best numbers?
- What are the “best practices” that NeuronPerf uses?
When should I use NeuronPerf?
When you want to measure the highest achievable performance for your model with Neuron.
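For example, a benchmarking run on a Neuron-compiled TorchScript model might look like the sketch below. The model filename and input shape are placeholders, and the exact benchmark() signature varies by framework; see the NeuronPerf Framework Notes for details.

```python
import torch
import neuronperf as npf
import neuronperf.torch  # framework submodules also exist for tensorflow and mxnet

# Placeholder: a model already compiled for Neuron and saved to disk.
filename = "model_neuron_b1.pt"

# Example input matching the shape/batch size the model was compiled for.
inputs = torch.zeros((1, 3, 224, 224), dtype=torch.float32)

# Benchmark and print throughput/latency statistics.
reports = npf.torch.benchmark(filename, inputs, batch_sizes=[1])
npf.print_reports(reports)
```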
When should I not use NeuronPerf?
When measuring end-to-end performance that includes your network serving stack. Instead, you should compare your end-to-end numbers with those obtained by NeuronPerf to identify and reduce your serving overhead.
Which frameworks does NeuronPerf support?
See NeuronPerf Framework Notes.
Which Neuron instance types does NeuronPerf support?
PyTorch and TensorFlow are supported on all instance types. MXNet support is limited to Inf1.
Is NeuronPerf Open Source?
Yes. You can download the source here.
What is the secret to obtaining the best numbers?
There is no secret sauce. NeuronPerf follows best practices.
What are the “best practices” that NeuronPerf uses?
- These vary slightly by framework and how your model was compiled
- For a model compiled for a single NeuronCore (DataParallel):
  - To maximize throughput, for N models, use 2 * N worker threads (see the sketch after this list)
  - To minimize latency, use 1 worker thread per model
- Use a new Python process for each model to avoid GIL contention
- Ensure you benchmark long enough for your numbers to stabilize
- Ignore outliers at the start and end of inference benchmarking
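As a rough, self-contained illustration of these practices (not NeuronPerf's actual implementation), the sketch below runs each model copy in its own Python process, drives it with 2 worker threads, and discards warm-up iterations before recording latencies. Here, load_model and run_inference are hypothetical placeholders for your framework's load and forward calls.

```python
import time
import threading
import multiprocessing as mp

# Hypothetical placeholders for your framework's calls, e.g. loading a
# Neuron-compiled TorchScript model and running a forward pass on it.
def load_model(path):
    ...

def run_inference(model, example):
    ...

def worker(model, example, latencies, stop, warmup=100):
    """Run inferences in a loop, discarding the first `warmup` results."""
    count = 0
    while not stop.is_set():
        start = time.monotonic()
        run_inference(model, example)
        count += 1
        if count > warmup:  # ignore outliers at the start of benchmarking
            latencies.append(time.monotonic() - start)

def benchmark_one_model(path, example, duration=60, threads_per_model=2):
    """Benchmark one model copy; run this in its own process to avoid GIL contention."""
    model = load_model(path)
    latencies, stop = [], threading.Event()
    threads = [
        threading.Thread(target=worker, args=(model, example, latencies, stop))
        for _ in range(threads_per_model)  # 2 threads per model for throughput, 1 for latency
    ]
    for t in threads:
        t.start()
    time.sleep(duration)  # benchmark long enough for the numbers to stabilize
    stop.set()
    for t in threads:
        t.join()
    return latencies  # also trim the tail before reporting percentiles

if __name__ == "__main__":
    n_models = 4  # e.g. one model copy per NeuronCore
    with mp.Pool(n_models) as pool:  # a separate Python process per model
        all_latencies = pool.starmap(
            benchmark_one_model, [("model_neuron.pt", None)] * n_models
        )
```

In practice NeuronPerf handles this orchestration for you; the sketch only makes the guidance above concrete.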