This document is relevant for: Inf1, Inf2, Trn1, Trn2
NeuronPerf API#
Due to a bug in Sphinx, some of the type annotations may be incomplete. You can download the source code here. In the future, the source will be hosted in a more browsable way.
compile(compile_fn, model, inputs, batch_sizes: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, performance_levels: Union[str, List[int]] = None, models_dir: str = 'models', filename: str = None, compiler_args: dict = None, verbosity: int = 1, *args, **kwargs) → str:#
Compiles the provided model with each provided example input, pipeline size, and performance level. Any additional compiler_args passed will be forwarded to the compiler on every invocation.
Parameters:
- model – The model to compile.
- inputs (list) – A list of example inputs.
- batch_sizes – A list of batch sizes that correspond to the example inputs.
- pipeline_sizes – A list of pipeline sizes to use. See NeuronCore Pipeline.
- performance_levels – A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See Mixed precision and performance-accuracy tuning (neuron-cc).
- models_dir (str) – The directory where compilation artifacts will be stored.
- model_name (str) – An optional model name tag to apply to compiled artifacts.
- filename (str) – The name of the model index to write out. If not provided, a name will be generated and returned.
- compiler_args (dict) – Additional compiler arguments to be forwarded with every compilation.
- verbosity (int) – 0 = error, 1 = info, 2 = debug
Returns:
A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.
Return type:
str
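A minimal sketch of a typical call through the torch subpackage, which supplies the framework-specific compile_fn (the model, input shape, and filename below are placeholders):
import torch
import neuronperf as npf
import neuronperf.torch  # provides the torch-specific compile_fn / load_fn
# Hypothetical model and a single example input for batch size 1.
model = torch.nn.Sequential(torch.nn.Linear(128, 64))
inputs = [torch.zeros((1, 128))]
# Compile for batch size 1 and write a model index describing the artifacts.
index_filename = npf.torch.compile(model, inputs, batch_sizes=[1], filename="model_index.json")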
benchmark(load_fn: Callable[[str, int], Any], model_filename: str, inputs: Any, batch_sizes: Union[int, List[int]] = None, duration: float = BENCHMARK_SECS, n_models: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, cast_modes: Union[str, List[str]] = None, workers_per_model: Union[int, None] = None, env_setup_fn: Callable[[int, Dict], None] = None, setup_fn: Callable[[int, Dict, Any], None] = None, preprocess_fn: Callable[[Any], Any] = None, postprocess_fn: Callable[[Any], Any] = None, dataset_loader_fn: Callable[[Any, int], Any] = None, verbosity: int = 1, multiprocess: bool = True, multiinterpreter: bool = False, return_timers: bool = False, device_type: str = 'neuron') → List[Dict]:#
Benchmarks the model index or individual model using the provided inputs. If a model index is provided, additional fields such as pipeline_sizes and performance_levels can be used to filter the models to benchmark. The default behavior is to benchmark all configurations in the model index.
Parameters:
- load_fn – A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. neuronperf.torch.benchmark).
- model_filename (str) – A path to a model index from compile or path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. MyModelClass).
- inputs (list) – A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.
- batch_sizes – A list of ints indicating batch sizes that correspond to the inputs. Assumes 1 if not provided.
- duration (float) – The number of seconds to benchmark each model.
- n_models – The number of models to run in parallel. Default behavior runs 1 model and the max number of models possible, determined by a best effort from device_type, instance size, or other environment state.
- pipeline_sizes – A list of pipeline sizes to use. See NeuronCore Pipeline.
- performance_levels – A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See Mixed precision and performance-accuracy tuning (neuron-cc).
- workers_per_model – The number of workers to use per model loaded. If None, this is automatically selected.
- env_setup_fn – A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.
- setup_fn – A function that receives the benchmarker id, config, and model to perform last-minute configuration before inference.
- preprocess_fn – A custom preprocessing function to perform on each input before inference.
- postprocess_fn – A custom postprocessing function to perform on each input after inference.
- multiprocess (bool) – When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging.
- multiinterpreter (bool) – When True, benchmarking is performed in a new Python interpreter per model. All parameters must be serializable. Overrides multiprocess.
- return_timers (bool) – When True, the return of this function is a list of tuples (config, results) with detailed information. This can be converted to reports with get_reports(results).
- stats_interval (float) – Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage.
- device_type (str) – This will be set automatically to one of the SUPPORTED_DEVICE_TYPES.
- cost_per_hour (float) – The price of this device / hour. Used to estimate cost / 1 million infs in reports.
- model_name (str) – A friendly name for the model to use in reports.
- model_class_name (str) – Internal use.
- model_class_file (str) – Internal use.
- verbosity (int) – 0 = error, 1 = info, 2 = debug
Returns:
A list of benchmarking results.
Return type:
list[dict]
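As a sketch, reusing the index and inputs from the compile example above, a run over all indexed configurations (or a restricted subset) might look like:
import neuronperf as npf
import neuronperf.torch
# Benchmark every configuration in the model index for ~60 seconds each.
reports = npf.torch.benchmark("model_index.json", inputs, batch_sizes=[1], duration=60)
# Or restrict the run, e.g. to two model copies loaded in parallel.
reports = npf.torch.benchmark("model_index.json", inputs, batch_sizes=[1], n_models=[2])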
get_reports(results)#
Summarizes and combines the detailed results from neuronperf.benchmark, when run with return_timers=True. One report dictionary is produced per model configuration benchmarked. The list of reports can be fed directly to other reporting utilities, such as neuronperf.write_csv.
Parameters:
- results (list[tuple]) – The list of results from neuronperf.benchmark.
- batch_sizes (list[int]) – The batch sizes that correspond to the inputs provided to compile and benchmark. Used to correct throughput values in the reports.
Returns:
A list of dictionaries that summarize the results for each model configuration.
Return type:
list[dict]
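A brief sketch, continuing the example above with the npf alias and inputs defined there:
# Capture detailed timers, then summarize them into one report per configuration.
results = npf.torch.benchmark("model_index.json", inputs, batch_sizes=[1], return_timers=True)
reports = npf.get_reports(results)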
print_reports(reports, cols=SUMMARY_COLS, sort_by='throughput_peak', reverse=False)#
Print a report to the terminal. Example of default behavior:
neuronperf.print_reports(reports)
throughput_avg  latency_ms_p50  latency_ms_p99  n_models  pipeline_size  workers_per_model  batch_size  model_filename
329.667         6.073           6.109           1         1              2                  1           models/model_b1_p1_83bh3hhs.pt
Parameters:
- reports – Results from get_reports.
- cols – The columns in the report to be displayed.
- sort_by – Sort the cols by the specified key.
- reverse – Sort order.
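For instance, to sort by a latency column instead of peak throughput (column names as shown in the sample output above):
npf.print_reports(reports, sort_by="latency_ms_p50")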
write_csv(reports: list[dict], filename: str = None, cols=REPORT_COLS)#
Writes benchmarking reports to a CSV file.
Parameters:
- reports (list[dict]) – Results from neuronperf.get_reports.
- filename (str) – Filename to write. If not provided, generated from model_name in report and current timestamp.
- cols (list[str]) – The columns in the report to be kept.
Returns:
The filename written.
Return type:
str
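A one-line sketch, reusing the reports from the get_reports example above (the filename is a placeholder):
csv_filename = npf.write_csv(reports, "benchmark_results.csv")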
write_json(reports: list[dict], filename: str = None)#
Writes benchmarking reports to a JSON file.
Parameters:
- reports (list[dict]) – Results from neuronperf.get_reports.
- filename (str) – Filename to write. If not provided, generated from model_name in report and current timestamp.
Returns:
The filename written.
Return type:
str
model_index.append(*model_indexes: Union[str, dict]) → dict:#
Appends the model indexes non-destructively into a new model index, without modifying any of the internal data.
This is useful if you have benchmarked multiple related models and wish to combine their respective model indexes into a single index.
Model name will be taken from the first index provided. Duplicate configs will be filtered.
Parameters:
model_indexes – Model indexes or paths to model indexes to combine.
Returns:
A new dictionary representing the combined model index.
Return type:
dict
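A sketch of combining two indexes from separate compile runs (the filenames are placeholders, and model_index is assumed to be importable from the top-level package):
from neuronperf import model_index
# Merge two indexes into one and persist the result.
combined = model_index.append("resnet50_b1_index.json", "resnet50_b8_index.json")
model_index.save(combined, filename="resnet50_combined_index.json")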
model_index.copy(old_index: Union[str, dict], new_index: str, new_dir: str) → str:#
Copy an index to a new location. Will rename old_index to new_index and copy all model files into new_dir, updating the index paths.
This is useful for pulling individual models out of a pool.
Returns the path to the new index.
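For example, reusing the model_index import from the append example, copying an index and its model files into a standalone directory (paths are placeholders):
new_index_path = model_index.copy("model_index.json", "standalone_index.json", "standalone_models")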
model_index.create(filename, input_idx=0, batch_size=1, pipeline_size=1, cast_mode=DEFAULT_CAST, compile_s=None)#
Create a new model index from a pre-compiled model.
Parameters:
- filename (str) – The path to the compiled model.
- input_idx (int) – The index in your inputs that this model should be run on.
- batch_size (int) – The batch size at compilation for this model.
- pipeline_size (int) – The pipeline size used at compilation for this model.
- cast_mode (str) – The casting option this model was compiled with.
- compile_s (float) – Seconds spent compiling.
Returns:
A new dictionary representing a model index.
Return type:
dict
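As a sketch, wrapping a model compiled outside of NeuronPerf so it can be passed to benchmark as an index (the path and settings are placeholders):
index = model_index.create("models/precompiled_model.pt", batch_size=4, pipeline_size=1)
model_index.save(index, filename="precompiled_index.json")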
model_index.delete(filename: str):#
Deletes the model index and all associated models referenced by the index.
model_index.filter(index: Union[str, dict], **kwargs) → dict:#
Filters the provided model index on the given criteria and returns a new index. Each kwarg is a standard (k, v) pair, where k is treated as a filter name and v may be one or more values used to filter model configs.
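For instance, keeping only the batch-size-1 configurations (assuming the filter key mirrors the config field name batch_size):
b1_only = model_index.filter("model_index.json", batch_size=1)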
model_index.load(filename) → dict:#
Load a NeuronPerf model index from a file.
model_index.move(old_index: str, new_index: str, new_dir: str) → str:#
This is the same as copy followed by delete on the old index.
model_index.save(model_index, filename: str = None, root_dir=None) → str:#
Save a NeuronPerf model index to a file.
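A brief round-trip sketch with load and save, reusing the model_index import from above (the backup filename is a placeholder):
index = model_index.load("model_index.json")
model_index.save(index, filename="model_index_backup.json")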