modelopt.onnx.quantization.quantize — Model Optimizer 0.31.0

quantize(
    onnx_path,
    quantize_mode='int8',
    calibration_data=None,
    calibration_method=None,
    calibration_cache_path=None,
    calibration_shapes=None,
    calibration_eps=['cpu', 'cuda:0', 'trt'],
    override_shapes=None,
    op_types_to_quantize=None,
    op_types_to_exclude=None,
    nodes_to_quantize=None,
    nodes_to_exclude=None,
    use_external_data_format=False,
    keep_intermediate_files=False,
    output_path=None,
    log_level='INFO',
    log_file=None,
    trt_plugins=None,
    trt_plugins_precision=None,
    high_precision_dtype=None,
    mha_accumulation_dtype='fp16',
    disable_mha_qdq=False,
    dq_only=True,
    block_size=None,
    use_zero_point=False,
    passes=['concat_elimination'],
    simplify=False,
    **kwargs,
)

Quantizes the provided ONNX model.

Parameters:

Returns:

None. Writes the quantized ONNX model to the supplied output_path, or, if none is given, to the same directory as the input model with a filename like "<model_name>.quant.onnx".

Return type:

None
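A minimal usage sketch based on the signature above. The model path, input name, and tensor shape are placeholders for illustration; only the keyword names shown in the signature are assumed to be real. The call is wrapped in a broad try/except so the sketch remains harmless where modelopt or the model file is unavailable.

```python
import numpy as np

# Calibration inputs keyed by graph input name.
# "input" and the shape (32, 3, 224, 224) are placeholders; adjust to your model.
calibration_data = {"input": np.random.rand(32, 3, 224, 224).astype(np.float32)}

try:
    from modelopt.onnx.quantization import quantize

    quantize(
        "model.onnx",                      # placeholder path to the source model
        quantize_mode="int8",
        calibration_data=calibration_data,
        output_path="model.quant.onnx",    # written here; defaults to <model_name>.quant.onnx
    )
except Exception:
    # modelopt (or the placeholder model file) may be absent in this environment;
    # the call above is the sketch, not a guaranteed run.
    pass
```

Per the return description, the function returns None and writes the quantized model to output_path rather than returning a model object.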