modelopt.torch.quantization.compress — Model Optimizer 0.27.1

compress(quant_model, compress_config={'default': True})

Compress the model weights of a quantized model.

This function compresses weights in layers that have an enabled weight_quantizer with a supported quantization format. The compression is controlled by a pattern-based configuration.

Parameters:

* quant_model – The quantized model whose weights will be compressed.
* compress_config – Pattern-based compression configuration. Keys are patterns matched against layer names, and boolean values enable or disable compression for the matching layers; the "default" entry sets the fallback for layers no other pattern matches.
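As an illustration, a minimal sketch of a config that skips compression for selected layers while keeping the default enabled might look like the following; the `*lm_head*` pattern is a hypothetical example, not part of this API:

```python
# Sketch of a pattern-based compress_config (the pattern key is hypothetical).
compress_config = {
    "*lm_head*": False,  # skip compression for layers matching this pattern
    "default": True,     # fallback: compress all other eligible layers
}
```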

Note: This function modifies the input model in-place.
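For context, a minimal end-to-end sketch: quantize a model, then compress it in place. This assumes `compress` is re-exported at the `modelopt.torch.quantization` package level and uses a hypothetical calibration loader (`calib_dataloader`); adapt both to your setup.

```python
import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Run a few calibration batches through the model
    # (calib_dataloader is a hypothetical placeholder).
    for batch in calib_dataloader:
        model(batch)

# Quantize first so layers gain enabled weight quantizers (INT4 AWQ shown).
model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)

# Compress the quantized weights in place; the default config enables
# compression for every layer with a supported weight quantizer.
mtq.compress(model)
```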