modelopt.torch.quantization.compress — Model Optimizer 0.31.0
compress(model, config=None)
Compress the weights of a quantized model.
This function compresses weights in layers that have an enabled weight_quantizer with a supported quantization format. The compression is controlled by a pattern-based configuration.
Parameters:
- model – The quantized model to compress.
- config (dict[str, bool] | None | CompressConfig) –
  Dictionary mapping layer patterns to boolean compression flags. If None, defaults to {"default": True}, which compresses all supported layers.
Example configuration:
{
".mlp.fc1": False, # Skip compression for fc1 layers
"default": True, # Compress all other layers
}
Note: Each configuration entry except “default” is applied sequentially, so later entries override earlier ones when the same layer matches multiple patterns.
Note: This function modifies the input model in-place.
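A minimal usage sketch, assuming a model that has already been quantized with a weight-quantized format; the import alias and the prior quantization step mentioned in the comments are illustrative assumptions, not part of this function's signature:

import modelopt.torch.quantization as mtq

# `model` is assumed to be a torch.nn.Module that was previously quantized
# (e.g. via mtq.quantize), so its layers carry enabled weight_quantizers.
compress_config = {
    ".mlp.fc1": False,  # skip compression for layers matching this pattern
    "default": True,    # compress all other supported layers
}

# Modifies `model` in place; weights of supported layers are replaced with
# their compressed representation.
mtq.compress(model, config=compress_config)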