nf4_tensor — Model Optimizer 0.27.1
Implements NF4 quantization for efficient tensor storage and computation.
Classes
NF4QTensor — Implements the NF4 quantization on tensors for more efficient storage or computation.
class NF4QTensor
Bases: BaseQuantizedTensor
Implements the NF4 quantization on tensors for more efficient storage or computation.
quantized_data
The quantized data stored as a packed uint8 tensor.
Type:
torch.Tensor
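Because each NF4 value is a 4-bit codebook index, two indices share one uint8 byte of `quantized_data`. A minimal pure-Python sketch of such packing (the high-nibble-first order and the helper names are assumptions for illustration, not the library's API; the actual attribute is a torch tensor):

```python
def pack_nf4(indices):
    """Pack pairs of 4-bit indices (0..15) into bytes, high nibble first.
    Packing order is an assumption; the library may pack differently."""
    assert len(indices) % 2 == 0, "pad to an even count before packing"
    return bytes((indices[i] << 4) | indices[i + 1]
                 for i in range(0, len(indices), 2))

def unpack_nf4(packed):
    """Recover the 4-bit indices from packed bytes."""
    out = []
    for b in packed:
        out.extend((b >> 4, b & 0x0F))
    return out
```

This halves storage relative to one byte per index, which is why the packed tensor holds twice as many logical elements as bytes.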
dequantize(dtype=None, **kwarg)
Dequantize an NF4 packed tensor to a target dtype.
Parameters:
dtype (dtype) – The target dtype of the dequantized tensor.
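Conceptually, dequantization maps each 4-bit index through the 16-entry NF4 codebook and rescales by its block's scale. A pure-Python sketch (the actual method operates on packed torch tensors; the codebook values below are the NF4 table published with QLoRA and used in bitsandbytes, and `dequantize_nf4` is a hypothetical helper, not the library API):

```python
# 16-entry NF4 codebook (normal-float values as published with QLoRA);
# reproduced here as an assumption for illustration.
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def dequantize_nf4(indices, scales, block_size):
    """Map 4-bit codebook indices back to floats, scaled per block."""
    return [NF4_CODE[idx] * scales[i // block_size]
            for i, idx in enumerate(indices)]
```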
classmethod double_quantization(scales, scale_block_size, num_scale_bits)
Perform double quantization on the scales.
Unlike the quantize method, which quantizes the input data, this function quantizes the float scales into int8 to further reduce the memory usage of the scales.
Parameters:
- scales (Tensor) – The per-block float scales to be quantized.
- scale_block_size (int) – The block size used when quantizing the scales.
- num_scale_bits (int) – The number of bits used to quantize the scales.
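To make the idea concrete, here is one plausible symmetric scheme for quantizing float scales into int8: group the scales into blocks, and store each block as integers plus a single second-level float scale. This is a sketch under those assumptions; the library's exact recipe (e.g., whether it centers the scales before quantizing) may differ, and the function name is hypothetical:

```python
def double_quantize_scales(scales, scale_block_size, num_scale_bits=8):
    """Symmetric per-block quantization of float scales to signed integers.
    Returns (quantized scales, one second-level float scale per block)."""
    qmax = 2 ** (num_scale_bits - 1) - 1  # 127 for 8-bit
    q_scales, meta_scales = [], []
    for i in range(0, len(scales), scale_block_size):
        block = scales[i:i + scale_block_size]
        amax = max(abs(s) for s in block)
        meta = amax / qmax if amax else 1.0  # second-level scale
        meta_scales.append(meta)
        q_scales.extend(round(s / meta) for s in block)
    return q_scales, meta_scales
```

The memory win comes from storing one float per scale block instead of one float per scale, at the cost of a bounded rounding error on each scale.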
classmethod quantize(input, block_size, scale_block_size)
Convert a tensor to a quantized format based on NF4 double quantization.
Parameters:
- input (torch.Tensor) – The input tensor to be quantized.
- block_size (int) – The size of each block for quantization.
- scale_block_size (int) – The block size for scaling during quantization.
Returns:
Contains quantized data, input quantization config, and scale quantization config.
Return type:
tuple
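The first stage of NF4 quantization can be sketched as per-block absmax scaling followed by a nearest-codebook-entry search (a pure-Python illustration, not the library's implementation, which operates on torch tensors and also double-quantizes the scales; the codebook values are the NF4 table published with QLoRA):

```python
# 16-entry NF4 codebook (assumed values, as published with QLoRA).
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_nf4(values, block_size):
    """Per-block absmax scaling, then snap each value to the nearest
    codebook entry. Returns (4-bit indices, per-block float scales)."""
    indices, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid divide-by-zero
        scales.append(scale)
        for v in block:
            x = v / scale  # normalized into [-1, 1]
            indices.append(min(range(16), key=lambda k: abs(NF4_CODE[k] - x)))
    return indices, scales
```

The returned scales would then be fed to `double_quantization` to produce the scale quantization config mentioned in the return value above.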