int4_tensor — Model Optimizer 0.27.1
Implements INT4 quantization for efficient tensor storage and computation.
Classes
| Class | Description |
|---|---|
| INT4QTensor | Implements INT4 quantization on tensors for more efficient storage or computation. |
class INT4QTensor
Bases: BaseQuantizedTensor
Implements INT4 quantization on tensors for more efficient storage or computation.
quantized_data
The quantized data stored as a packed uint8 tensor.
Type:
torch.Tensor
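Since INT4 is not a native PyTorch dtype, two 4-bit values are packed into each uint8 byte. A minimal sketch of such a packing scheme is shown below; the helper names and the low-nibble-first layout are illustrative assumptions, not the library's actual implementation.

```python
import torch

def pack_int4(values: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: pack pairs of signed INT4 values (range
    [-8, 7]) into uint8 bytes, low nibble first."""
    assert values.numel() % 2 == 0
    # Map signed values to their 4-bit two's-complement nibbles.
    u = (values.to(torch.int16) & 0xF).to(torch.uint8)
    lo, hi = u[0::2], u[1::2]
    return lo | (hi << 4)

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Hypothetical inverse helper: recover signed INT4 values."""
    lo = (packed & 0xF).to(torch.int16)
    hi = (packed >> 4).to(torch.int16)
    out = torch.stack([lo, hi], dim=-1).reshape(-1)
    # Sign-extend nibbles back to the signed range [-8, 7].
    return torch.where(out >= 8, out - 16, out).to(torch.int8)
```

Packing this way halves storage relative to int8 and quarters it relative to fp16, at the cost of an unpack step before compute.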
dequantize(dtype=None, **kwarg)
Dequantize an INT4 packed tensor to a target dtype.
Parameters:
dtype (torch.dtype) – The target dtype of the dequantized tensor.
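Conceptually, dequantization unpacks the uint8-packed nibbles, sign-extends them, and rescales each block by its stored scale. The following is a standalone sketch under those assumptions; the function name and argument layout are hypothetical, not the library's API.

```python
import torch

def dequantize_int4(packed: torch.Tensor, scales: torch.Tensor,
                    block_size: int,
                    dtype: torch.dtype = torch.float16) -> torch.Tensor:
    """Hypothetical sketch: unpack INT4 nibbles from uint8 bytes,
    sign-extend, and multiply each block by its float scale."""
    lo = (packed & 0xF).to(torch.int16)
    hi = (packed >> 4).to(torch.int16)
    vals = torch.stack([lo, hi], dim=-1).reshape(-1)
    # Sign-extend nibbles to the signed range [-8, 7].
    vals = torch.where(vals >= 8, vals - 16, vals).to(dtype)
    # Apply one scale per block of `block_size` elements.
    return (vals.reshape(-1, block_size) * scales.reshape(-1, 1)).reshape(-1)
```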
classmethod quantize(input, block_size)
Convert a tensor to a quantized format based on INT4 (AWQ) quantization.
Parameters:
- input (torch.Tensor) – The input tensor to be quantized.
- block_size (int) – The size of each block for quantization.
Returns:
Contains quantized data, input quantization config, and scale quantization config.
Return type:
tuple
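As a rough illustration of block-wise INT4 quantization, the sketch below computes one absmax-derived scale per block, rounds to the signed 4-bit range, and packs the result into uint8 bytes. This is a simplified assumption of the scheme, not the AWQ algorithm itself (AWQ additionally derives scales from activation statistics), and the function name is hypothetical.

```python
import torch

def quantize_int4_blockwise(x: torch.Tensor, block_size: int):
    """Hypothetical sketch of block-wise INT4 quantization:
    per-block absmax scale, round to [-8, 7], pack two values
    per uint8 byte (low nibble first)."""
    flat = x.reshape(-1, block_size)
    # One scale per block so the block's max magnitude maps to 7;
    # clamp to avoid division by zero for all-zero blocks.
    scales = (flat.abs().amax(dim=1, keepdim=True) / 7.0).clamp_min(1e-8)
    q = torch.clamp(torch.round(flat / scales), -8, 7).to(torch.int16)
    # Take the 4-bit two's-complement nibbles and pack pairs into bytes.
    u = (q & 0xF).to(torch.uint8).reshape(-1)
    packed = u[0::2] | (u[1::2] << 4)
    return packed, scales.reshape(-1)
```

The returned pair mirrors the shape of the result described above: packed quantized data plus the per-block scaling information needed to dequantize.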