int4_tensor — Model Optimizer 0.27.1

Implements INT4 quantization for efficient tensor storage and computation.

Classes

INT4QTensor Implements the INT4 quantization on tensors for more efficient storage or computation.

class INT4QTensor

Bases: BaseQuantizedTensor

Implements the INT4 quantization on tensors for more efficient storage or computation.

quantized_data

The quantized data stored as a packed uint8 tensor.

Type:

torch.Tensor
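Since each INT4 value occupies only half a byte, two values are typically packed into a single uint8. The following is a minimal NumPy sketch of that packing scheme; it is an illustration of the general idea, not the actual `INT4QTensor` implementation, and the function names are hypothetical.

```python
import numpy as np

def pack_int4(vals):
    """Pack an even-length array of 4-bit values (0..15) into half as many uint8 bytes.

    Hypothetical helper for illustration: even-indexed elements go into the
    low nibble, odd-indexed elements into the high nibble of each byte.
    """
    vals = np.asarray(vals, dtype=np.uint8)
    assert vals.size % 2 == 0, "need an even number of 4-bit values"
    lo = vals[0::2] & 0x0F          # even elements -> low nibble
    hi = (vals[1::2] & 0x0F) << 4   # odd elements -> high nibble
    return lo | hi

def unpack_int4(packed):
    """Inverse of pack_int4: recover the original 4-bit values."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F       # low nibble -> even elements
    out[1::2] = packed >> 4         # high nibble -> odd elements
    return out
```

This halves storage relative to one byte per value, at the cost of an unpack step before computation.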

dequantize(dtype=None, **kwarg)

Dequantize an INT4 packed tensor to a target dtype.

Parameters:

dtype (torch.dtype) – target dtype for the dequantized output.

classmethod quantize(input, block_size)

Convert a tensor to a quantized format based on INT4 (AWQ) quantization.

Parameters:

input (torch.Tensor) – the tensor to quantize.

block_size (int) – number of consecutive elements that share one quantization scale.

Returns:

Contains quantized data, input quantization config, and scale quantization config.

Return type:

tuple
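To make the block-size parameter concrete, here is a minimal NumPy sketch of blockwise signed INT4 quantization with one absmax scale per block. It illustrates the general quantize/dequantize round trip only; the actual `INT4QTensor.quantize` implementation (and the AWQ-specific scale search) may differ, and the function names are hypothetical.

```python
import numpy as np

def quantize_blockwise(x, block_size):
    """Quantize a 1-D float array to signed INT4 codes with one scale per block.

    Each block of `block_size` elements is scaled so its absolute maximum
    maps to 7, then rounded and clipped to the signed 4-bit range [-8, 7].
    """
    x = np.asarray(x, dtype=np.float32).reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(x / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_blockwise(q, scales, dtype=np.float32):
    """Reconstruct approximate float values from INT4 codes and per-block scales."""
    return (q.astype(dtype) * scales.astype(dtype)).reshape(-1)
```

A smaller `block_size` gives finer-grained scales and lower quantization error, but more scale metadata to store.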