int4_tensor — Model Optimizer 0.27.1

Implements INT4 quantization for efficient tensor storage and computation.

Classes

INT4QTensor Implements the INT4 quantization on tensors for more efficient storage or computation.

class INT4QTensor

Bases: BaseQuantizedTensor

Implements the INT4 quantization on tensors for more efficient storage or computation.

quantized_data

The quantized data stored as a packed uint8 tensor.

Type:

torch.Tensor
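Since each INT4 value occupies only half a byte, two values are typically packed into a single uint8. The following is a minimal NumPy sketch of that packing scheme; it is an illustration of the general idea, not the actual `INT4QTensor` implementation, and the function names are hypothetical.

```python
import numpy as np

def pack_int4(vals):
    """Pack an even-length array of 4-bit values (0..15) into half as many uint8 bytes.

    Hypothetical helper for illustration: even-indexed elements go into the
    low nibble, odd-indexed elements into the high nibble of each byte.
    """
    vals = np.asarray(vals, dtype=np.uint8)
    assert vals.size % 2 == 0, "need an even number of 4-bit values"
    lo = vals[0::2] & 0x0F          # even elements -> low nibble
    hi = (vals[1::2] & 0x0F) << 4   # odd elements -> high nibble
    return lo | hi

def unpack_int4(packed):
    """Inverse of pack_int4: recover the original 4-bit values."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F       # low nibble -> even elements
    out[1::2] = packed >> 4         # high nibble -> odd elements
    return out
```

This halves storage relative to one byte per value, at the cost of an unpack step before computation.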

dequantize(dtype=None, **kwarg)

Dequantize an INT4 packed tensor to a target dtype.

Parameters:

dtype (torch.dtype) – target dtype for the dequantized output.

classmethod quantize(input, block_size)

Convert a tensor to a quantized format based on INT4 (AWQ) quantization.

Parameters:

input (torch.Tensor) – the tensor to quantize.

block_size (int) – number of consecutive elements that share one quantization scale.

Returns:

Contains quantized data, input quantization config, and scale quantization config.

Return type:

tuple
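To make the block-size parameter concrete, here is a minimal NumPy sketch of blockwise signed INT4 quantization with one absmax scale per block. It illustrates the general quantize/dequantize round trip only; the actual `INT4QTensor.quantize` implementation (and the AWQ-specific scale search) may differ, and the function names are hypothetical.

```python
import numpy as np

def quantize_blockwise(x, block_size):
    """Quantize a 1-D float array to signed INT4 codes with one scale per block.

    Each block of `block_size` elements is scaled so its absolute maximum
    maps to 7, then rounded and clipped to the signed 4-bit range [-8, 7].
    """
    x = np.asarray(x, dtype=np.float32).reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(x / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_blockwise(q, scales, dtype=np.float32):
    """Reconstruct approximate float values from INT4 codes and per-block scales."""
    return (q.astype(dtype) * scales.astype(dtype)).reshape(-1)
```

A smaller `block_size` gives finer-grained scales and lower quantization error, but more scale metadata to store.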