QuantGranularity

class qlip.quantization.QuantGranularity(type=<factory>, axis=0, block_size=64)

Bases: Mapping

Quantization granularity configuration.

Defines how quantization parameters (scale and offset) are shared across tensor dimensions. For example, per-channel uses a separate scale for each output channel, while per-tensor uses a single scale for the entire tensor.
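To make the difference concrete, the sketch below computes symmetric absmax scales for a toy weight matrix in plain Python. It does not use qlip and is not its implementation: the helper `absmax_scale`, the `qmax=127` (int8) range, and the sample values are assumptions chosen for illustration only.

```python
def absmax_scale(values, qmax=127):
    """Symmetric scale that maps the largest magnitude onto qmax (assumed int8 range)."""
    return max(abs(v) for v in values) / qmax


# Toy weight matrix: 2 output channels (rows) x 3 input features (columns).
weights = [
    [0.5, -1.0, 0.25],
    [4.0, 2.0, -8.0],
]

# Per-tensor: a single scale shared by every element of the tensor.
per_tensor_scale = absmax_scale([v for row in weights for v in row])

# Per-channel: a separate scale for each output channel (each row here).
per_channel_scales = [absmax_scale(row) for row in weights]

print(per_tensor_scale)
print(per_channel_scales)
```

In this toy case the first channel has much smaller values than the second, so its per-channel scale is finer than the shared per-tensor scale, at the cost of storing one scale per channel.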

Variables

  • type (QGranularityType) – Granularity type. Weights support per-tensor, per-channel, and block granularity; activations support per-batch, per-sample, and per-token granularity.

  • axis (Optional[int]) – Axis for per-channel and block quantization. Default is 0.

  • block_size (Optional[int]) – Block size for block quantization. Default is 64.
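As a rough illustration of what block_size controls, the sketch below splits one row of values into contiguous blocks and computes one scale per block. This is plain Python, not qlip code: the helper `block_scales`, the symmetric absmax scale, the `qmax=127` range, and the assumption that blocks are contiguous runs along the chosen axis are all illustrative guesses, and a block size of 4 is used instead of the default 64 to keep the example small.

```python
def block_scales(values, block_size, qmax=127):
    """One symmetric absmax scale per contiguous block of `block_size` values.

    Hypothetical helper for illustration; the last block may be shorter
    if len(values) is not a multiple of block_size.
    """
    scales = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scales.append(max(abs(v) for v in block) / qmax)
    return scales


# One row of 8 values, split into two blocks of 4.
row = [0.1, -0.2, 0.05, 0.4, 3.0, -6.35, 1.0, 2.0]
scales = block_scales(row, block_size=4)

print(len(scales))
print(scales)
```

Smaller blocks track local value ranges more closely but require storing more scales; larger blocks do the opposite.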