QuantGranularity
- class qlip.quantization.QuantGranularity(type=<factory>, axis=0, block_size=64)
Bases: Mapping

Quantization granularity configuration.

Defines how quantization parameters (scale and offset) are shared across tensor dimensions. For example, per-channel uses a separate scale for each output channel, while per-tensor uses a single scale for the entire tensor.

Variables
- type (QGranularityType) – Granularity type. For weights: per-channel, per-tensor, block. For activations: per-batch, per-sample, per-token.
- axis (Optional[int]) – Axis for per-channel and block quantization. Default is 0.
- block_size (Optional[int]) – Block size for block quantization. Default is 64.
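
To make the effect of granularity concrete, the following NumPy sketch shows how the choice changes the number of scales computed for a weight matrix. It does not use or reproduce qlip's implementation: the helper `absmax_scales`, the symmetric absmax formula, and `qmax=127` are assumptions made for illustration only. In particular, the sketch assumes that block granularity splits the given `axis` into consecutive blocks of `block_size` elements, which may differ from how qlip interprets `axis` for blocks.

```python
import numpy as np


def absmax_scales(w, granularity, axis=0, block_size=64, qmax=127):
    """Hypothetical helper: symmetric absmax scales for a weight array.

    Illustrative only; not part of qlip.
    """
    w = np.asarray(w, dtype=np.float32)

    if granularity == "per-tensor":
        # One scale shared by every element of the tensor.
        return np.abs(w).max(keepdims=True) / qmax

    if granularity == "per-channel":
        # One scale per index along `axis`; reduce over all other axes.
        reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
        return np.abs(w).max(axis=reduce_axes, keepdims=True) / qmax

    if granularity == "block":
        # Assumption: `axis` is split into consecutive blocks of `block_size`.
        moved = np.moveaxis(w, axis, -1)
        if moved.shape[-1] % block_size != 0:
            raise ValueError("axis length must be divisible by block_size")
        blocks = moved.reshape(*moved.shape[:-1], -1, block_size)
        return np.abs(blocks).max(axis=-1) / qmax

    raise ValueError(f"unsupported granularity: {granularity!r}")


# An 8 x 128 weight matrix with values from -512 to 511.
w = np.arange(8 * 128, dtype=np.float32).reshape(8, 128) - 512.0

per_tensor = absmax_scales(w, "per-tensor")
per_channel = absmax_scales(w, "per-channel", axis=0)
per_block = absmax_scales(w, "block", axis=1, block_size=64)

print(per_tensor.shape, per_channel.shape, per_block.shape)
```

For this 8 x 128 matrix, per-tensor yields a single scale, per-channel with axis 0 yields 8 scales, and block with block size 64 along axis 1 yields 16 scales (2 per row). Finer granularity tracks local value ranges more closely at the cost of storing more quantization parameters.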