QuantizationManager

class qlip.quantization.QuantizationManager

Bases: BaseManager

Provides a mechanism for quantizing module weights and input/output activations by registering the corresponding hooks.

setup_modules(modules, *, weights_scheme=int32, weights_granularity='per-channel', activations_scheme=int32, activations_granularity='per-batch', placement='input', quantization_mode='static', calibration_iterations=1, train_activations_scale=False, train_activations_offset=False, train_weights_scale=False, train_weights_offset=False, observer=None, observer_kwargs={}, weights_scale_offset_dtype=torch.float32, activations_scale_offset_dtype=torch.float32, **kwargs)

Attaches quantization parameters and sets quantization hooks.

Parameters

  • modules (list[torch.nn.Module]) – List of modules to set up.

  • weights_scheme (QuantScheme) – Quantization scheme for weights.

  • weights_granularity (Union[str, QuantGranularity, Dict[str, Any]]) – Weights quantization granularity. Default is per-channel.

  • activations_scheme (QuantScheme) – Quantization scheme for activations.

  • activations_granularity (Union[str, QuantGranularity, Dict[str, Any]]) – Activations quantization granularity. Possible values: per-batch, per-sample or per-token. Default is per-batch.

  • placement (Union[str, QuantPlacement]) – Where to apply activation quantization. Three possible values: input, output or attention. Default is input.

  • quantization_mode (Union[str, QuantizationMode]) – Two possible values: static or dynamic. Default is static.

  • calibration_iterations (int) – Number of batches used to initialize the scale and offset parameters of the activations quantizer. Ignored if quantization_mode='dynamic'. Default is 1.

  • train_activations_scale (bool) – Sets the requires_grad attribute for activations scale parameters.

  • train_activations_offset (bool) – Sets the requires_grad attribute for activations offset parameters.

  • train_weights_scale (bool) – Sets the requires_grad attribute for weights scale parameters.

  • train_weights_offset (bool) – Sets the requires_grad attribute for weights offset parameters.

  • observer (qlip.observers.IObserver) – Observer implementing the IObserver interface. Default is MinMaxObserver.

  • observer_kwargs (dict) – Additional keyword arguments for observer constructor.

  • weights_scale_offset_dtype (torch.dtype) – Data type for weights scale and offset parameters. Default is torch.float32.

  • activations_scale_offset_dtype (torch.dtype) – Data type for activations scale and offset parameters. Default is torch.float32.
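To make the scale and offset parameters attached by setup_modules more concrete, the sketch below shows the kind of affine min-max computation a static quantization setup typically performs. This is illustrative plain Python, not qlip code; the function names compute_scale_offset and quantize are hypothetical.

```python
# Illustrative sketch (not qlip code): the scale/offset computation a static
# min-max quantizer typically performs for one channel of weights.

def compute_scale_offset(values, num_bits=8):
    """Derive an affine scale/offset from the observed min/max values."""
    lo, hi = min(values), max(values)
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against a zero scale
    offset = round(qmin - lo / scale)
    return scale, offset

def quantize(values, scale, offset, num_bits=8):
    """Round values to the integer grid and clamp to the representable range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [min(max(round(v / scale) + offset, qmin), qmax) for v in values]

weights = [-1.0, 0.0, 0.5, 1.0]
scale, offset = compute_scale_offset(weights)
q = quantize(weights, scale, offset)  # integers spanning the 8-bit range
```

In static mode, setup_modules runs an analogous calibration for calibration_iterations batches before the parameters are frozen (or further trained if the train_* flags are set).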

setup_modules_groups(modules_groups)

Quantizes each module group using the given config.

Parameters

  • modules_groups (list[dict]) – List of dictionaries, each with two keys: modules and config. The value of modules is a list of torch.nn.Module objects, and the value of config is a dictionary whose keys match the keyword arguments of QuantizationManager.setup_modules.
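The expected shape of the modules_groups argument can be sketched as follows; placeholder strings stand in for torch.nn.Module instances here, and the chosen config values are only examples.

```python
# Hypothetical modules_groups structure; real entries under "modules" would
# be torch.nn.Module instances from the model.
attention_layers = ["attn_0", "attn_1"]  # placeholders for module objects
mlp_layers = ["mlp_0", "mlp_1"]

modules_groups = [
    {"modules": attention_layers,
     "config": {"placement": "attention", "quantization_mode": "dynamic"}},
    {"modules": mlp_layers,
     "config": {"placement": "input", "weights_granularity": "per-channel"}},
]
# qmanager.setup_modules_groups(modules_groups) would then apply each group's
# config, as if setup_modules(group["modules"], **group["config"]) were called.
```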

get_quantization_parameters()

Collects all scale and offset parameters from the quantized modules.

Returns

  • quantization_params (dict) – Dictionary with four keys: weights_scales, weights_offsets, activations_scales, activations_offsets. Contains the scale and offset parameters of the managed modules.
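The shape of the returned dictionary can be sketched like this; in practice each list holds parameter tensors collected from the managed modules.

```python
# Hypothetical shape of the get_quantization_parameters() return value.
quantization_params = {
    "weights_scales": [],        # one scale parameter per quantized weight
    "weights_offsets": [],       # matching offset parameters
    "activations_scales": [],    # scale parameters from activation quantizers
    "activations_offsets": [],   # matching offset parameters
}
```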

weights_round(value)

Turns weight rounding on/off in train mode.

Parameters

  • value (bool) – If True, weights are rounded in train mode.

activations_round(value)

Turns activation rounding on/off in train mode.

Parameters

  • value (bool) – If True, activations are rounded in train mode.

extend(qmanager)

Constructs a single QuantizationManager object by concatenating the quantized modules of this manager and qmanager.

static replace_modules(model, module_type, replace_constructor, inplace=True)

Replaces all modules of a specific type in the model using the given replacement constructor.

Parameters

  • model (torch.nn.Module)

  • module_type (type) – Type of the modules to replace.

  • replace_constructor (callable) – Callable that returns the new module.

  • inplace (bool) – If True, modifies the input model object in place. Otherwise returns a copy.

Examples

Use this when some layer logic must be reimplemented and all modules of a specific type should be replaced with the new implementation.
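The replace-by-constructor pattern can be sketched with a minimal, self-contained module tree standing in for torch.nn.Module; the class names and the recursive helper here are hypothetical, not the qlip implementation.

```python
# Self-contained sketch of the replace-by-constructor pattern.

class Module:
    """Minimal stand-in for torch.nn.Module holding named children."""
    def __init__(self, **children):
        self.children = children

class Linear(Module):
    """The original layer type we want to replace."""

class QuantLinear(Module):
    """Hypothetical reimplementation carrying quantization state."""

def replace_modules(module, module_type, replace_constructor):
    """Recursively swap every child of module_type using replace_constructor."""
    for name, child in module.children.items():
        if isinstance(child, module_type):
            module.children[name] = replace_constructor(child)
        else:
            replace_modules(child, module_type, replace_constructor)
    return module

model = Module(fc1=Linear(), block=Module(fc2=Linear()))
replace_modules(model, Linear, lambda old: QuantLinear())
# Both Linear layers, including the nested one, are now QuantLinear.
```

The replace_constructor receives the old module, so a real replacement can copy its weights and configuration into the new implementation.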

static make_param_groups(model)

Returns a dictionary with the keys params, scales, and offsets, each mapping to a list of parameters. The scales and offsets lists collect all scale and offset parameters from each layer; all other parameters are collected under params.

set_observer_sync(value=True)

Turns observer activation synchronization on/off.

Parameters

  • value (bool) – If True, observer synchronization is enabled. Default is True.

enable(val)

Enables/disables quantization.

Parameters

  • val (bool) – If True, quantization is enabled; if False, it is disabled.

remove(leave_parametrized=False)

Remove quantization from all managed modules.

Parameters

  • leave_parametrized (bool) – If True, keep the parametrized wrapper around modules but remove quantization hooks. By default False.

setup_model(model)

Abstract method for wrapping a model.

This method should be implemented in derived classes to define the specific wrapping behavior for a model.

Parameters

  • model (Any) – The model to be wrapped.

Returns

  • Any – The wrapped model.

get_quantizers()

Returns the dictionary of quantizers for all managed modules.

Returns

  • dict – Mapping of (module, parameter_name) tuples to their quantizer instances.