QuantizationManager

class qlip.quantization.QuantizationManager

Bases: BaseManager

Provides a mechanism for quantizing module weights and input/output activations by registering the corresponding hooks.

setup_modules(modules, *, weights_scheme=int32, weights_granularity='per-channel', activations_scheme=int32, activations_granularity='per-batch', placement='input', quantization_mode='static', calibration_iterations=1, train_activations_scale=False, train_activations_offset=False, train_weights_scale=False, train_weights_offset=False, observer=None, observer_kwargs={}, weights_scale_offset_dtype=torch.float32, activations_scale_offset_dtype=torch.float32, **kwargs)

Attaches quantization parameters and sets quantization hooks.

Parameters

  • modules (list[torch.nn.Module]) – List of modules to set up.

  • weights_scheme (QuantScheme) – Quantization scheme for weights.

  • weights_granularity (Union[str, QuantGranularity, Dict[str, Any]]) – Weights quantization granularity. Default is per-channel.

  • activations_scheme (QuantScheme) – Quantization scheme for activations.

  • activations_granularity (Union[str, QuantGranularity, Dict[str, Any]]) – Activations quantization granularity. Possible values: per-batch, per-sample or per-token. Default is per-batch.

  • placement (Union[str, QuantPlacement]) – Where to apply activation quantization. Three possible values: input, output or attention. Default is input.

  • quantization_mode (Union[str, QuantizationMode]) – Two possible values: static or dynamic. Default is static.

  • calibration_iterations (int) – Number of batches used to initialize the scale and offset parameters of the activations quantizer. Ignored if quantization_mode='dynamic'. Default is 1.

  • train_activations_scale (bool) – Sets the requires_grad attribute for activations scale parameters.

  • train_activations_offset (bool) – Sets the requires_grad attribute for activations offset parameters.

  • train_weights_scale (bool) – Sets the requires_grad attribute for weights scale parameters.

  • train_weights_offset (bool) – Sets the requires_grad attribute for weights offset parameters.

  • observer (qlip.observers.IObserver) – Observer implementing the IObserver interface. Default is MinMaxObserver.

  • observer_kwargs (dict) – Additional keyword arguments for observer constructor.

  • weights_scale_offset_dtype (torch.dtype) – Data type for weights scale and offset parameters. Default is torch.float32.

  • activations_scale_offset_dtype (torch.dtype) – Data type for activations scale and offset parameters. Default is torch.float32.
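To make the scale and offset parameters attached by setup_modules more concrete, the sketch below shows the kind of affine min-max computation a static quantization setup typically performs. This is illustrative plain Python, not qlip code; the function names compute_scale_offset and quantize are hypothetical.

```python
# Illustrative sketch (not qlip code): the scale/offset computation a static
# min-max quantizer typically performs for one channel of weights.

def compute_scale_offset(values, num_bits=8):
    """Derive an affine scale/offset from the observed min/max values."""
    lo, hi = min(values), max(values)
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against a zero scale
    offset = round(qmin - lo / scale)
    return scale, offset

def quantize(values, scale, offset, num_bits=8):
    """Round values to the integer grid and clamp to the representable range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [min(max(round(v / scale) + offset, qmin), qmax) for v in values]

weights = [-1.0, 0.0, 0.5, 1.0]
scale, offset = compute_scale_offset(weights)
q = quantize(weights, scale, offset)  # integers spanning the 8-bit range
```

In static mode, setup_modules runs an analogous calibration for calibration_iterations batches before the parameters are frozen (or further trained if the train_* flags are set).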

setup_modules_groups(modules_groups)

Quantizes each module group using the given config.

Parameters

  • modules_groups (list[dict]) – List of dictionaries, each with two keys: modules and config. The value of modules is a list of torch.nn.Module objects, and the value of config is a dictionary whose keys match the keyword arguments of QuantizationManager.setup_modules.
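The expected shape of the modules_groups argument can be sketched as follows; placeholder strings stand in for torch.nn.Module instances here, and the chosen config values are only examples.

```python
# Hypothetical modules_groups structure; real entries under "modules" would
# be torch.nn.Module instances from the model.
attention_layers = ["attn_0", "attn_1"]  # placeholders for module objects
mlp_layers = ["mlp_0", "mlp_1"]

modules_groups = [
    {"modules": attention_layers,
     "config": {"placement": "attention", "quantization_mode": "dynamic"}},
    {"modules": mlp_layers,
     "config": {"placement": "input", "weights_granularity": "per-channel"}},
]
# qmanager.setup_modules_groups(modules_groups) would then apply each group's
# config, as if setup_modules(group["modules"], **group["config"]) were called.
```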

get_quantization_parameters()

Collects all scale and offset parameters from the quantized modules.

Returns

  • quantization_params (dict) – Dictionary with four keys: weights_scales, weights_offsets, activations_scales, activations_offsets. Contains the scale and offset parameters of the managed modules.
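The shape of the returned dictionary can be sketched like this; in practice each list holds parameter tensors collected from the managed modules.

```python
# Hypothetical shape of the get_quantization_parameters() return value.
quantization_params = {
    "weights_scales": [],        # one scale parameter per quantized weight
    "weights_offsets": [],       # matching offset parameters
    "activations_scales": [],    # scale parameters from activation quantizers
    "activations_offsets": [],   # matching offset parameters
}
```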

weights_round(value)

Turns weight rounding on/off in train mode.

Parameters

  • value (bool) – If True, weights are rounded in train mode.

activations_round(value)

Turns activation rounding on/off in train mode.

Parameters

  • value (bool) – If True, activations are rounded in train mode.

extend(qmanager)

Constructs a single QuantizationManager object by concatenating the quantized modules of this manager and qmanager.

static replace_modules(model, module_type, replace_constructor, inplace=True)

Replaces all modules of a specific type in the model using the given replacement constructor.

Parameters

  • model (torch.nn.Module)

  • module_type (type) – Type of the modules to replace.

  • replace_constructor (callable) – Callable that returns the new module.

  • inplace (bool) – If True, modifies the input model object in place. Otherwise returns a copy.

Examples

Use this when some layer logic must be reimplemented and all modules of a specific type should be replaced with the new implementation.
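The replace-by-constructor pattern can be sketched with a minimal, self-contained module tree standing in for torch.nn.Module; the class names and the recursive helper here are hypothetical, not the qlip implementation.

```python
# Self-contained sketch of the replace-by-constructor pattern.

class Module:
    """Minimal stand-in for torch.nn.Module holding named children."""
    def __init__(self, **children):
        self.children = children

class Linear(Module):
    """The original layer type we want to replace."""

class QuantLinear(Module):
    """Hypothetical reimplementation carrying quantization state."""

def replace_modules(module, module_type, replace_constructor):
    """Recursively swap every child of module_type using replace_constructor."""
    for name, child in module.children.items():
        if isinstance(child, module_type):
            module.children[name] = replace_constructor(child)
        else:
            replace_modules(child, module_type, replace_constructor)
    return module

model = Module(fc1=Linear(), block=Module(fc2=Linear()))
replace_modules(model, Linear, lambda old: QuantLinear())
# Both Linear layers, including the nested one, are now QuantLinear.
```

The replace_constructor receives the old module, so a real replacement can copy its weights and configuration into the new implementation.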

static make_param_groups(model)

Returns a dictionary with the keys params, scales, and offsets, each mapping to a list of parameters. The scales and offsets lists collect all scale and offset parameters from each layer; all other parameters are collected under params.

set_observer_sync(value=True)

Turns observer activation synchronization on/off.

Parameters

  • value (bool) – If True, observer synchronization is enabled. Default is True.

enable(val)

Enables/disables quantization.

Parameters

  • val (bool) – If True, quantization is enabled; if False, it is disabled.

remove(leave_parametrized=False)

Remove quantization from all managed modules.

Parameters

  • leave_parametrized (bool) – If True, keep the parametrized wrapper around modules but remove quantization hooks. By default False.

setup_model(model)

Abstract method for wrapping a model.

This method should be implemented in derived classes to define the specific wrapping behavior for a model.

Parameters

  • model (Any) – The model to be wrapped.

Returns

  • Any – The wrapped model.

get_quantizers()

Returns the dictionary of quantizers for all managed modules.

Returns

  • dict – Mapping of (module, parameter_name) tuples to their quantizer instances.