QuantizationManager¶
- class qlip.quantization.QuantizationManager¶
Bases: BaseManager

Provides a mechanism for quantizing module weights and input/output activations by setting the corresponding hooks.
- setup_modules(modules, *, weights_scheme=int32, weights_granularity='per-channel', activations_scheme=int32, activations_granularity='per-batch', placement='input', quantization_mode='static', calibration_iterations=1, train_activations_scale=False, train_activations_offset=False, train_weights_scale=False, train_weights_offset=False, observer=None, observer_kwargs={}, weights_scale_offset_dtype=torch.float32, activations_scale_offset_dtype=torch.float32, **kwargs)¶
Attaches quantization parameters and sets quantization hooks.
Parameters
- modules (list[torch.nn.Module]) – List of modules to set up.
- weights_scheme (QuantScheme) – Weights quantization scheme.
- weights_granularity (Union[str, QuantGranularity, Dict[str, Any]]) – Default is per-channel.
- activations_scheme (QuantScheme) – Activations quantization scheme.
- activations_granularity (Union[str, QuantGranularity, Dict[str, Any]]) – Possible values: per-batch, per-sample or per-token. Default is per-batch.
- placement (Union[str, QuantPlacement]) – Three possible values: input, output or attention. Quantize input/output activations or attention. Default is input.
- quantization_mode (Union[str, QuantizationMode]) – Two possible values: static or dynamic. Default is static.
- calibration_iterations (int) – The number of batches used to initialize the scale and offset parameters of the activations quantizer. Ignored if quantization_mode='dynamic'. Default is 1.
- train_activations_scale (bool) – Sets the requires_grad attribute of the activations scale parameters.
- train_activations_offset (bool) – Sets the requires_grad attribute of the activations offset parameters.
- train_weights_scale (bool) – Sets the requires_grad attribute of the weights scale parameters.
- train_weights_offset (bool) – Sets the requires_grad attribute of the weights offset parameters.
- observer (qlip.observers.IObserver) – An implementation of the observer interface. Default is MinMaxObserver.
- observer_kwargs (dict) – Additional keyword arguments for the observer constructor.
- weights_scale_offset_dtype (torch.dtype) – Data type of the weights scale and offset parameters. Default is torch.float32.
- activations_scale_offset_dtype (torch.dtype) – Data type of the activations scale and offset parameters. Default is torch.float32.
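In static mode, the observer (MinMaxObserver by default) watches `calibration_iterations` batches and initializes one scale/offset pair per quantization group. As a rough, self-contained illustration of that calibration math (plain Python, independent of qlip; the 8-bit affine formula and the row-wise "per-channel" grouping here are generic conventions, not taken from this API):

```python
def minmax_scale_offset(values, num_bits=8):
    """Affine quantization parameters from observed min/max (generic sketch)."""
    lo, hi = min(values), max(values)
    qmin, qmax = 0, 2 ** num_bits - 1           # unsigned integer target range
    scale = (hi - lo) / (qmax - qmin) or 1.0    # avoid a zero scale for constant inputs
    offset = round(qmin - lo / scale)           # integer zero point
    return scale, offset

def quantize(x, scale, offset, num_bits=8):
    q = round(x / scale) + offset
    return max(0, min(2 ** num_bits - 1, q))    # clamp to the integer range

# Per-channel granularity: one (scale, offset) pair per weight row.
weights = [[-1.0, 0.0, 1.0], [0.0, 2.0, 4.0]]
params = [minmax_scale_offset(row) for row in weights]
quantized = [[quantize(x, s, o) for x in row]
             for (s, o), row in zip(params, weights)]
```

With train_weights_scale / train_weights_offset enabled, the analogous scale and offset tensors become trainable parameters instead of fixed calibration results.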
- setup_modules_groups(modules_groups)¶
Quantizes each group of modules using the corresponding config.
Parameters
- modules_groups (list[dict]) – List of dictionaries, each with two keys: modules and config. The value of modules is a list of torch.nn.Module objects; the value of config is a dictionary of keyword arguments accepted by QuantizationManager.setup_modules.
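A sketch of the expected structure (the layer names are placeholder strings standing in for real torch.nn.Module instances, and the chosen configs are only examples of valid setup_modules keyword arguments):

```python
# Hypothetical layer lists; in practice these hold torch.nn.Module objects.
attention_layers = ["attn_0", "attn_1"]
mlp_layers = ["mlp_0", "mlp_1"]

# Each group pairs a list of modules with the setup_modules kwargs to apply to it.
modules_groups = [
    {"modules": attention_layers,
     "config": {"placement": "attention", "quantization_mode": "dynamic"}},
    {"modules": mlp_layers,
     "config": {"weights_granularity": "per-channel",
                "activations_granularity": "per-token"}},
]
# qmanager.setup_modules_groups(modules_groups)  # on a real QuantizationManager
```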
- get_quantization_parameters()¶
Collects all scale and offset parameters from the quantized modules.
Returns
- quantization_params (dict) – Dictionary with four keys: weights_scales, weights_offsets, activations_scales, activations_offsets, containing the scale and offset parameters of the managed modules.
- weights_round(value)¶
Turns weight rounding in train mode on or off.
Parameters
- value (bool) – If True, weights are rounded in train mode.
- activations_round(value)¶
Turns activation rounding in train mode on or off.
Parameters
- value (bool) – If True, activations are rounded in train mode.
- extend(qmanager)¶
Constructs a single QuantizationManager object by concatenating the quantized modules of this manager and qmanager.
- static replace_modules(model, module_type, replace_constructor, inplace=True)¶
Replaces all modules of a specific type in the model using the replacing constructor.
Parameters
- model (torch.nn.Module)
- module_type (type[torch.nn.Module])
- replace_constructor (callable) – Callable that returns the new module.
- inplace (bool) – If True, modifies the input model object in place. Otherwise returns a copy.
Examples
Useful when you need to reimplement some layer logic and replace all modules of a specific type with the new implementation.
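The replacement pattern can be sketched in plain Python with stand-in classes (this mirrors what a module_type / replace_constructor pair is used for; it is not qlip's actual implementation, which operates on torch.nn.Module trees):

```python
class Linear:                       # stand-in for a torch.nn.Linear-like layer
    def __init__(self, features):
        self.features = features

class FastLinear(Linear):           # hypothetical replacement implementation
    pass

class Block:                        # stand-in container with nested children
    def __init__(self):
        self.proj = Linear(16)
        self.inner = type("Inner", (), {})()
        self.inner.head = Linear(8)

def replace_modules(root, module_type, replace_constructor):
    """Recursively swap every child of module_type using the constructor."""
    for name, child in list(vars(root).items()):
        if isinstance(child, module_type):
            setattr(root, name, replace_constructor(child))
        elif hasattr(child, "__dict__"):
            replace_modules(child, module_type, replace_constructor)

model = Block()
# The constructor receives the old module and builds its replacement.
replace_modules(model, Linear, lambda old: FastLinear(old.features))
```

The real static method additionally honors inplace=False by operating on a copy of the model.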
- static make_param_groups(model)¶
Returns a dictionary with keys params, scales, offsets, each mapping to a list of parameters. The scales and offsets entries collect all scale and offset parameters from each layer; all remaining parameters are collected under params.
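Such grouping is typically used to give scale and offset parameters their own optimizer settings. A plain-Python sketch of splitting named parameters by name suffix (the `_scale` / `_offset` naming convention here is an assumption for illustration, not qlip's documented scheme):

```python
def make_param_groups(named_params):
    """Split (name, parameter) pairs into params / scales / offsets lists."""
    groups = {"params": [], "scales": [], "offsets": []}
    for name, param in named_params:
        if name.endswith("scale"):
            groups["scales"].append(param)
        elif name.endswith("offset"):
            groups["offsets"].append(param)
        else:
            groups["params"].append(param)
    return groups

# Floats stand in for torch parameter tensors.
named = [("layer1.weight", 0.5), ("layer1.weight_scale", 0.01),
         ("layer1.weight_offset", 0.0), ("layer2.bias", 0.1)]
groups = make_param_groups(named)
# Typical use with per-group learning rates, e.g.:
# torch.optim.SGD([{"params": groups["params"], "lr": 1e-3},
#                  {"params": groups["scales"], "lr": 1e-4}])
```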
- set_observer_sync(value=True)¶
Turns observer activation synchronization on or off.
Parameters
- value (bool) – If True, observer activation synchronization is enabled.
- enable(val)¶
Enables or disables quantization.
Parameters
- val (bool) – If True, quantization is enabled.
- remove(leave_parametrized=False)¶
Removes quantization from all managed modules.
Parameters
- leave_parametrized (bool) – If True, keeps the parametrized wrapper around modules but removes the quantization hooks. Default is False.
- setup_model(model)¶
Abstract method for wrapping a model.
This method should be implemented in derived classes to define the specific wrapping behavior for a model.
Parameters
- model (Any) – The model to be wrapped.
Returns
- Any – The wrapped model.
- get_quantizers()¶
Returns the dictionary of quantizers for all managed modules.
Returns
- dict – Mapping of (module, parameter_name) tuples to their quantizer instances.