NvidiaCompileManager

class qlip.compiler.nvidia.NvidiaCompileManager(model, workspace, use_timing_cache=True, **kwargs)

Bases: CompileManager

Compilation manager for the NVIDIA backend.

Parameters

  • model (torch.nn.Module) – Model to compile.

  • workspace (Path) – Workspace to use for compilation.

  • use_timing_cache (bool) – Whether to use a timing cache for compilation. By default True.

Variables

  • backend (Backend) – Backend for compilation.

  • model (nn.Module) – Model for compilation.

  • workspace (Path) – Workspace for compilation.

  • modules (List[CompiledModule]) – Modules for compilation.

Examples

>>> cmanager = NvidiaCompileManager(model, workspace)
>>> cmanager.setup_modules(modules=["encoder", "decoder"])
>>> cmanager.compile()
backend

alias of NvidiaBackend

setup_modules(*, modules=None, module_types=None, builder_config=None, component=None, component_name=None, onnx_model=None, dtype=None, adapter_type='auto')

Set up modules for compilation.

Parameters

  • modules (Optional[Iterable[str]]) – Names of the modules to set up.

  • module_types (Optional[Iterable[type | str]]) – Types (or type names) of the modules to set up.

  • builder_config (Optional[BuilderConfig]) – Builder configuration.

  • component (Optional[str]) – Component name. Mandatory when the model is a DiffusionPipeline from diffusers, where it must be an actual pipeline component; optional for other models, where it may be the name of any submodule.

  • component_name (Optional[str]) – Custom pretty name for the component.

  • onnx_model (Optional[Path]) – ONNX model to use for compilation.

  • dtype (Optional[torch.dtype]) – Data type of the compiled modules.

  • adapter_type (str) – Type of adapter to use, by default ‘auto’. Possible values are: ‘auto’, ‘default’, ‘hf_adapter’, ‘hf_unet_adapter’.

Examples

>>> # The whole `text_encoder` component
>>> cmanager.setup_modules(component="text_encoder")
>>> # Submodules of type `FluxTransformerBlock` within the `transformer` component
>>> cmanager.setup_modules(module_types=["FluxTransformerBlock"], component="transformer")
>>> # Specific modules
>>> cmanager.setup_modules(modules=["encoder", "decoder"])
setup_model(*, builder_config=None, onnx_model=None, dtype=None, adapter_type='auto')

Set up the model for compilation.

Parameters

  • builder_config (Optional[BuilderConfig]) – Builder configuration.

  • onnx_model (Optional[Path]) – ONNX model to use for compilation.

  • dtype (Optional[torch.dtype]) – Data type of the compiled modules.

  • adapter_type (str) – Type of adapter to use, by default ‘auto’. Possible values are: ‘auto’, ‘default’, ‘hf_adapter’, ‘hf_unet_adapter’.
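
Examples

A usage sketch following the class example above (the dtype value is illustrative):

>>> cmanager.setup_model(dtype=torch.float16)
>>> cmanager.compile()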

compile(device='cuda', original_device='meta', cpu_offload=False, recompile_existing=False, dump_onnx=False, keep_compiled=True, save_compiled=True, dynamo=None, fuse_weights_quantizers=None)

Compile modules.

Parameters

  • device (str) – Device on which to compile the modules.

  • original_device (str) – Original device of the model.

  • cpu_offload (bool) – Whether to offload the model to the CPU.

  • recompile_existing (bool) – Whether to recompile the existing modules.

  • dump_onnx (bool) – Whether to dump the ONNX model.

  • keep_compiled (bool) – Whether to keep the compiled modules.

  • save_compiled (bool) – Whether to save the compiled modules.

  • dynamo (Optional[bool]) – Whether to use dynamo for the ONNX export. By default True for PyTorch version 2.9 or higher, False otherwise.

  • fuse_weights_quantizers (Optional[bool]) – Whether to fuse weight quantizers. By default True for the dynamo ONNX export.
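
The version-dependent default for dynamo described above can be sketched as a simple version check (a sketch for illustration; how qlip actually resolves the default is not shown here):

```python
def default_dynamo(torch_version: str) -> bool:
    # Sketch of the documented default: True on PyTorch 2.9+, False otherwise.
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (2, 9)

print(default_dynamo("2.8.1"))   # False
print(default_dynamo("2.9.0"))   # True
print(default_dynamo("2.10.0"))  # True
```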

collect_inputs()

Create a manager that collects model inputs for export.

This is useful in fake mode, where shapes cannot be collected properly and real inputs are needed instead.

Returns

  • collect_inputs_manager (CollectInputsManager) – Collect inputs manager.
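
Examples

A usage sketch, assuming the returned manager is used as a context manager (the call inside the block is illustrative):

>>> with cmanager.collect_inputs():
...     model(example_input)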

enable(val=True)

Enable or disable compiled computation.

Parameters

  • val (bool) – Whether to enable or disable compiled computation.

remove()

Extract the original modules from the compiled modules.

set_axes_names(mappings)

Set axes names from a per-module mapping.

Parameters

  • mappings (Dict[type | str, Dict[str, str]]) – For each module, identified by name or type, a mapping from input name to axis name.

Examples

Mapping by module type:

{
    torch.nn.Linear: {
        "input_0": "batch_size"
    }
}

Mapping by module name:

{
    "decoder": {
        "sample_2": "width",
        "sample_3": "height",
    }
}
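
A lookup consistent with these mappings can be sketched in plain Python (a sketch: the name-before-type precedence and the stand-in FakeLinear class are assumptions for illustration, not qlip behavior):

```python
class FakeLinear:
    """Stand-in for a module class such as torch.nn.Linear (hypothetical)."""

def resolve_axes(mappings, module_name, module):
    """Pick the axes mapping for a module: by name first, then by type (sketch)."""
    if module_name in mappings:
        return mappings[module_name]
    for key, axes in mappings.items():
        if isinstance(key, type) and isinstance(module, key):
            return axes
    return {}

mappings = {
    FakeLinear: {"input_0": "batch_size"},
    "decoder": {"sample_2": "width", "sample_3": "height"},
}
print(resolve_axes(mappings, "decoder", object()))      # {'sample_2': 'width', 'sample_3': 'height'}
print(resolve_axes(mappings, "encoder", FakeLinear()))  # {'input_0': 'batch_size'}
```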
shape_profile(type='static', opt='mode', fake_mode=False)

Create a shape profile manager to control shape collection.

Parameters

  • type (str) – Profile type: create “static” or “dynamic” axes profiles from the collected shapes. Default is “static”.

  • opt (str) – Optimal shape for dynamic axes profile. Can be “mode”, “min” or “max”. Default is “mode”.

  • fake_mode (bool) – Whether to enable FakeTensorMode for shape inference without actual computation. Default is False.

Returns