NvidiaSessionConfig

class qlip.inference.nvidia.NvidiaSessionConfig(*, device_memory=None, stream=None, use_cuda_graph=False, weight_streaming_budget_v2=-1, allocation_strategy='USER_MANAGED', do_input_check=True)

Bases: SessionConfig

Configuration for Nvidia Inference Session.

Variables

  • device_memory (Optional[tuple]) – Device memory for the Nvidia engine execution context.

  • stream (cupy.cuda.Stream | torch.cuda.Stream | None) – CUDA stream on which inference is executed.

  • use_cuda_graph (bool) – Whether to capture and replay inference with CUDA Graphs.

  • weight_streaming_budget_v2 (int) – Weight streaming budget in bytes. If 0, stream all weights.

  • allocation_strategy (str) – Allocation strategy: one of "STATIC", "ON_PROFILE_CHANGE", or "USER_MANAGED".

  • do_input_check (bool) – Whether to check input shapes and dtypes before inference. Defaults to True.