NvidiaSessionConfig
- class qlip.inference.nvidia.NvidiaSessionConfig(*, device_memory=None, stream=None, use_cuda_graph=False, weight_streaming_budget_v2=-1, allocation_strategy='USER_MANAGED', do_input_check=True)
Bases: SessionConfig

Configuration for Nvidia Inference Session.

Variables
- device_memory (Optional[tuple]) – Device memory for the Nvidia engine context.
- stream (cupy.cuda.Stream | torch.cuda.Stream | None) – CUDA stream.
- use_cuda_graph (bool) – Use CUDA Graph.
- weight_streaming_budget_v2 (int) – Weight streaming budget. If 0, stream all weights.
- allocation_strategy (str) – Allocation strategy: one of STATIC, ON_PROFILE_CHANGE, USER_MANAGED.
- do_input_check (bool) – Whether to check input shapes and types before inference. Defaults to True.