NvidiaMemoryManager

class qlip.inference.nvidia.NvidiaMemoryManager(*infsessions, device_memory_size=None, output_memory_size=None, input_memory_size=None, shared_io_memory=False, stream=None, memory_allocator=<qlip.inference.nvidia.memory_allocation.MemoryAllocatorTorch object>)

Bases: object

Nvidia memory manager.

Pre-allocates memory and assigns it to the inference sessions. Device, input, and output memory can each be allocated.

Parameters

  • infsessions (List[NvidiaInferenceSession]) – List of Nvidia Inference Sessions.

  • device_memory_size (Optional[int]) – Device memory size.

  • output_memory_size (Optional[int]) – Output memory size.

  • input_memory_size (Optional[int]) – Input memory size.

  • shared_io_memory (bool) – Share input and output memory.

  • stream (cupy.cuda.Stream | torch.cuda.Stream | None) – CUDA stream for memory allocation.

  • memory_allocator (MemoryAllocator) – Memory allocator.

property is_active

Active if any type of memory is allocated:

  • device memory

  • input memory

  • output memory
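The check can be pictured as a simple any-over-three-slots test. A minimal sketch, assuming the manager tracks each pool in an attribute that is `None` until allocated (the attribute names `_device_mem`, `_input_mem`, `_output_mem` are illustrative, not the library's actual internals):

```python
class MemoryManagerSketch:
    """Simplified stand-in illustrating the is_active check."""

    def __init__(self):
        # Each slot holds an allocated buffer, or None before allocation.
        self._device_mem = None
        self._input_mem = None
        self._output_mem = None

    @property
    def is_active(self):
        # Active as soon as any of the three memory types is allocated.
        return any(
            mem is not None
            for mem in (self._device_mem, self._input_mem, self._output_mem)
        )


mgr = MemoryManagerSketch()
print(mgr.is_active)          # False: nothing allocated yet
mgr._input_mem = bytearray(16)
print(mgr.is_active)          # True: input memory alone suffices
```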

set_stream(stream)

Set CUDA stream for memory allocation.

Parameters

  • stream (cupy.cuda.Stream | torch.cuda.Stream) – CUDA stream for memory allocation.

add_infsession(infsession)

Add inference session to the memory manager.

Parameters

  • infsession (NvidiaInferenceSession) – Nvidia inference session to add.

extend_infsessions(infsessions)

Add multiple inference sessions to the memory manager.

Parameters

  • infsessions (List[NvidiaInferenceSession]) – Nvidia inference sessions to add.

extract_device_memory_size()

Extract maximum device memory size from all inference sessions.

extract_output_memory_size()

Extract maximum output memory size from all inference sessions.

extract_input_memory_size()

Extract maximum input memory size from all inference sessions.
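The three extract methods share one idea: the manager sizes each pool to the largest requirement reported by any of its sessions, so a single shared buffer can serve every session. A minimal sketch of that reduction, using hypothetical session objects that report a size in bytes (the `device_memory_size` attribute here is assumed for illustration):

```python
from types import SimpleNamespace


def extract_max_size(sessions, attr):
    """Return the largest memory requirement of the given kind across sessions."""
    return max(getattr(s, attr) for s in sessions)


# Hypothetical sessions reporting their device-memory needs in bytes.
sessions = [
    SimpleNamespace(device_memory_size=1 << 20),  # 1 MiB
    SimpleNamespace(device_memory_size=4 << 20),  # 4 MiB
    SimpleNamespace(device_memory_size=2 << 20),  # 2 MiB
]

# A 4 MiB pool is enough for any one of these sessions.
print(extract_max_size(sessions, "device_memory_size"))  # 4194304
```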

allocate_memory()

Allocate all types of memory.

allocate_device_memory()

Allocate device memory.

allocate_output_memory()

Allocate output memory.

allocate_input_memory()

Allocate input memory.

deallocate_memory()

Deallocate all types of memory.

reallocate_memory(model, method_name='forward')

Wraps a model method so that memory is allocated before execution and deallocated after it completes.

Parameters

  • model (torch.nn.Module) – Model to wrap.

  • method_name (str) – Method name to wrap.
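The wrapping pattern can be sketched as replacing the model's method with a version that brackets the call between allocation and deallocation. The stand-in classes below are simplified assumptions for illustration; only the `allocate_memory`/`deallocate_memory`/`reallocate_memory` names mirror the methods documented above:

```python
import functools


class TinyManager:
    """Stand-in manager that just records allocation state."""

    def __init__(self):
        self.active = False

    def allocate_memory(self):
        self.active = True

    def deallocate_memory(self):
        self.active = False

    def reallocate_memory(self, model, method_name="forward"):
        """Wrap model.<method_name> so memory is held only during the call."""
        original = getattr(model, method_name)

        @functools.wraps(original)
        def wrapped(*args, **kwargs):
            self.allocate_memory()
            try:
                return original(*args, **kwargs)
            finally:
                # Free the memory even if the wrapped method raises.
                self.deallocate_memory()

        setattr(model, method_name, wrapped)


class TinyModel:
    def forward(self, x):
        return x * 2


mgr = TinyManager()
model = TinyModel()
mgr.reallocate_memory(model)
print(model.forward(3))  # 6; memory was allocated for the call, then freed
print(mgr.active)        # False
```

The `try`/`finally` guarantees deallocation even when the wrapped method raises, which is the design point of pairing allocation and deallocation around a single call.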