NvidiaMemoryManager

class qlip.inference.nvidia.NvidiaMemoryManager(*infsessions, device_memory_size=None, output_memory_size=None, input_memory_size=None, shared_io_memory=False, stream=None, memory_allocator=<qlip.inference.nvidia.memory_allocation.MemoryAllocatorTorch object>)

Bases: object

Nvidia memory manager.

Pre-allocates memory and assigns it to the inference sessions. Device, input, and output memory can each be allocated.

Parameters

  • infsessions (List[NvidiaInferenceSession]) – List of Nvidia Inference Sessions.

  • device_memory_size (Optional[int]) – Device memory size.

  • output_memory_size (Optional[int]) – Output memory size.

  • input_memory_size (Optional[int]) – Input memory size.

  • shared_io_memory (bool) – Share input and output memory.

  • stream (cupy.cuda.Stream | torch.cuda.Stream | None) – CUDA stream for memory allocation.

  • memory_allocator (MemoryAllocator) – Memory allocator.

property is_active

Active if any type of memory is allocated:

  • device memory

  • input memory

  • output memory
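The check can be pictured as a simple any-over-three-slots test. A minimal sketch, assuming the manager tracks each pool in an attribute that is `None` until allocated (the attribute names `_device_mem`, `_input_mem`, `_output_mem` are illustrative, not the library's actual internals):

```python
class MemoryManagerSketch:
    """Simplified stand-in illustrating the is_active check."""

    def __init__(self):
        # Each slot holds an allocated buffer, or None before allocation.
        self._device_mem = None
        self._input_mem = None
        self._output_mem = None

    @property
    def is_active(self):
        # Active as soon as any of the three memory types is allocated.
        return any(
            mem is not None
            for mem in (self._device_mem, self._input_mem, self._output_mem)
        )


mgr = MemoryManagerSketch()
print(mgr.is_active)          # False: nothing allocated yet
mgr._input_mem = bytearray(16)
print(mgr.is_active)          # True: input memory alone suffices
```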

set_stream(stream)

Set CUDA stream for memory allocation.

Parameters

  • stream (cupy.cuda.Stream | torch.cuda.Stream) – CUDA stream for memory allocation.

add_infsession(infsession)

Add inference session to the memory manager.

Parameters

  • infsession (NvidiaInferenceSession) – Nvidia inference session to add.

extend_infsessions(infsessions)

Add multiple inference sessions to the memory manager.

Parameters

  • infsessions (List[NvidiaInferenceSession]) – Nvidia inference sessions to add.

extract_device_memory_size()

Extract maximum device memory size from all inference sessions.

extract_output_memory_size()

Extract maximum output memory size from all inference sessions.

extract_input_memory_size()

Extract maximum input memory size from all inference sessions.
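The three extract methods share one idea: the manager sizes each pool to the largest requirement reported by any of its sessions, so a single shared buffer can serve every session. A minimal sketch of that reduction, using hypothetical session objects that report a size in bytes (the `device_memory_size` attribute here is assumed for illustration):

```python
from types import SimpleNamespace


def extract_max_size(sessions, attr):
    """Return the largest memory requirement of the given kind across sessions."""
    return max(getattr(s, attr) for s in sessions)


# Hypothetical sessions reporting their device-memory needs in bytes.
sessions = [
    SimpleNamespace(device_memory_size=1 << 20),  # 1 MiB
    SimpleNamespace(device_memory_size=4 << 20),  # 4 MiB
    SimpleNamespace(device_memory_size=2 << 20),  # 2 MiB
]

# A 4 MiB pool is enough for any one of these sessions.
print(extract_max_size(sessions, "device_memory_size"))  # 4194304
```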

allocate_memory()

Allocate all types of memory.

allocate_device_memory()

Allocate device memory.

allocate_output_memory()

Allocate output memory.

allocate_input_memory()

Allocate input memory.

deallocate_memory()

Deallocate all types of memory.

reallocate_memory(model, method_name='forward')

Wraps a model method so that memory is allocated before execution and deallocated after it completes.

Parameters

  • model (torch.nn.Module) – Model to wrap.

  • method_name (str) – Method name to wrap.
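The wrapping pattern can be sketched as replacing the model's method with a version that brackets the call between allocation and deallocation. The stand-in classes below are simplified assumptions for illustration; only the `allocate_memory`/`deallocate_memory`/`reallocate_memory` names mirror the methods documented above:

```python
import functools


class TinyManager:
    """Stand-in manager that just records allocation state."""

    def __init__(self):
        self.active = False

    def allocate_memory(self):
        self.active = True

    def deallocate_memory(self):
        self.active = False

    def reallocate_memory(self, model, method_name="forward"):
        """Wrap model.<method_name> so memory is held only during the call."""
        original = getattr(model, method_name)

        @functools.wraps(original)
        def wrapped(*args, **kwargs):
            self.allocate_memory()
            try:
                return original(*args, **kwargs)
            finally:
                # Free the memory even if the wrapped method raises.
                self.deallocate_memory()

        setattr(model, method_name, wrapped)


class TinyModel:
    def forward(self, x):
        return x * 2


mgr = TinyManager()
model = TinyModel()
mgr.reallocate_memory(model)
print(model.forward(3))  # 6; memory was allocated for the call, then freed
print(mgr.active)        # False
```

The `try`/`finally` guarantees deallocation even when the wrapped method raises, which is the design point of pairing allocation and deallocation around a single call.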