NvidiaMemoryManager¶
- class qlip.inference.nvidia.NvidiaMemoryManager(*infsessions, device_memory_size=None, output_memory_size=None, input_memory_size=None, shared_io_memory=False, stream=None, memory_allocator=MemoryAllocatorTorch())¶
Bases: object

Nvidia memory manager.
Pre-allocates memory and sets it to the inference sessions. Can allocate device, input and output memory.
Parameters
- infsessions (List[NvidiaInferenceSession]) – List of Nvidia inference sessions.
- device_memory_size (Optional[int]) – Device memory size.
- output_memory_size (Optional[int]) – Output memory size.
- input_memory_size (Optional[int]) – Input memory size.
- shared_io_memory (bool) – Whether to share input and output memory.
- stream (cupy.cuda.Stream | torch.cuda.Stream | None) – CUDA stream for memory allocation.
- memory_allocator (MemoryAllocator) – Memory allocator.
- property is_active¶
Active if any type of memory is allocated:
- device memory
- input memory
- output memory
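The property reduces to an any-of check over the three buffers. A minimal sketch of that logic (the class and the attribute names `_device_mem`, `_input_mem`, and `_output_mem` are hypothetical stand-ins, not qlip's actual internals):

```python
class MemoryManagerSketch:
    """Illustrative stand-in for NvidiaMemoryManager's is_active logic."""

    def __init__(self):
        # Hypothetical internal buffers; None means "not allocated".
        self._device_mem = None
        self._input_mem = None
        self._output_mem = None

    @property
    def is_active(self):
        # Active if any type of memory is allocated.
        return any(
            buf is not None
            for buf in (self._device_mem, self._input_mem, self._output_mem)
        )


mgr = MemoryManagerSketch()
print(mgr.is_active)   # False: nothing allocated yet
mgr._input_mem = bytearray(1024)
print(mgr.is_active)   # True: input memory is allocated
```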
- set_stream(stream)¶
Set CUDA stream for memory allocation.
Parameters
- stream (cupy.cuda.Stream | torch.cuda.Stream) – CUDA stream for memory allocation.
- add_infsession(infsession)¶
Add inference session to the memory manager.
Parameters
- infsession (NvidiaInferenceSession) – Nvidia inference session to add.
- extend_infsessions(infsessions)¶
Add multiple inference sessions to the memory manager.
- extract_device_memory_size()¶
Extract maximum device memory size from all inference sessions.
- extract_output_memory_size()¶
Extract maximum output memory size from all inference sessions.
- extract_input_memory_size()¶
Extract maximum input memory size from all inference sessions.
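Each `extract_*` method takes the maximum requirement across all registered sessions, so a single pre-allocated buffer can serve any of them. A sketch of that reduction, using a hypothetical `SessionSketch` class in place of the real NvidiaInferenceSession:

```python
from dataclasses import dataclass


@dataclass
class SessionSketch:
    """Hypothetical stand-in for NvidiaInferenceSession, reduced to
    a single reported device-memory requirement."""
    device_memory_size: int


def extract_device_memory_size(sessions):
    # Take the maximum requirement across all sessions so one shared
    # buffer is large enough for whichever session runs.
    return max(s.device_memory_size for s in sessions)


sessions = [SessionSketch(1 << 20), SessionSketch(4 << 20), SessionSketch(2 << 20)]
print(extract_device_memory_size(sessions))  # 4194304 (4 MiB)
```

The same max-reduction applies to the input and output memory sizes.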
- allocate_memory()¶
Allocate all types of memory.
- allocate_device_memory()¶
Allocate device memory.
- allocate_output_memory()¶
Allocate output memory.
- allocate_input_memory()¶
Allocate input memory.
- deallocate_memory()¶
Deallocate all types of memory.
- reallocate_memory(model, method_name='forward')¶
Wrap a model method so that memory is allocated before and deallocated after each execution.
Parameters
- model (torch.nn.Module) – Model to wrap.
- method_name (str) – Name of the method to wrap.
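The allocate-before/deallocate-after behaviour can be pictured as a plain Python decorator applied to the model method. The sketch below mirrors the documented `allocate_memory`/`deallocate_memory` names, but the wrapping mechanics and the `ManagerSketch`/`Model` classes are assumptions for illustration, not qlip's actual implementation:

```python
import functools


class ManagerSketch:
    """Hypothetical manager that tracks whether memory is held."""

    def __init__(self):
        self.active = False

    def allocate_memory(self):
        self.active = True

    def deallocate_memory(self):
        self.active = False

    def reallocate_memory(self, model, method_name="forward"):
        # Replace `model.<method_name>` with a wrapper that allocates
        # before the call and deallocates afterwards, even on error.
        original = getattr(model, method_name)

        @functools.wraps(original)
        def wrapped(*args, **kwargs):
            self.allocate_memory()
            try:
                return original(*args, **kwargs)
            finally:
                self.deallocate_memory()

        setattr(model, method_name, wrapped)


class Model:
    def forward(self, x):
        return x * 2


mgr = ManagerSketch()
model = Model()
mgr.reallocate_memory(model, "forward")
print(model.forward(21))  # 42
print(mgr.active)         # False: memory released after the call
```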