nano-vllm/tests/test_offload_unified.py at tzj/vs_offload

Zijie Tian e72725c12b test: add OffloadedTensor unified test suite

Add comprehensive test suite for OffloadedTensor implementation,
including basic functionality, chunked GEMM, and sync analysis.

Components:
- OffloadedTensor: Virtual GPU tensor with transparent CPU/GPU data movement
- OffloadManager: LRU cache management with performance stats
- ChunkedOffloadLinear: Chunked GEMM along seqlen dimension
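
To illustrate the LRU bookkeeping attributed to OffloadManager above, here is a minimal sketch using only the standard library. The class name `LRUOffloadCache`, its `get` method, and the `load_fn` callback are hypothetical and are not taken from the test file; a dict entry stands in for a GPU-resident tensor.

```python
from collections import OrderedDict


class LRUOffloadCache:
    """Hypothetical stand-in for an offload manager (not the real API).

    Keeps at most `capacity` entries "resident" and counts hits,
    misses, and evictions as simple performance stats.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # key -> value, least recently used first
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key, load_fn):
        # Hit: mark the entry as most recently used and return it.
        if key in self.resident:
            self.hits += 1
            self.resident.move_to_end(key)
            return self.resident[key]
        # Miss: load the value (stands in for a CPU -> GPU copy).
        self.misses += 1
        value = load_fn(key)
        self.resident[key] = value
        # Over capacity: evict the least recently used entry.
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)
            self.evictions += 1
        return value
```

With a capacity of 2, touching "a", "b", "a", "c" in that order evicts "b", because "a" was refreshed by the second access.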

Tests (10 total):
- Basic functionality, MLP integration, LRU eviction, correctness
- Memory analysis, 128K sequence, performance comparison, transformers layer
- Sync behavior analysis, profiler analysis

Key findings:
- 93.9% memory savings for 128K sequences (3156 MB → 191 MB)
- Constant memory footprint regardless of sequence length
- Only 8% performance overhead from chunked processing
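
The constant-footprint finding is consistent with how chunking along the sequence dimension works: each row of the input is multiplied by the weight independently, so only one chunk of rows needs to be resident at a time. The sketch below shows the idea under stated assumptions. Plain nested lists stand in for tensors so it runs without PyTorch, and the names `matmul` and `chunked_linear` are hypothetical, not the ChunkedOffloadLinear API.

```python
def matmul(x, w):
    """Plain (rows x k) @ (k x n) product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def chunked_linear(x, w, chunk_size):
    """Compute x @ w by processing `chunk_size` rows (sequence positions) at a time."""
    out = []
    for start in range(0, len(x), chunk_size):
        # In an offloading setup, only this slice would need to be on the GPU.
        chunk = x[start:start + chunk_size]
        # Rows are independent, so per-chunk results simply concatenate.
        out.extend(matmul(chunk, w))
    return out
```

The working set per step depends on `chunk_size` and the weight shape, not on the total number of rows, which is why peak memory can stay flat as the sequence grows. The cost is one extra kernel launch and transfer per chunk.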

Co-Authored-By: Claude <noreply@anthropic.com>

2026-01-18 10:41:40 +08:00

26 KiB
