nano-vllm/nanovllm/kvcache/hybrid_manager.py at 7b5d3b34eb7af013eeb9ed0ce4401630122b9f23

Files

Zijie Tian 78050aef9f 🐛 fix: resolve CPU KV cache state leakage between requests

Root Cause:
- OffloadEngine.reset() cleared GPU buffers but NOT CPU cache
- Previous request's KV cache data persisted in CPU memory, contaminating subsequent requests

Fixes:
- Add k_cache_cpu.zero_() and v_cache_cpu.zero_() to OffloadEngine.reset()
- Add clear_decode_tracking(seq) call in HybridKVCacheManager.deallocate()

Results:
- niah_single_1 accuracy improved from ~80% to 94% (+14%)
- Remaining ~6% errors are model limitations, not state leakage

Also:
- Update docs/ruler_32k_chunked_offload_issue.md with fix details
- Remove debug planning files (findings.md, progress.md, task_plan.md)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-21 01:12:21 +08:00

20 KiB

Raw Blame History

View Raw

20 KiB Raw Blame History

20 KiB

Raw Blame History