Commit Graph

3 Commits

Author SHA1 Message Date
Zijie Tian
78050aef9f 🐛 fix: resolve CPU KV cache state leakage between requests
Root Cause:
- OffloadEngine.reset() cleared GPU buffers but NOT CPU cache
- Previous request's KV cache data persisted in CPU memory, contaminating subsequent requests

Fixes:
- Add k_cache_cpu.zero_() and v_cache_cpu.zero_() to OffloadEngine.reset()
- Add clear_decode_tracking(seq) call in HybridKVCacheManager.deallocate()

Results:
- niah_single_1 accuracy improved from ~80% to 94% (+14%)
- Remaining ~6% errors are model limitations, not state leakage

Also:
- Update docs/ruler_32k_chunked_offload_issue.md with fix details
- Remove debug planning files (findings.md, progress.md, task_plan.md)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 01:12:21 +08:00
Zijie Tian
1ab4676396 ♻️ refactor: consolidate RULER test files and document root cause
- test_ruler.py: add --fresh-llm, --sample-indices, --json-output options
- test_ruler.py: consolidate test_ruler_single_sample.py, test_ruler_sequential.py, test_ruler_samples.py
- docs: update chunked offload issue with root cause (state leakage confirmed)
- docs: add single-sample test results showing 100% accuracy for niah_single_1

Deleted redundant test files:
- tests/test_ruler_single_sample.py
- tests/test_ruler_sequential.py
- tests/test_ruler_samples.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 23:41:17 +08:00
Zijie Tian
16fbcf9e4c docs: add RULER 32K chunked offload issue documentation
- Document accuracy degradation issue in 32K context with chunked offload
- Add detailed hypothesis analysis and debugging approach
- Include 4-slot ring buffer experiment results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 02:16:21 +08:00