nano-vllm/docs
Zijie Tian 49519c7ce7 📝 docs: update offload accuracy issue with independent testing results
Document key finding: single-request inference works correctly (100% accuracy).
The drop to 66% accuracy in batch mode is caused by state accumulating across
sequential requests run in the same process.

- Add comparison table: independent (100%) vs batch (66%) testing modes
- Document root cause analysis: state is not cleaned up between requests
- Add workaround: run tests independently using test_ruler_niah.sh
- Update next steps to focus on OffloadEngine reset/cleanup logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 21:08:35 +08:00