nano-vllm/bench.py at f81b5ae8a9e7a2e5313b7e0ecd08ebc05fb22e44

Zijie Tian e874229adc 📝 docs: add comprehensive GPU-only vs Offload benchmark results

- Add --block-size argument to bench.py for configurable KV cache block size
- Update bench_offload_results.md with complete benchmark analysis:
  - GPU-only: XAttention is 15% to 41% faster
  - CPU Offload: XAttention is 14% to 59% slower
  - Block size 4096 recommended for best performance
  - Document why XAttention hurts Offload mode (transfer bottleneck)
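
For illustration, a `--block-size` option like the one described above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the actual code in bench.py: the helper names `build_parser` and `parse_block_size`, the default of 4096 (taken from the recommendation above, not from a confirmed default), and the positivity check are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real bench.py may define other arguments.
    parser = argparse.ArgumentParser(description="Benchmark sketch (hypothetical)")
    parser.add_argument(
        "--block-size",
        type=int,
        default=4096,  # assumed default, chosen to match the recommended value
        help="KV cache block size",
    )
    return parser


def parse_block_size(argv=None) -> int:
    # Parse the argument list and reject non-positive block sizes.
    args = build_parser().parse_args(argv)
    if args.block_size <= 0:
        raise SystemExit("--block-size must be a positive integer")
    return args.block_size
```

Under these assumptions, `parse_block_size(["--block-size", "1024"])` yields the requested size, and omitting the flag falls back to the assumed default.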

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-01-27 22:32:07 +08:00

4.9 KiB
