nano-vllm/bench_offload.py at c717072f31ad86a6b0ed727287a5f0409c7b1adf

Zijie Tian 73c9dc46ff ✨ feat: add XAttention BSA support to bench_offload.py

- Add --model parameter (default: Llama-3.1-8B-Instruct)
- Add --enable-xattn flag for XAttention BSA sparse prefill
- Add --xattn-threshold and --xattn-stride parameters
- Change default num-gpu-blocks from 6 to 4
- Add benchmark results doc with Full vs XAttn comparison (32K/128K)
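The flags listed above could be wired up roughly as follows. This is a minimal sketch, not the actual parser in bench_offload.py: the function name `build_parser`, the argument types, and the `--num-gpu-blocks` spelling are assumptions, and the threshold and stride defaults are left as `None` because the commit message does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative sketch of the CLI described in the commit message.

    Hypothetical helper: the real bench_offload.py may structure this differently.
    """
    parser = argparse.ArgumentParser(description="Offload benchmark (sketch)")
    # Default model name taken from the commit message; the real default
    # may be a full filesystem path or hub identifier.
    parser.add_argument("--model", type=str, default="Llama-3.1-8B-Instruct")
    # The commit message says this default changed from 6 to 4.
    # The exact flag spelling is an assumption.
    parser.add_argument("--num-gpu-blocks", type=int, default=4)
    # Opt-in switch for XAttention BSA sparse prefill.
    parser.add_argument("--enable-xattn", action="store_true")
    # Defaults are not given in the commit message, so none are invented here.
    parser.add_argument("--xattn-threshold", type=float, default=None)
    parser.add_argument("--xattn-stride", type=int, default=None)
    return parser
```

Calling `build_parser().parse_args(["--enable-xattn", "--xattn-threshold", "0.9"])` would turn the sparse prefill path on with an explicit threshold; the value 0.9 here is only an example input, not a recommended setting.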

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-01-27 04:20:16 +08:00

5.7 KiB
