- Add --model parameter (default: Llama-3.1-8B-Instruct)
- Add --enable-xattn flag for XAttention BSA sparse prefill
- Add --xattn-threshold and --xattn-stride parameters
- Change default num-gpu-blocks from 6 to 4
- Add benchmark results doc with Full vs XAttn comparison (32K/128K)
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>