Files
nano-vllm/docs
Zijie Tian 6da116de98 📝 docs: add GPU-Only XAttention guide with performance analysis
Add comprehensive documentation for GPU-only XAttention BSA mode:
- Architecture design and SparsePolicy interface
- Memory pre-allocation mechanism (alloc_policy_metadata)
- Performance analysis: 32K +15%, 64K +41% vs baseline
- CUDA Graph limitations explanation (variable seq_len in prefill)
- nsys profiling tools usage guide

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-01-27 07:21:46 +08:00
..