Files
nano-vllm/docs
Zijie Tian f3e4611e3b 📝 docs: add XAttention performance analysis documentation
Add comprehensive performance analysis for XAttention:
- NVTX marker locations and usage
- Block size impact on offload mode (4096 vs 1024)
- Detailed timing breakdown for estimate vs compute phases
- softmax_fuse_block_sum_kernel analysis
- Optimization recommendations

Key findings:
- block_size=4096 is 2x faster than 1024 for 64K context
- find_blocks_chunked is bottleneck (40%) at block_size=4096
- estimate_gemm becomes bottleneck (24%) at block_size=1024

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-01-28 00:57:20 +08:00
..