Files
nano-vllm/nanovllm
Zijie Tian 3aef6fc3a2 feat: add XAttention Triton operators for sparse attention estimation
Port XAttention operators from COMPASS project:
- flat_group_gemm_fuse_reshape: stride reshape GEMM kernel
- softmax_fuse_block_sum: fused softmax with block-level summation
- xattn_estimate: main estimation function for block sparse attention
- find_blocks_chunked: cumulative threshold-based block selection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 04:27:07 +08:00
..
2025-06-15 10:36:45 +08:00
2025-06-15 01:31:24 +08:00
2025-08-31 22:55:34 +08:00