Zijie Tian
2c2383c786
⚡️ perf: optimize XAttention estimate with hierarchical block sum
Replace slow softmax_fuse_block_sum (block_size=4096) with optimized
hierarchical approach (estimate_block_size=1024):
- Add estimate_block_size parameter to XAttentionBSAPolicy (default 1024)
- Rewrite select_blocks to use hierarchical aggregation:
  1. Fine-grained softmax with small block size (15x faster kernel)
  2. Aggregate to CPU block level via reshape + sum
  3. Score + threshold selection (replaces mask + voting)
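
The three steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code from this commit: the function names `aggregate_to_cpu_blocks` and `select_cpu_blocks`, the `[heads, q_blocks, k_fine_blocks]` layout, and the cumulative-mass threshold rule are all assumptions made for the example. It assumes step 1 has already produced per-block softmax sums at the fine granularity (estimate_block_size=1024), so each CPU block (4096) covers `ratio = 4` fine blocks.

```python
import numpy as np


def aggregate_to_cpu_blocks(fine_sums, ratio):
    """Step 2: sum groups of `ratio` adjacent fine blocks into one CPU block.

    fine_sums: array of shape [heads, q_blocks, k_fine_blocks] (assumed layout).
    Returns an array of shape [heads, q_blocks, k_fine_blocks // ratio].
    """
    heads, q_blocks, k_fine = fine_sums.shape
    assert k_fine % ratio == 0, "fine blocks must tile the CPU blocks exactly"
    # reshape splits the key axis into (cpu_block, fine_block_within_cpu_block)
    return fine_sums.reshape(heads, q_blocks, k_fine // ratio, ratio).sum(axis=-1)


def select_cpu_blocks(fine_sums, ratio, threshold):
    """Step 3 (assumed rule): keep the highest-scoring CPU blocks until their
    share of the total score reaches `threshold`."""
    coarse = aggregate_to_cpu_blocks(fine_sums, ratio)
    scores = coarse.sum(axis=(0, 1))          # one score per CPU block
    scores = scores / scores.sum()            # normalize to a distribution
    order = np.argsort(-scores, kind="stable")  # descending by score
    cumulative = np.cumsum(scores[order])
    # first position where the cumulative share reaches the threshold
    keep = min(int(np.searchsorted(cumulative, threshold)) + 1, len(order))
    return sorted(order[:keep].tolist())


if __name__ == "__main__":
    # 1 head, 1 query block, 8 fine key blocks -> 2 CPU blocks with ratio 4
    fine = np.array([[[1, 1, 1, 1, 2, 2, 1, 1]]], dtype=np.float64)
    print(select_cpu_blocks(fine, ratio=4, threshold=0.5))
```

The reshape works because the fine blocks belonging to one CPU block are contiguous along the key axis, so no gather or mask is needed to aggregate them.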
Performance improvement (CPU Offload mode):
- softmax_fuse_block_sum: 48% → 1% of total time (44x faster)
- 128K: XAttention is now 2.4% faster than Full (was 59% slower)
- 64K: 3.8% slower than Full (was 21% slower)
- 32K: 6.0% slower than Full (was 14% slower)
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-01-28 06:47:13 +08:00