Zijie Tian
2771312565
[docs] Add sparse prefill integration plan from int-minference analysis
Consolidated analysis from int-minference-1/2/3 branches into a unified
integration plan for MInference, XAttention, and FlexPrefill strategies.
Key design decisions:
- Backward compatible: Keep existing SparsePolicy interface
- Unified BlockMask intermediate representation for new strategies
- XAttention/FlexPrefill use block_sparse_attn_func kernel
- MInference can optionally use block_sparse_attn (Phase 4)
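As a rough illustration of the "unified BlockMask intermediate representation" decision, the sketch below shows one shape such a structure could take. Everything in it is an assumption for illustration: the `BlockMask` fields, `causal_block_mask`, and `keep_top_k_blocks` are hypothetical names, not the repository's actual API, and the top-k selection is a generic stand-in rather than the selection rule of XAttention or FlexPrefill. The idea is only that each strategy produces the same block-level mask, which a single block-sparse kernel wrapper could then consume.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockMask:
    """Hypothetical block-level mask.

    keep[i][j] is True when query block i attends to key block j.
    """
    block_size: int
    keep: List[List[bool]]

    def density(self) -> float:
        # Fraction of blocks that are kept (1.0 means fully dense).
        total = sum(len(row) for row in self.keep)
        kept = sum(sum(row) for row in self.keep)
        return kept / total if total else 0.0


def causal_block_mask(seq_len: int, block_size: int) -> BlockMask:
    # Dense causal baseline: each query block sees key blocks at or before it.
    n = (seq_len + block_size - 1) // block_size
    return BlockMask(block_size, [[j <= i for j in range(n)] for i in range(n)])


def keep_top_k_blocks(scores: List[List[float]], block_size: int, k: int) -> BlockMask:
    # Illustrative selection only: for each query block, keep the k
    # highest-scoring causal key blocks, and always keep the diagonal block.
    n = len(scores)
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        causal = sorted(range(i + 1), key=lambda j: scores[i][j], reverse=True)
        for j in causal[:k]:
            keep[i][j] = True
        keep[i][i] = True
    return BlockMask(block_size, keep)
```

Under this assumption, a new strategy only has to emit a `BlockMask`, so the existing SparsePolicy interface can stay as it is while the kernel wrapper is shared.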
Five-phase implementation plan:
1. BlockMask + block_sparse_attn wrapper
2. XAttention implementation
3. FlexPrefill implementation
4. Optional MInference refactoring
5. Integration and testing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 23:33:09 +08:00