Commit Graph

1 Commits

Author SHA1 Message Date
Zijie Tian
2771312565 [docs] Add sparse prefill integration plan from int-minference analysis
Consolidated analysis from int-minference-1/2/3 branches into a unified
integration plan for MInference, XAttention, and FlexPrefill strategies.

Key design decisions:
- Backward compatible: Keep existing SparsePolicy interface
- Unified BlockMask intermediate representation for new strategies
- XAttention/FlexPrefill use block_sparse_attn_func kernel
- MInference can optionally use block_sparse_attn (Phase 4)

Five-phase implementation plan:
1. BlockMask + block_sparse_attn wrapper
2. XAttention implementation
3. FlexPrefill implementation
4. Optional MInference refactoring
5. Integration and testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 23:33:09 +08:00