Zijie Tian
|
2826a649de
|
docs: add XAttention integration guide

Comprehensive documentation for XAttention sparse policy integration:
- Algorithm principles (chunked estimation + block sparse attention)
- COMPASS source code analysis
- Design decisions for CPU offload mode
- Implementation details (utils.py, kernels.py, xattn.py)
- Problem-solving (OOM, GQA, abstract method)
- Test validation results (RULER 32k benchmark)

Co-Authored-By: Claude <noreply@anthropic.com>
|
2026-01-14 10:16:21 +08:00 |
|