Add docs for the MIT-Han-Lab Block-Sparse-Attention library

Document the Block-Sparse-Attention library (3rdparty submodule,
branch: tzj/minference). The new document covers:
- Four sparse attention modes (dense, token/block streaming, block sparse)
- Hybrid mask support (different patterns per head)
- Complete API reference for all three functions
- Performance benchmarks (up to 3-4x speedup on A100)
- Integration considerations for nano-vllm
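To make the block-granularity masking idea behind the streaming and block-sparse modes concrete, here is a small illustrative sketch. All names are hypothetical and this is not the library's API: it builds a block-level "streaming" mask (attention sinks plus a sliding local window) and applies it in a dense numpy reference attention.

```python
# Illustrative sketch only -- hypothetical names, not the
# Block-Sparse-Attention API. Shows block-granularity masking:
# a streaming pattern (sink blocks + local window) expanded to
# token level and applied in a dense reference attention.
import numpy as np

def streaming_block_mask(n_blocks: int, sink_blocks: int,
                         local_blocks: int) -> np.ndarray:
    """Each query block attends to the first `sink_blocks` blocks
    plus the most recent `local_blocks` blocks (itself included)."""
    mask = np.zeros((n_blocks, n_blocks), dtype=bool)
    for q in range(n_blocks):
        mask[q, :min(sink_blocks, q + 1)] = True             # attention sinks
        mask[q, max(0, q - local_blocks + 1):q + 1] = True   # local window
    return mask

def block_sparse_attention(q, k, v, block_mask, block_size):
    """Dense reference: expand the block mask to token granularity,
    then run masked softmax attention."""
    tok_mask = np.kron(block_mask,
                       np.ones((block_size, block_size), dtype=bool)).astype(bool)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(tok_mask, scores, -np.inf)   # masked-out pairs -> 0 weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

A hybrid-mask setup would simply use a different `block_mask` per head; the kernel-level version computes only the unmasked blocks instead of masking a dense score matrix, which is where the speedup comes from.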

Co-Authored-By: Claude <noreply@anthropic.com>