[docs] Added Sparse Attn.

commit bf4c63c7ec
parent 600af0f59c
Author: Zijie Tian
Date: 2025-12-29 19:56:54 +08:00

2 changed files with 446 additions and 0 deletions


@@ -6,6 +6,10 @@ This file provides guidance to Claude Code when working with this repository.
 Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline LLM inference. Supports Qwen3 models with CPU offload for long-context inference.
 
+## Sparse Attention
+
+For sparse attention related content (block sparse attention, MInference, FlexPrefill, XAttention, AvgPool, etc.), refer to [`docs/sparse_attention_guide.md`](docs/sparse_attention_guide.md).
+
 ## Architecture
 
 ### Core Components
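
As background for the techniques the new section lists, here is a minimal, self-contained sketch of block sparse attention with AvgPool-style block scoring. This is an illustration only, not nano-vllm's implementation or any of the cited methods verbatim; the function name, `block_size`, and `keep_ratio` are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.25):
    """Toy block sparse attention (illustrative sketch only).

    Scores key blocks against query blocks using average-pooled block
    vectors (AvgPool-style), keeps the top-scoring fraction of key blocks
    per query block, and runs dense attention only over the kept blocks.
    """
    seq_len, dim = q.shape
    scale = dim ** -0.5
    n_blocks = (seq_len + block_size - 1) // block_size
    pad = n_blocks * block_size - seq_len

    # Average-pool each block of queries/keys to one vector.
    # (Zero padding slightly dilutes the last block's mean; fine for a sketch.)
    q_pool = F.pad(q, (0, 0, 0, pad)).view(n_blocks, block_size, dim).mean(1)
    k_pool = F.pad(k, (0, 0, 0, pad)).view(n_blocks, block_size, dim).mean(1)

    # Coarse block-to-block scores; keep the top key blocks per query block.
    block_scores = (q_pool @ k_pool.T) * scale            # (n_blocks, n_blocks)
    k_keep = max(1, int(n_blocks * keep_ratio))
    kept = block_scores.topk(k_keep, dim=-1).indices      # (n_blocks, k_keep)

    out = torch.zeros_like(q)
    for qb in range(n_blocks):
        rows = slice(qb * block_size, min((qb + 1) * block_size, seq_len))
        # Gather the key/value positions belonging to the kept blocks.
        cols = torch.cat([
            torch.arange(kb * block_size, min((kb + 1) * block_size, seq_len))
            for kb in kept[qb].tolist()
        ])
        attn = torch.softmax((q[rows] @ k[cols].T) * scale, dim=-1)
        out[rows] = attn @ v[cols]
    return out

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(500, 64) for _ in range(3))
    print(block_sparse_attention(q, k, v).shape)  # torch.Size([500, 64])
```

The coarse pooled-score pass is what makes this family of methods cheap: block selection costs O((n/B)^2) instead of O(n^2), and the exact attention is then computed only inside the selected blocks. The methods named above differ mainly in how they score and select blocks; see `docs/sparse_attention_guide.md` for the details.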