[docs] Added Sparse Attn.

This file provides guidance to Claude Code when working with this repository.
Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline LLM inference. It supports Qwen3 models with CPU offload for long-context inference.
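
Typical use mirrors vLLM's offline entry point. Below is a minimal sketch, assuming the upstream nano-vllm API (`LLM`, `SamplingParams`, `generate`); the model path and sampling values are placeholders:

```python
from nanovllm import LLM, SamplingParams

# Placeholder path; point this at a local Qwen3 checkpoint.
llm = LLM("/path/to/Qwen3-0.6B", enforce_eager=True, tensor_parallel_size=1)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Hello, Nano-vLLM."], sampling_params)
print(outputs[0]["text"])  # generated completion text
```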
## Sparse Attention
For sparse-attention-related content (block sparse attention, MInference, FlexPrefill, XAttention, AvgPool, etc.), refer to [`docs/sparse_attention_guide.md`](docs/sparse_attention_guide.md).
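
As a rough illustration of the idea those methods share (this is not the guide's API), block sparse attention scores pooled query/key blocks and keeps only the top-scoring key blocks per query block. A minimal sketch, with AvgPool as the block summary and illustrative names and sizes throughout:

```python
import torch

def block_scores(q, k, block_size=64):
    # Summarize each block of queries/keys by its mean (AvgPool-style),
    # then score every (query block, key block) pair at block granularity.
    # q, k: [seq_len, head_dim]; seq_len must be divisible by block_size.
    pooled_q = q.reshape(-1, block_size, q.shape[-1]).mean(dim=1)
    pooled_k = k.reshape(-1, block_size, k.shape[-1]).mean(dim=1)
    return pooled_q @ pooled_k.T  # [num_q_blocks, num_k_blocks]

def topk_block_mask(scores, keep=4):
    # Keep the `keep` highest-scoring key blocks per query block;
    # dense attention then runs only inside the selected blocks.
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, scores.topk(keep, dim=-1).indices, True)
    return mask

q, k = torch.randn(1024, 128), torch.randn(1024, 128)
mask = topk_block_mask(block_scores(q, k))  # 16x16 boolean block mask
```

A real kernel would additionally force the causal/diagonal blocks on; the selection rule (pooled dot products, antidiagonal scoring, etc.) is what varies across the methods listed above.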
## Architecture
### Core Components