📚 docs: add 1M+ context length models reference list

- Add comprehensive list of 1M+ context models from Hugging Face
- Categorize by type: text-only LLM vs vision-language models
- Separate ≤10B (practical) from >10B (resource-intensive) models
- Include Qwen, GLM, InternLM, Llama, MiniMax, Gradient AI series
- Add VRAM requirements and technical comparison table
- Update CLAUDE.md documentation index

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 09:04:55 +08:00
parent 2c2383c786
commit 4484ebbb77
2 changed files with 185 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -34,6 +34,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
 | [`docs/observer_architecture.md`](docs/observer_architecture.md) | 📊 Observer architecture: design of InferenceObserver (TTFT/TPOT) and MemoryObserver (H2D/D2H/D2D) |
 | [`docs/memory_communication_benchmark.md`](docs/memory_communication_benchmark.md) | 📊 Communication volume test: Full vs XAttention communication volume comparison (32K/64K), per-phase statistics |
 | [`docs/estimate_block_size_performance.md`](docs/estimate_block_size_performance.md) | 🔥 PERF: block_size performance analysis for the estimate phase; softmax_fuse_block_sum is optimal at 512-1024, the current 4096 is 15x slower |
+| [`docs/long_context_models_1m.md`](docs/long_context_models_1m.md) | 📚 REF: list of models with 1M+ context length (Qwen/GLM/InternLM/Llama/VL), recommended ≤10B models |

 ## Rules Index