📝 docs: add XAttention KV chunking density test results

Document the verification test for XAttention Triton kernel KV chunking:
- 32K and 64K test results with threshold 0.9/0.95/1.0
- Key finding: threshold=1.0 achieves alignment (~0% diff)
- threshold<1.0 shows 10-13% difference due to per-chunk threshold application
- Conclusion: softmax normalization is correct, issue is threshold accumulation

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
This commit is contained in:
Zijie Tian
2026-02-01 17:36:19 +08:00
parent 8ab53e7331
commit 8035e4db3d
2 changed files with 123 additions and 0 deletions

View File

@@ -38,6 +38,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
| [`docs/long_context_models_1m.md`](docs/long_context_models_1m.md) | 📚 REF: 1M+ 上下文长度模型列表 (Qwen/GLM/InternLM/Llama/VL)≤10B 推荐模型 |
| [`docs/new_model_integration_guide.md`](docs/new_model_integration_guide.md) | 🔧 GUIDE: 新模型整合指南 - 配置映射、RoPE变体、EOS处理、权重转换、验证清单 |
| [`docs/xattn_density_alignment_analysis.md`](docs/xattn_density_alignment_analysis.md) | 📊 ANALYSIS: GPU-only vs Offload 模式 density 对齐分析chunked softmax 边界效应5-7% 差异根因 |
| [`docs/xattn_kv_chunking_density_test.md`](docs/xattn_kv_chunking_density_test.md) | 🧪 TEST: XAttention KV chunking density 验证threshold=1.0 对齐threshold<1.0 差异 10-13% |
## Rules Index