📝 docs: add GPU-only density alignment test results

Document test results verifying XAttention density calculation in GPU-only mode matches independent xattn_estimate calls.

Test results (Llama-3.1-8B-Instruct, threshold=0.9):
- 4k: Layer 0 density 63.8%, verified ✅
- 8k: Layer 0 density 65.0%, verified ✅
- 16k: Layer 0 density 61.6%, verified ✅
- 32k: Layer 0 density 50.2%, verified ✅
- 64k: Layer 0 density 37.0%, verified ✅

All tests show exact match (attn_sums diff=0, mask exact match).

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
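
For readers unfamiliar with the check reported above, here is a minimal sketch of what "exact match" could mean for a block mask and its score sums. The helper names (`block_density`, `masks_match`, `max_abs_diff`) are hypothetical and not taken from the Nano-vLLM codebase, and the assumption that density is measured over the causal lower triangle of the block mask is mine; the actual test may define it differently.

```python
# Hypothetical helpers for illustration only; not the project's real API.

def block_density(mask):
    """Fraction of selected blocks among causally valid ones (k_block <= q_block).

    Assumption: density is counted over the causal lower triangle only.
    """
    selected = 0
    total = 0
    for q, row in enumerate(mask):
        for k, keep in enumerate(row):
            if k <= q:
                total += 1
                selected += int(keep)
    return selected / total if total else 0.0


def masks_match(mask_a, mask_b):
    """True only if both block masks agree at every position."""
    return mask_a == mask_b


def max_abs_diff(sums_a, sums_b):
    """Largest element-wise absolute difference between two 2-D score tables."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(sums_a, sums_b)
        for a, b in zip(row_a, row_b)
    )


# Toy 3x3 block mask standing in for one layer's output.
mask_gpu_only = [
    [True, False, False],
    [True, True, False],
    [False, True, True],
]
mask_reference = [row[:] for row in mask_gpu_only]  # independent copy

sums_gpu_only = [[0.5, 0.25], [0.125, 1.0]]
sums_reference = [row[:] for row in sums_gpu_only]

density = block_density(mask_gpu_only)
exact = masks_match(mask_gpu_only, mask_reference)
diff = max_abs_diff(sums_gpu_only, sums_reference)
```

In this sketch a run would count as verified only when `exact` is true and `diff` is zero, mirroring the "attn_sums diff=0, mask exact match" criterion in the message.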
2026-02-02 11:22:34 +08:00
parent aeed6ccdfb
commit 232fcf043e
2 changed files with 103 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -40,6 +40,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
 | [`docs/new_model_integration_guide.md`](docs/new_model_integration_guide.md) | 🔧 GUIDE: New model integration guide - config mapping, RoPE variants, EOS handling, weight conversion, validation checklist |
 | [`docs/xattn_density_alignment_analysis.md`](docs/xattn_density_alignment_analysis.md) | 📊 ANALYSIS: Density alignment analysis of GPU-only vs Offload mode, chunked softmax boundary effects, root cause of the 5-7% difference |
 | [`docs/xattn_kv_chunking_density_test.md`](docs/xattn_kv_chunking_density_test.md) | 🧪 TEST: XAttention KV chunking density verification; aligned at threshold=1.0, 10-13% difference at threshold<1.0 |
+| [`docs/gpuonly_density_alignment_test.md`](docs/gpuonly_density_alignment_test.md) | ✅ TEST: GPU-only density alignment verification (4K-64K); xattn_bsa and xattn_estimate match exactly |

 ## Rules Index