📝 docs: add GPU-only density alignment test results

Document test results verifying that the XAttention density calculation
in GPU-only mode matches independent xattn_estimate calls.

Test results (Llama-3.1-8B-Instruct, threshold=0.9):
- 4k:  Layer 0 density 63.8%, verified
- 8k:  Layer 0 density 65.0%, verified
- 16k: Layer 0 density 61.6%, verified
- 32k: Layer 0 density 50.2%, verified
- 64k: Layer 0 density 37.0%, verified

All tests show an exact match (attn_sums diff = 0, masks identical).
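
As a rough illustration of what an exact-match check of this kind could look like, here is a minimal pure-Python sketch. It is not the project's test code: the helper names `block_density`, `max_abs_diff`, and `exact_match` are hypothetical, and it assumes density means the fraction of selected entries in a 2D boolean block mask, which may differ from how the repository counts blocks (for example, it may only count causal blocks).

```python
from typing import Sequence


def block_density(mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of entries set to True in a 2D boolean block mask.

    Assumption: every entry counts toward the denominator.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    selected = sum(1 for row in mask for keep in row if keep)
    return selected / total


def max_abs_diff(a: Sequence[Sequence[float]],
                 b: Sequence[Sequence[float]]) -> float:
    """Largest element-wise absolute difference between two 2D arrays."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return max(
        (abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)),
        default=0.0,
    )


def exact_match(sums_a, sums_b, mask_a, mask_b) -> bool:
    """True only if the sums differ by exactly 0 and the masks are identical."""
    same_sums = max_abs_diff(sums_a, sums_b) == 0
    same_mask = [list(r) for r in mask_a] == [list(r) for r in mask_b]
    return same_sums and same_mask


# Toy data (made up for illustration, not taken from the test runs).
toy_mask = [[True, False], [True, True]]
toy_sums = [[0.5, 0.25], [0.125, 1.0]]

toy_density = block_density(toy_mask)
toy_ok = exact_match(toy_sums, toy_sums, toy_mask, toy_mask)
```

The comparison is deliberately exact rather than tolerance-based, mirroring the "diff = 0" claim above; a tolerance would hide small numerical divergence between the two code paths.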

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Zijie Tian
2026-02-02 11:22:34 +08:00
parent aeed6ccdfb
commit 232fcf043e
2 changed files with 103 additions and 0 deletions

@@ -40,6 +40,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
| [`docs/new_model_integration_guide.md`](docs/new_model_integration_guide.md) | 🔧 GUIDE: New model integration guide - config mapping, RoPE variants, EOS handling, weight conversion, verification checklist |
| [`docs/xattn_density_alignment_analysis.md`](docs/xattn_density_alignment_analysis.md) | 📊 ANALYSIS: GPU-only vs. Offload mode density alignment analysis; chunked softmax boundary effects; root cause of the 5-7% difference |
| [`docs/xattn_kv_chunking_density_test.md`](docs/xattn_kv_chunking_density_test.md) | 🧪 TEST: XAttention KV chunking density verification; aligned at threshold=1.0, 10-13% difference at threshold<1.0 |
| [`docs/gpuonly_density_alignment_test.md`](docs/gpuonly_density_alignment_test.md) | ✅ TEST: GPU-only density alignment verification (4K-64K); xattn_bsa vs. xattn_estimate match exactly |
## Rules Index