nano-vllm/tests/test_xattn_estimate_alignment.py at 232fcf043e6d82ce805e40e072d82eafe9112720


Zijie Tian 6e34efd58a 📝 docs: add storage overhead analysis and batch tests for KV chunking

- Update xattn_kv_chunking_kernels.md with:
  - Detailed storage overhead analysis (O(S) vs O(S²))
  - Peak memory optimization (8x reduction)
  - Support for independent Q/KV chunk sizes
  - Batch verification results (3K-64K seqlen)
  - ASCII pipeline diagram

- Add test_xattn_kv_chunking_batch.py for batch validation
- Fix causal mask post-processing in alignment test
- Update CLAUDE.md documentation index

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-02-01 19:22:36 +08:00

9.0 KiB
