nano-vllm/tests at 6e34efd58a5009e32a27c7804006f372585be69d

zijie-tian/nano-vllm

Zijie Tian 6e34efd58a 📝 docs: add storage overhead analysis and batch tests for KV chunking

- Update xattn_kv_chunking_kernels.md with:
  - Detailed storage overhead analysis (O(S) vs O(S²))
  - Peak memory optimization (8x reduction)
  - Support for independent Q/KV chunk sizes
  - Batch verification results (3K-64K seqlen)
  - ASCII pipeline diagram

- Add test_xattn_kv_chunking_batch.py for batch validation
- Fix causal mask post-processing in alignment test
- Update CLAUDE.md documentation index

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-02-01 19:22:36 +08:00

__init__.py

[WIP] Need to refactor the nanovllm mechanism.

2025-12-22 23:52:56 +08:00

bench_estimate_block_size.py

📝 docs: add estimate block_size performance analysis

2026-01-28 06:24:28 +08:00

modeling_qwen3.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_chunk_attention_graph_reuse.py

📝 docs: add CUDA Graph optimization plan for offload mode decode

2026-01-22 02:12:24 +08:00

test_chunk_attention_graph.py

✨ feat: add chunk attention CUDA graph test for block sparse attention

2026-01-22 00:57:05 +08:00

test_cudagraph_memory.py

[test] Added test_cudagraph_memory.py.

2026-01-21 03:30:36 +08:00

test_hierarchical_estimate.py

✅ test: add hierarchical block sum estimation validation

2026-01-28 06:24:35 +08:00

test_needle_ref.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_needle.py

[WIP] Before integrate the xattn operator.

2026-01-19 21:19:21 +08:00

test_quest_policy.py

[WIP] move metadata to GPU.

2026-01-06 23:32:32 +08:00

test_ruler.py

✨ feat: add DensityObserver for XAttention sparse attention density tracking

2026-01-30 16:26:56 +08:00

test_sequential.py

[WIP] Before fix bench_offload.py.

2026-01-06 18:41:08 +08:00

test_xattn_bsa.py

[WIP] Before refactor the compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_chunked.py

[WIP] Before refactor the compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_estimate_alignment.py

📝 docs: add storage overhead analysis and batch tests for KV chunking

2026-02-01 19:22:36 +08:00

test_xattn_estimate_chunked.py

feat: add xattn_estimate_chunked for chunked prefill support

2026-01-22 01:13:17 +08:00

test_xattn_kernels.py

WIP: Enhance sparse attention with density tracking and block selection improvements

2026-01-31 14:48:23 +08:00

test_xattn_kv_chunking_batch.py

📝 docs: add storage overhead analysis and batch tests for KV chunking

2026-02-01 19:22:36 +08:00

utils.py

[WIP] Before fix bench_offload.py.

2026-01-06 18:41:08 +08:00