nano-vllm/tests at f049971f84064fa2da0429f3210d7e1ca9ed194c

zijie-tian/nano-vllm

Zijie Tian f049971f84 ✅ test: add hierarchical block sum estimation validation

Validate the hierarchical estimation approach for XAttention:
- Test 1: Math equivalence (diff = 0.0) between hierarchical and direct
- Test 2: Score + threshold selection strategy (replaces mask + voting)
- Test 3: Performance benchmark (41x speedup)

Uses pure torch + xattn kernels, independent of nanovllm framework.
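
The equivalence claimed in Test 1 can be illustrated with a minimal pure-Python sketch. It is not the code in test_hierarchical_estimate.py: the helper names (`block_sum`, `hierarchical_block_sum`, `select_blocks`), the block sizes, and the cumulative-share selection rule are all assumptions made for illustration. Integer scores are used so the comparison is exact; with floating-point scores the result can depend on summation order.

```python
def block_sum(scores, block):
    """Sum a square 2D list over non-overlapping block x block tiles."""
    n = len(scores)
    assert n % block == 0 and all(len(row) == n for row in scores)
    m = n // block
    out = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            out[i // block][j // block] += scores[i][j]
    return out


def hierarchical_block_sum(scores, small, ratio):
    """Sum over small tiles first, then merge ratio x ratio groups of them."""
    return block_sum(block_sum(scores, small), ratio)


def select_blocks(block_scores, threshold):
    """Hypothetical score + threshold rule (an assumption, not the repo's):
    per row, keep the highest-scoring blocks until their cumulative sum
    reaches `threshold` times the row total."""
    selected = []
    for row in block_scores:
        target = threshold * sum(row)
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        kept, running = [], 0
        for j in order:
            if running >= target:
                break
            kept.append(j)
            running += row[j]
        selected.append(sorted(kept))
    return selected


# Toy 8x8 integer score matrix (made-up values).
scores = [[(7 * i + 3 * j) % 5 for j in range(8)] for i in range(8)]

direct = block_sum(scores, 4)                  # one pass with 4x4 tiles
hier = hierarchical_block_sum(scores, 2, 2)    # 2x2 tiles, then merge 2x2 groups

max_abs_diff = max(
    abs(a - b) for row_d, row_h in zip(direct, hier) for a, b in zip(row_d, row_h)
)
picked = select_blocks(hier, 0.5)
```

Because addition is associative on integers, `hier` and `direct` match exactly, mirroring the "diff = 0.0" result above. The coarse scores can then be fed straight into a threshold rule such as `select_blocks` without building a separate mask.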

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-01-28 06:24:35 +08:00

__init__.py

[WIP] NEED refactor nanovllm mechanism.

2025-12-22 23:52:56 +08:00

bench_estimate_block_size.py

📝 docs: add estimate block_size performance analysis

2026-01-28 06:24:28 +08:00

modeling_qwen3.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_chunk_attention_graph_reuse.py

📝 docs: add CUDA Graph optimization plan for offload mode decode

2026-01-22 02:12:24 +08:00

test_chunk_attention_graph.py

✨ feat: add chunk attention CUDA graph test for block sparse attention

2026-01-22 00:57:05 +08:00

test_cudagraph_memory.py

[test] Added test_cudagraph_memory.py.

2026-01-21 03:30:36 +08:00

test_hierarchical_estimate.py

✅ test: add hierarchical block sum estimation validation

2026-01-28 06:24:35 +08:00

test_needle_ref.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_needle.py

[WIP] Before integrate the xattn operator.

2026-01-19 21:19:21 +08:00

test_quest_policy.py

[WIP] move metadata to GPU.

2026-01-06 23:32:32 +08:00

test_ruler.py

✨ feat: add configurable stride and chunk_size for XAttention BSA

2026-01-23 10:37:04 +08:00

test_sequential.py

[WIP] Before fix bench_offload.py.

2026-01-06 18:41:08 +08:00

test_xattn_bsa.py

[WIP] Before refactor the compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_chunked.py

[WIP] Before refactor the compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_estimate_chunked.py

feat: add xattn_estimate_chunked for chunked prefill support

2026-01-22 01:13:17 +08:00

test_xattn_kernels.py

[WIP] Before refactor the compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

utils.py

[WIP] Before fix bench_offload.py.

2026-01-06 18:41:08 +08:00