nano-vllm/tests at 193ef55d18dca009922fd7dadc18782181ef5b67

zijie-tian/nano-vllm

Zijie Tian 193ef55d18 ♻️ refactor: use Q-chunked processing in xattn alignment test

Match xattn_estimate internal logic by processing Q in chunks:
- Reduces peak memory for attn_scores tensor
- Enables testing 64K sequences without OOM
- All 5 test files pass (3.6K to 64K)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-02-01 18:08:15 +08:00

__init__.py

[WIP] Need to refactor nanovllm mechanism.

2025-12-22 23:52:56 +08:00

bench_estimate_block_size.py

📝 docs: add estimate block_size performance analysis

2026-01-28 06:24:28 +08:00

modeling_qwen3.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_chunk_attention_graph_reuse.py

📝 docs: add CUDA Graph optimization plan for offload mode decode

2026-01-22 02:12:24 +08:00

test_chunk_attention_graph.py

✨ feat: add chunk attention CUDA graph test for block sparse attention

2026-01-22 00:57:05 +08:00

test_cudagraph_memory.py

[test] Added test_cudagraph_memory.py.

2026-01-21 03:30:36 +08:00

test_hierarchical_estimate.py

✅ test: add hierarchical block sum estimation validation

2026-01-28 06:24:35 +08:00

test_needle_ref.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_needle.py

[WIP] Before integrating the xattn operator.

2026-01-19 21:19:21 +08:00

test_quest_policy.py

[WIP] move metadata to GPU.

2026-01-06 23:32:32 +08:00

test_ruler.py

✨ feat: add DensityObserver for XAttention sparse attention density tracking

2026-01-30 16:26:56 +08:00

test_sequential.py

[WIP] Before fixing bench_offload.py.

2026-01-06 18:41:08 +08:00

test_xattn_bsa.py

[WIP] Before refactoring compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_chunked.py

[WIP] Before refactoring compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_estimate_alignment.py

♻️ refactor: use Q-chunked processing in xattn alignment test

2026-02-01 18:08:15 +08:00

test_xattn_estimate_chunked.py

feat: add xattn_estimate_chunked for chunked prefill support

2026-01-22 01:13:17 +08:00

test_xattn_kernels.py

WIP: Enhance sparse attention with density tracking and block selection improvements

2026-01-31 14:48:23 +08:00

utils.py

[WIP] Before fixing bench_offload.py.

2026-01-06 18:41:08 +08:00