nano-vllm/tests at 51bd6783351ddb5ca8d58ea8d3c4950b6061840a

zijie-tian/nano-vllm

Zijie Tian 51bd678335 📊 feat: distinguish compute density and communication density in DensityObserver

- Add record_comm_density() call in select_blocks to track CPU block selection
- Add get_per_layer_comm_density() method for detailed analysis
- Update print_summary() to show both densities and H2D savings ratio
- Set DensityObserver mode (offload/gpu_only) in test_ruler.py
- Update get_summary() to return both density types

Key insight: Comm density can be 100% even when compute density is ~37%
because sparse BSA blocks are distributed across all CPU blocks.
Since CPU block granularity is 32x coarser (4096 vs 128 tokens),
any() aggregation across heads/Q-blocks results in all CPU blocks being needed.
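
The effect described above can be reproduced with a toy model. The following is a minimal sketch, not the DensityObserver implementation: the function names `densities` and `random_mask`, the mask layout `mask[head][q_block][k_block]`, and the uniformly random block selection are all assumptions made for illustration. Only the 128-token and 4096-token block sizes come from the commit message.

```python
import random

BSA_BLOCK_TOKENS = 128    # sparse-attention block size, from the commit message
CPU_BLOCK_TOKENS = 4096   # CPU offload block size, from the commit message
RATIO = CPU_BLOCK_TOKENS // BSA_BLOCK_TOKENS  # 32 BSA blocks per CPU block


def densities(mask, ratio=RATIO):
    """Return (compute_density, comm_density) for mask[head][q_block][k_block].

    Hypothetical helper: compute density is the fraction of selected BSA
    blocks; comm density is the fraction of CPU blocks that contain at least
    one selected BSA block for any head or Q-block.
    """
    rows = [row for head in mask for row in head]
    num_k = len(rows[0])
    compute = sum(sum(row) for row in rows) / (len(rows) * num_k)

    # any() aggregation: one selected BSA block marks its whole CPU block.
    num_cpu = (num_k + ratio - 1) // ratio
    needed = [False] * num_cpu
    for row in rows:
        for k, selected in enumerate(row):
            if selected:
                needed[k // ratio] = True
    comm = sum(needed) / num_cpu
    return compute, comm


def random_mask(heads, q_blocks, k_blocks, p, seed=0):
    """Assumed toy input: each BSA block is selected independently with probability p."""
    rng = random.Random(seed)
    return [[[rng.random() < p for _ in range(k_blocks)]
             for _ in range(q_blocks)] for _ in range(heads)]


compute, comm = densities(random_mask(heads=8, q_blocks=16, k_blocks=256, p=0.37))
print(f"compute density ~ {compute:.2f}, comm density = {comm:.2f}")
```

In this toy setting a CPU block is skipped only if none of its 32 BSA blocks is selected by any head or Q-block, which is why scattered sparsity yields little or no H2D saving even at low compute density.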

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-05 01:43:17 +08:00

__init__.py

[WIP] Need to refactor the nanovllm mechanism.

2025-12-22 23:52:56 +08:00

bench_estimate_block_size.py

📝 docs: add estimate block_size performance analysis

2026-01-28 06:24:28 +08:00

modeling_qwen3.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_chunk_attention_graph_reuse.py

📝 docs: add CUDA Graph optimization plan for offload mode decode

2026-01-22 02:12:24 +08:00

test_chunk_attention_graph.py

✨ feat: add chunk attention CUDA graph test for block sparse attention

2026-01-22 00:57:05 +08:00

test_cudagraph_memory.py

[test] Added test_cudagraph_memory.py.

2026-01-21 03:30:36 +08:00

test_gpuonly_density_alignment.py

✅ test: add GPU-only density alignment verification test

2026-02-02 11:14:46 +08:00

test_hierarchical_estimate.py

✅ test: add hierarchical block sum estimation validation

2026-01-28 06:24:35 +08:00

test_needle_ref.py

[refactor] Refactor needle test.

2026-01-03 19:19:37 +08:00

test_needle.py

[WIP] Before integrating the xattn operator.

2026-01-19 21:19:21 +08:00

test_quest_policy.py

[WIP] move metadata to GPU.

2026-01-06 23:32:32 +08:00

test_ruler.py

📊 feat: distinguish compute density and communication density in DensityObserver

2026-02-05 01:43:17 +08:00

test_sequential.py

[WIP] Before fixing bench_offload.py.

2026-01-06 18:41:08 +08:00

test_xattn_bsa.py

[WIP] Before refactoring compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_chunked.py

[WIP] Before refactoring compute_chunked_prefill.

2026-01-23 03:36:12 +08:00

test_xattn_estimate_alignment.py

📝 docs: update density alignment test with Offload mode results

2026-02-02 14:22:40 +08:00

test_xattn_estimate_chunked.py

feat: add xattn_estimate_chunked for chunked prefill support

2026-01-22 01:13:17 +08:00

test_xattn_kernels.py

WIP: Enhance sparse attention with density tracking and block selection improvements

2026-01-31 14:48:23 +08:00

test_xattn_kv_chunking_batch.py

📝 docs: add storage overhead analysis and batch tests for KV chunking

2026-02-01 19:22:36 +08:00

utils.py

[WIP] Before fixing bench_offload.py.

2026-01-06 18:41:08 +08:00