nano-vllm

zijie-tian/nano-vllm

Commit Graph

Author

SHA1

Message

Date

Zijie Tian

193ef55d18

♻️ refactor: use Q-chunked processing in xattn alignment test

Match xattn_estimate internal logic by processing Q in chunks:
- Reduces peak memory for attn_scores tensor
- Enables testing 64K sequences without OOM
- All 5 test files pass (3.6K to 64K)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-02-01 18:08:15 +08:00
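The Q-chunked processing described in commit 193ef55d18 can be illustrated with a minimal sketch. This is not the repository's test code: it uses plain Python lists instead of GPU tensors, it omits causal masking, and the helper names (`attn_scores`, `block_sums`, `full_block_sums`, `chunked_block_sums`) are hypothetical. It only shows why chunking along Q is safe: softmax is applied per query row, so scoring a few query rows at a time gives the same per-block sums while the score matrix held in memory shrinks from `len(q) x len(k)` to `q_chunk x len(k)`.

```python
import math


def attn_scores(q_rows, k_rows):
    """Dot-product scores, shape len(q_rows) x len(k_rows)."""
    return [[sum(a * b for a, b in zip(q, k)) for k in k_rows] for q in q_rows]


def block_sums(score_rows, block):
    """Softmax each row, then sum the probabilities inside each key block."""
    out = []
    for row in score_rows:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        n_blocks = (len(probs) + block - 1) // block
        out.append([sum(probs[j * block:(j + 1) * block]) for j in range(n_blocks)])
    return out


def full_block_sums(q_rows, k_rows, block):
    """Reference path: materializes the whole score matrix at once."""
    return block_sums(attn_scores(q_rows, k_rows), block)


def chunked_block_sums(q_rows, k_rows, block, q_chunk):
    """Chunked path: only q_chunk query rows are scored at any one time."""
    out = []
    for start in range(0, len(q_rows), q_chunk):
        chunk = q_rows[start:start + q_chunk]
        out.extend(block_sums(attn_scores(chunk, k_rows), block))
    return out
```

Both functions should return the same per-row block sums for any `q_chunk`; only the size of the intermediate score matrix differs.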

Zijie Tian

f173a3f7f5

✅ test: add xattn_estimate vs low-level kernels alignment test

Test that xattn_estimate produces the same results as manually calling:
- flat_group_gemm_fuse_reshape
- softmax_fuse_block_sum
- find_blocks_chunked

Uses real KV cache data from results/kvcache/ directory.
Verifies density calculation matches between high-level API and kernels.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-02-01 17:49:37 +08:00
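The density check mentioned in commit f173a3f7f5 can be sketched as follows. The commit message does not define density, so this sketch assumes it is the fraction of selected entries in a boolean block mask; the names `block_density` and `densities_match` are hypothetical, and the real signatures of `xattn_estimate` and the low-level kernels are not shown in the log, so they are not called here.

```python
def block_density(mask):
    """Fraction of True entries in a 2-D boolean block mask (assumed definition)."""
    total = sum(len(row) for row in mask)
    selected = sum(1 for row in mask for keep in row if keep)
    return selected / total if total else 0.0


def densities_match(mask_a, mask_b, tol=1e-6):
    """Compare the densities of two block masks within a tolerance."""
    return abs(block_density(mask_a) - block_density(mask_b)) <= tol
```

In an alignment test of this kind, one mask would come from the high-level API and the other from the manually chained kernels, with the two densities compared as above.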

2 Commits