nano-vllm

Files

Zijie Tian 8ab53e7331 🚧 WIP: add DEBUG code for XAttention KV chunking density verification

Add instrumentation to compare GPU-only vs Offload mode density:
- Layer 0 DEBUG output for both modes
- Accumulate selected/total counts across chunks
- Proper causal mask with Q offset handling
- Skip normal offload logic for isolated testing

Test results (threshold=1.0 achieves alignment):
- 32K: GPU-only 0.9999, Offload 0.9999 (diff ~0%)
- 64K: GPU-only 0.9995, Offload 0.9995 (diff ~0%)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-02-01 17:33:23 +08:00
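The density check described in the commit message above (accumulating selected/total counts across KV chunks under a causal mask that accounts for the Q offset) can be sketched as follows. This is a minimal illustration, not the code in `xattn_bsa.py`: the function names, the block-level boolean mask layout, and the use of plain Python lists instead of tensors are all assumptions made to keep the example self-contained.

```python
from typing import List, Tuple

Mask = List[List[bool]]  # hypothetical layout: mask[q_block][k_block] = block selected


def causal_counts(selected: Mask, q_offset: int, k_offset: int) -> Tuple[int, int]:
    """Return (selected, total) counts over causally valid block pairs.

    selected[i][j] refers to query block (q_offset + i) and key block
    (k_offset + j). A pair is causally valid when the key block does not
    come after the query block.
    """
    n_sel = 0
    n_total = 0
    for i, row in enumerate(selected):
        for j, flag in enumerate(row):
            if k_offset + j <= q_offset + i:
                n_total += 1
                n_sel += int(flag)
    return n_sel, n_total


def density_full(selected: Mask, q_offset: int = 0) -> float:
    """Density computed in one pass over all key blocks (GPU-only style)."""
    n_sel, n_total = causal_counts(selected, q_offset, 0)
    return n_sel / n_total if n_total else 0.0


def density_chunked(selected: Mask, kv_chunk_blocks: int, q_offset: int = 0) -> float:
    """Density computed one KV chunk at a time (offload style).

    Counts are accumulated across chunks and divided once at the end, so the
    result does not depend on how the key blocks are split into chunks.
    """
    n_sel = 0
    n_total = 0
    num_k = len(selected[0]) if selected else 0
    for start in range(0, num_k, kv_chunk_blocks):
        chunk = [row[start:start + kv_chunk_blocks] for row in selected]
        s, t = causal_counts(chunk, q_offset, start)
        n_sel += s
        n_total += t
    return n_sel / n_total if n_total else 0.0
```

Dividing only after all chunks have been accumulated, rather than averaging per-chunk densities, is what lets the two paths agree, since chunks near the causal diagonal contain fewer valid pairs than chunks far below it.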

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | ✨ feat: add configurable stride and chunk_size for XAttention BSA | 2026-01-23 10:37:04 +08:00 |
| `full_policy.py` | WIP: Enhance sparse attention with density tracking and block selection improvements | 2026-01-31 14:48:23 +08:00 |
| `policy.py` | WIP: Enhance sparse attention with density tracking and block selection improvements | 2026-01-31 14:48:23 +08:00 |
| `quest.py` | WIP: Enhance sparse attention with density tracking and block selection improvements | 2026-01-31 14:48:23 +08:00 |
| `xattn_bsa.py` | 🚧 WIP: add DEBUG code for XAttention KV chunking density verification | 2026-02-01 17:33:23 +08:00 |