nano-vllm

Files

Zijie Tian a50b4c2ac2 ♻️ refactor: move select_blocks from policy to attention layer

Move the block selection logic out of the compute_chunked_prefill/decode
methods and into their caller in attention.py. This improves separation of concerns:

- attention.py now calls select_blocks() before compute_chunked_*()
- Policy methods receive pre-selected blocks via selected_blocks parameter
- Enables sparse policies to implement custom block selection without
  modifying the compute path

Changes:
- policy.py: Add selected_blocks parameter to abstract methods
- full_policy.py: Remove internal select_blocks calls, use passed blocks
- xattn_bsa.py: Sync signatures for prefill/decode methods
- attention.py: Add select_blocks calls before policy delegation
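
The call order described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only the names select_blocks, compute_chunked_prefill and selected_blocks come from the commit message. The class names, the attention_forward helper, the argument types and the string return values are assumptions made to keep the example self-contained.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Policy(ABC):
    """Hypothetical stand-in for the abstract policy in policy.py."""

    @abstractmethod
    def select_blocks(self, available_blocks: list[int]) -> list[int]:
        """Decide which KV-cache blocks the attention computation should see."""

    @abstractmethod
    def compute_chunked_prefill(self, query: str, selected_blocks: list[int]) -> str:
        """Compute attention over blocks that the caller has already selected."""


class FullPolicy(Policy):
    """Assumed dense behaviour: every available block is selected."""

    def select_blocks(self, available_blocks: list[int]) -> list[int]:
        return list(available_blocks)

    def compute_chunked_prefill(self, query: str, selected_blocks: list[int]) -> str:
        # No internal select_blocks call here: the blocks arrive as a parameter.
        return f"{query} over blocks {selected_blocks}"


class EveryOtherBlockPolicy(FullPolicy):
    """Hypothetical sparse policy: overrides selection only, reuses the compute path."""

    def select_blocks(self, available_blocks: list[int]) -> list[int]:
        return list(available_blocks[::2])


def attention_forward(policy: Policy, query: str, available_blocks: list[int]) -> str:
    """Hypothetical caller standing in for the attention layer."""
    # Step 1: selection happens in the caller.
    selected_blocks = policy.select_blocks(available_blocks)
    # Step 2: the policy method receives the pre-selected blocks.
    return policy.compute_chunked_prefill(query, selected_blocks)


if __name__ == "__main__":
    print(attention_forward(FullPolicy(), "q", [0, 1, 2, 3]))
    print(attention_forward(EveryOtherBlockPolicy(), "q", [0, 1, 2, 3]))
```

In this sketch the sparse policy changes which blocks are used without touching compute_chunked_prefill, which is the benefit the commit message attributes to the refactor.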

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-23 05:21:28 +08:00

activation.py

fix

2025-06-15 13:28:29 +08:00

attention.py

♻️ refactor: move select_blocks from policy to attention layer

2026-01-23 05:21:28 +08:00

embed_head.py

simplify

2025-08-31 20:02:51 +08:00

layernorm.py

[refactor] Translate into English, avoid Chinese due to Claude.