nano-vllm/nanovllm/ops/chunked_attention.py at da5e13e2bb723e2761f35efbb2e3e5f39cb49b62

Zijie Tian 690456dbf9 ♻️ refactor: create ops module and move chunked_attention

- Create nanovllm/ops/ module for low-level attention operators
- Move chunked_attention.py from kvcache/ to ops/
- Update imports in full_policy.py (3 locations)
- Fix: remove dead code in OffloadEngine.reset() referencing
  non-existent layer_k/v_buffer_a/b attributes

Verified with needle test (32K offload): PASSED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-20 02:50:14 +08:00

20 KiB
