- Create nanovllm/ops/ module for low-level attention operators
- Move chunked_attention.py from kvcache/ to ops/
- Update imports in full_policy.py (3 locations)
- Fix: remove dead code in OffloadEngine.reset() referencing
  non-existent layer_k/v_buffer_a/b attributes
Verified with the needle test (32K context, offload enabled): PASSED
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>