- Create nanovllm/ops/ module for low-level attention operators - Move chunked_attention.py from kvcache/ to ops/ - Update imports in full_policy.py (3 locations) - Fix: remove dead code in OffloadEngine.reset() referencing non-existent layer_k/v_buffer_a/b attributes Verified with needle test (32K offload): PASSED Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
20 KiB
20 KiB