♻️ refactor: create ops module and move chunked_attention

- Create nanovllm/ops/ module for low-level attention operators
- Move chunked_attention.py from kvcache/ to ops/
- Update imports in full_policy.py (3 locations)
- Fix: remove dead code in OffloadEngine.reset() that referenced the non-existent layer_k/v_buffer_a/b attributes

Verified with needle test (32K offload): PASSED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 02:50:14 +08:00
parent e440c45e73
commit 690456dbf9
4 changed files with 22 additions and 10 deletions
--- a/nanovllm/ops/__init__.py
+++ b/nanovllm/ops/__init__.py
@@ -0,0 +1,19 @@
+"""
+Operators module for nano-vLLM.
+
+This module contains low-level attention operators and kernels.
+"""
+
+from nanovllm.ops.chunked_attention import (
+    flash_attn_with_lse,
+    merge_attention_outputs,
+    chunked_attention_varlen,
+    ChunkedPrefillState,
+)
+
+__all__ = [
+    "flash_attn_with_lse",
+    "merge_attention_outputs",
+    "chunked_attention_varlen",
+    "ChunkedPrefillState",
+]
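The re-exported names (flash_attn_with_lse, merge_attention_outputs, ChunkedPrefillState) suggest a chunked-prefill scheme. In that scheme, attention over each KV chunk is computed separately together with its log-sum-exp (LSE), and the partial results are then merged. This diff does not show the real signatures, so the following is only a minimal pure-Python sketch of the standard LSE merge rule for a single query. The names attention_with_lse and merge_partial_outputs are hypothetical and are not the nano-vLLM API.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def attention_with_lse(
    q: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[Vector, float]:
    """Hypothetical helper: softmax attention for one query over one key chunk.

    Returns the normalized output and the log-sum-exp of the scaled scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    lse = m + math.log(denom)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, lse


def merge_partial_outputs(
    out_a: Sequence[float], lse_a: float, out_b: Sequence[float], lse_b: float
) -> Tuple[Vector, float]:
    """Hypothetical helper: combine results computed over disjoint key chunks."""
    m = max(lse_a, lse_b)
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    # Each chunk's output is reweighted by its share of the total softmax mass.
    w_a = math.exp(lse_a - lse)
    w_b = math.exp(lse_b - lse)
    out = [w_a * a + w_b * b for a, b in zip(out_a, out_b)]
    return out, lse


# Usage: attention over all keys equals the merge of two half-chunks.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full_out, full_lse = attention_with_lse(q, keys, values)
out_a, lse_a = attention_with_lse(q, keys[:2], values[:2])
out_b, lse_b = attention_with_lse(q, keys[2:], values[2:])
merged_out, merged_lse = merge_partial_outputs(out_a, lse_a, out_b, lse_b)
```

Carrying the LSE alongside each partial output lets chunks be processed one at a time (for example, while other KV blocks are offloaded) without materializing the full score row. The merge is exact because exp(lse_a - lse) equals chunk A's fraction of the total softmax denominator.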