♻️ refactor: create ops module and move chunked_attention
- Create nanovllm/ops/ module for low-level attention operators
- Move chunked_attention.py from kvcache/ to ops/
- Update imports in full_policy.py (3 locations)
- Fix: remove dead code in OffloadEngine.reset() referencing non-existent layer_k/v_buffer_a/b attributes

Verified with needle test (32K offload): PASSED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nanovllm/ops/__init__.py
Normal file, 19 lines added
@@ -0,0 +1,19 @@
"""
Operators module for nano-vLLM.

This module contains low-level attention operators and kernels.
"""

from nanovllm.ops.chunked_attention import (
    flash_attn_with_lse,
    merge_attention_outputs,
    chunked_attention_varlen,
    ChunkedPrefillState,
)

__all__ = [
    "flash_attn_with_lse",
    "merge_attention_outputs",
    "chunked_attention_varlen",
    "ChunkedPrefillState",
]
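The re-exported names (`flash_attn_with_lse`, `merge_attention_outputs`, `chunked_attention_varlen`) suggest that these operators compute attention over separate KV chunks and then combine the partial results using each chunk's per-query log-sum-exp (LSE). The NumPy sketch below illustrates that merge rule under this assumption. `attn_with_lse`, `merge_two`, and `chunked_attention` are hypothetical helpers, not the actual signatures in `nanovllm.ops.chunked_attention`, and the sketch ignores causal masking, variable-length batching, and GPU kernels.

```python
import numpy as np


def attn_with_lse(q, k, v):
    """Naive softmax attention that also returns the per-query log-sum-exp.

    Hypothetical NumPy stand-in for a flash-attention call, not nano-vLLM's API.
    q: (n_q, d), k: (n_k, d), v: (n_k, d_v) -> out: (n_q, d_v), lse: (n_q,)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot-product scores
    m = scores.max(axis=-1, keepdims=True)       # row max for numerical stability
    p = np.exp(scores - m)
    denom = p.sum(axis=-1, keepdims=True)
    out = (p / denom) @ v                        # softmax-weighted values
    lse = (m + np.log(denom)).squeeze(-1)        # log(sum(exp(scores))) per query
    return out, lse


def merge_two(out_a, lse_a, out_b, lse_b):
    """Combine attention over two disjoint KV chunks into attention over both."""
    lse = np.logaddexp(lse_a, lse_b)             # log of the total softmax mass
    w_a = np.exp(lse_a - lse)[:, None]           # share of the mass in chunk a
    w_b = np.exp(lse_b - lse)[:, None]           # share of the mass in chunk b
    return w_a * out_a + w_b * out_b, lse


def chunked_attention(q, k, v, chunk_size):
    """Attend to K/V one chunk at a time, folding each chunk in via its LSE."""
    out, lse = None, None
    for start in range(0, k.shape[0], chunk_size):
        stop = start + chunk_size
        o, l = attn_with_lse(q, k[start:stop], v[start:stop])
        if out is None:
            out, lse = o, l
        else:
            out, lse = merge_two(out, lse, o, l)
    return out, lse


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((10, 8))
v = rng.standard_normal((10, 8))
full_out, full_lse = attn_with_lse(q, k, v)
chunk_out, chunk_lse = chunked_attention(q, k, v, chunk_size=3)
print(np.allclose(full_out, chunk_out), np.allclose(full_lse, chunk_lse))  # True True
```

The merge is exact rather than approximate. Each chunk's output is normalized by its own softmax denominator `exp(lse_chunk)`, so rescaling each output by `exp(lse_chunk - lse_total)` recovers the weights that a single softmax over all keys would assign. This is why only the running output and LSE need to be kept between chunks.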