[WIP] Need to refactor nanovllm mechanism.
.claude/rules/code-analysis.md (new file, 38 lines)
@@ -0,0 +1,38 @@
# Code Analysis

## Use cclsp MCP for Code Navigation

When analyzing code, understanding call chains, or exploring the codebase, **prefer using the cclsp MCP tools** over grep/glob-based searches:

### Available cclsp Tools
| Tool | Purpose |
|------|---------|
| `mcp__cclsp__find_definition` | Jump to symbol definition |
| `mcp__cclsp__find_references` | Find all usages of a symbol |
| `mcp__cclsp__rename_symbol` | Rename a symbol across the codebase |
| `mcp__cclsp__get_diagnostics` | Get LSP diagnostics (errors, warnings) |
| `mcp__cclsp__restart_server` | Restart the LSP server if needed |
### When to Use cclsp

1. **Understanding call chains**: Use `find_references` to trace how functions are called (see the payload sketch after this list)
2. **Finding implementations**: Use `find_definition` to jump to the actual code
3. **Refactoring**: Use `rename_symbol` for safe cross-file renames
4. **Code quality**: Use `get_diagnostics` to check for issues
### Example Workflow

```
1. User asks: "How does the prefill flow work?"
2. Use find_definition to locate key entry points (e.g., run_chunked_offload_prefill)
3. Use find_references to trace the call chain through the codebase
4. Read relevant code sections to understand the implementation
```
### Benefits over grep/glob

- **Semantic understanding**: cclsp understands code structure, not just text patterns
- **Accurate references**: Finds actual usages, not just text matches (see the contrast sketch below)
- **Cross-file navigation**: Follows imports and definitions across modules
- **Type-aware**: Understands Python types and class hierarchies
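To see why this matters, here is a contrived Python sketch (not from the codebase): a plain text search for `allocate` matches all three occurrences below, while `find_references` on the function reports only the real call site.

```python
# Contrived example: why text matching over-counts usages.
# `grep -rn allocate` matches the comment, the string, and the call,
# but an LSP-backed find_references reports only the actual call site.

def allocate(num_blocks: int) -> list[int]:
    """Toy stand-in for a KV-cache block allocator."""
    return list(range(num_blocks))

# TODO: should we allocate lazily?       (text match, not a usage)
error_msg = "failed to allocate blocks"  # (text match, not a usage)

blocks = allocate(4)  # the one real reference
print(blocks)
```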
@@ -1,20 +1,98 @@
# Testing

## Test File Guidelines

### Naming Convention

- All test files must be named `test_*.py`
- Example: `test_offload_engine.py`, `test_ring_buffer.py`
### Purpose

Tests are **educational scripts** for understanding module behavior, NOT traditional unit tests:

- Focus on demonstrating how modules work
- Show the flow and interaction between components
- Help developers understand implementation details
### Code Style

1. **Script-based structure**: Write tests as executable scripts, not pytest-style functions
2. **Utility functions**: Extract reusable steps as helper functions at the top of the file
3. **Main flow as script**: The actual test/demonstration logic runs as top-level script code
```python
# Example structure:

import torch
from nanovllm.kvcache import SomeModule

# ============================================================
# Utility Functions
# ============================================================

def verify(tensor, expected, name):
    actual = tensor.mean().item()
    assert abs(actual - expected) < 0.01, f"{name}: {actual} != {expected}"

# ============================================================
# Main Test Script
# ============================================================

# 1. Initialize
module = SomeModule(param=value)

# 2. Test feature X
result = module.do_something()
assert result == expected_value

# 3. Test feature Y
...

print("test_xxx: PASSED")
```
### Comments

- Keep comments concise and clear
- Only add comments where the code isn't self-explanatory
- Use section headers (`# === Section ===`) to organize logical blocks
### Output

- **Minimize print statements** - the code should be self-explanatory
- Only print a final "PASSED" message at the end
- Use `assert` for verification instead of printing results (see the sketch below)
- If the user needs explanation, they will ask
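A minimal sketch of this output convention, with hypothetical tensors:

```python
# Minimal sketch of the assert-then-PASSED convention.
# The tensors and tolerance here are hypothetical placeholders.
import torch

out = torch.ones(4, 8) * 0.5   # stand-in for a module's actual output
ref = torch.full((4, 8), 0.5)  # stand-in for the expected result

assert torch.allclose(out, ref, atol=1e-6), "output mismatch"

print("test_output_convention: PASSED")
```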
## Running Tests

```bash
# Basic test with limited GPU blocks to trigger offload
# Args: num_gpu_blocks input_len output_len num_prefetch_blocks
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2

# Run a specific test
python tests/test_offload_engine.py

# Verify consistency (run multiple times; output should be identical)
for i in 1 2 3; do
    CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 32 2 2>&1 | tail -3
done

# Run with specific GPU
CUDA_VISIBLE_DEVICES=0 python tests/test_ring_buffer.py
```
## Benchmarks

```bash
# Standard GPU benchmark
python bench.py

# CPU offload benchmark
python bench_offload.py

# vLLM comparison benchmark
python bench_vllm.py
```
## Quick Verification

```bash
# Import test
python -c "from nanovllm import LLM"

# Run offload benchmark (tests CPU-primary ring buffer mode)
python bench_offload.py
```