[WIP] Need to refactor nanovllm mechanism.
.claude/rules/code-analysis.md (new file, 38 lines)
@@ -0,0 +1,38 @@
# Code Analysis

## Use cclsp MCP for Code Navigation

When analyzing code, understanding call chains, or exploring the codebase, **prefer using the cclsp MCP tools** over grep/glob-based searches:

### Available cclsp Tools
| Tool | Purpose |
|------|---------|
| `mcp__cclsp__find_definition` | Jump to symbol definition |
| `mcp__cclsp__find_references` | Find all usages of a symbol |
| `mcp__cclsp__rename_symbol` | Rename a symbol across the codebase |
| `mcp__cclsp__get_diagnostics` | Get LSP diagnostics (errors, warnings) |
| `mcp__cclsp__restart_server` | Restart the LSP server if needed |
### When to Use cclsp

1. **Understanding call chains**: Use `find_references` to trace how functions are called (see the payload sketch after this list)
2. **Finding implementations**: Use `find_definition` to jump to the actual code
3. **Refactoring**: Use `rename_symbol` for safe cross-file renames
4. **Code quality**: Use `get_diagnostics` to check for issues
### Example Workflow

```
1. User asks: "How does the prefill flow work?"
2. Use find_definition to locate key entry points (e.g., run_chunked_offload_prefill)
3. Use find_references to trace the call chain through the codebase
4. Read relevant code sections to understand the implementation
```
### Benefits over grep/glob

- **Semantic understanding**: cclsp understands code structure, not just text patterns
- **Accurate references**: Finds actual usages, not just text matches (see the contrast sketch below)
- **Cross-file navigation**: Follows imports and definitions across modules
- **Type-aware**: Understands Python types and class hierarchies
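To see why this matters, here is a contrived Python sketch (not from the codebase): a plain text search for `allocate` matches all three occurrences below, while `find_references` on the function reports only the real call site.

```python
# Contrived example: why text matching over-counts usages.
# `grep -rn allocate` matches the comment, the string, and the call,
# but an LSP-backed find_references reports only the actual call site.

def allocate(num_blocks: int) -> list[int]:
    """Toy stand-in for a KV-cache block allocator."""
    return list(range(num_blocks))

# TODO: should we allocate lazily?       (text match, not a usage)
error_msg = "failed to allocate blocks"  # (text match, not a usage)

blocks = allocate(4)  # the one real reference
print(blocks)
```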
@@ -1,20 +1,98 @@
# Testing

## Test File Guidelines

### Naming Convention

- All test files must be named `test_*.py`
- Example: `test_offload_engine.py`, `test_ring_buffer.py`
### Purpose

Tests are **educational scripts** for understanding module behavior, NOT traditional unit tests:

- Focus on demonstrating how modules work
- Show the flow and interaction between components
- Help developers understand implementation details
### Code Style

1. **Script-based structure**: Write tests as executable scripts, not pytest-style functions
2. **Utility functions**: Extract reusable steps as helper functions at the top of the file
3. **Main flow as script**: The actual test/demonstration logic runs as top-level script code
```python
# Example structure:

import torch
from nanovllm.kvcache import SomeModule

# ============================================================
# Utility Functions
# ============================================================

def verify(tensor, expected, name):
    actual = tensor.mean().item()
    assert abs(actual - expected) < 0.01, f"{name}: {actual} != {expected}"

# ============================================================
# Main Test Script
# ============================================================

# 1. Initialize
module = SomeModule(param=value)

# 2. Test feature X
result = module.do_something()
assert result == expected_value

# 3. Test feature Y
...

print("test_xxx: PASSED")
```
### Comments

- Keep comments concise and clear
- Only add comments where the code isn't self-explanatory
- Use section headers (`# === Section ===`) to organize logical blocks
### Output

- **Minimize print statements** - the code should be self-explanatory
- Only print a final "PASSED" message at the end
- Use `assert` for verification instead of printing results (see the sketch below)
- If the user needs explanation, they will ask
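A minimal sketch of this output convention, with hypothetical tensors:

```python
# Minimal sketch of the assert-then-PASSED convention.
# The tensors and tolerance here are hypothetical placeholders.
import torch

out = torch.ones(4, 8) * 0.5   # stand-in for a module's actual output
ref = torch.full((4, 8), 0.5)  # stand-in for the expected result

assert torch.allclose(out, ref, atol=1e-6), "output mismatch"

print("test_output_convention: PASSED")
```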
## Running Tests

```bash
# Basic test with limited GPU blocks to trigger offload
# Args: num_gpu_blocks input_len output_len num_prefetch_blocks
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2

# Run a specific test
python tests/test_offload_engine.py

# Verify consistency (run multiple times; output should be identical)
for i in 1 2 3; do
    CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 32 2 2>&1 | tail -3
done

# Run with specific GPU
CUDA_VISIBLE_DEVICES=0 python tests/test_ring_buffer.py
```
## Benchmarks

```bash
# Standard GPU benchmark
python bench.py

# CPU offload benchmark
python bench_offload.py

# vLLM comparison benchmark
python bench_vllm.py
```
## Quick Verification

```bash
# Import test
python -c "from nanovllm import LLM"

# Run offload benchmark (tests CPU-primary ring buffer mode)
python bench_offload.py
```