[WIP] Needs refactor of nanovllm mechanism.

This commit is contained in:
Zijie Tian
2025-12-22 23:52:56 +08:00
parent 1907b625b6
commit 4dcef16c13
10 changed files with 223 additions and 1099 deletions


@@ -0,0 +1,38 @@
# Code Analysis
## Use cclsp MCP for Code Navigation
When analyzing code, understanding call chains, or exploring the codebase, **prefer using the cclsp MCP tools** over grep/glob-based searches:
### Available cclsp Tools
| Tool | Purpose |
|------|---------|
| `mcp__cclsp__find_definition` | Jump to symbol definition |
| `mcp__cclsp__find_references` | Find all usages of a symbol |
| `mcp__cclsp__rename_symbol` | Rename a symbol across the codebase |
| `mcp__cclsp__get_diagnostics` | Get LSP diagnostics (errors, warnings) |
| `mcp__cclsp__restart_server` | Restart the LSP server if needed |
### When to Use cclsp
1. **Understanding call chains**: Use `find_references` to trace how functions are called
2. **Finding implementations**: Use `find_definition` to jump to actual code
3. **Refactoring**: Use `rename_symbol` for safe cross-file renames
4. **Code quality**: Use `get_diagnostics` to check for issues
### Example Workflow
```
1. User asks: "How does the prefill flow work?"
2. Use find_definition to locate key entry points (e.g., run_chunked_offload_prefill)
3. Use find_references to trace the call chain through the codebase
4. Read relevant code sections to understand the implementation
```
### Benefits over grep/glob
- **Semantic understanding**: cclsp understands code structure, not just text patterns
- **Accurate references**: Finds actual usages, not just text matches
- **Cross-file navigation**: Follows imports and definitions across modules
- **Type-aware**: Understands Python types and class hierarchies


@@ -1,20 +1,98 @@
# Testing
## Test File Guidelines
### Naming Convention
- All test files must be named `test_*.py`
- Example: `test_offload_engine.py`, `test_ring_buffer.py`
### Purpose
Tests are **educational scripts** for understanding module behavior, NOT traditional unit tests:
- Focus on demonstrating how modules work
- Show the flow and interaction between components
- Help developers understand implementation details
### Code Style
1. **Script-based structure**: Write tests as executable scripts, not pytest-style functions
2. **Utility functions**: Extract reusable steps as helper functions at the top of the file
3. **Main flow as script**: The actual test/demonstration logic runs as top-level script code
```python
# Example structure:
import torch
from nanovllm.kvcache import SomeModule

# ============================================================
# Utility Functions
# ============================================================

def verify(tensor, expected, name):
    actual = tensor.mean().item()
    assert abs(actual - expected) < 0.01, f"{name}: {actual} != {expected}"

# ============================================================
# Main Test Script
# ============================================================

# 1. Initialize
module = SomeModule(param=value)

# 2. Test feature X
result = module.do_something()
assert result == expected_value

# 3. Test feature Y
...

print("test_xxx: PASSED")
```
### Comments
- Keep comments concise and clear
- Only add comments where the code isn't self-explanatory
- Use section headers (`# === Section ===`) to organize logical blocks
### Output
- **Minimize print statements** - the code should be self-explanatory
- Only print a final "PASSED" message at the end
- Use `assert` for verification instead of printing results
- If the user needs explanation, they will ask
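The assert-over-print style above can be sketched as a minimal runnable example; `make_buffer` here is a hypothetical stand-in for a nanovllm module under test, not a real API:

```python
# Hedged sketch of the assert-over-print style; `make_buffer` is a
# hypothetical stand-in for a module under test.
def make_buffer(n):
    return list(range(n))

# === Setup ===
buf = make_buffer(4)

# === Verify with asserts; no intermediate prints ===
assert len(buf) == 4
assert buf[0] == 0 and buf[-1] == 3

# Single final message, as the guidelines require.
print("test_make_buffer: PASSED")
```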
## Running Tests
```bash
# Basic test with limited GPU blocks to trigger offload
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2
# Args: num_gpu_blocks input_len output_len num_prefetch_blocks
# Run a specific test
python tests/test_offload_engine.py
# Verify consistency (run multiple times, output should be identical)
for i in 1 2 3; do
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 32 2 2>&1 | tail -3
done
# Run with specific GPU
CUDA_VISIBLE_DEVICES=0 python tests/test_ring_buffer.py
```
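The consistency check above compares tails by eye; a stricter variant captures full outputs and diffs them. This is a hedged sketch where the placeholder command stands in for the real test invocation:

```shell
# Determinism check sketch: run the same command twice and diff the logs.
# Replace `cmd` with the actual test invocation (e.g. the chunked
# attention test above); the placeholder is only for illustration.
cmd='python -c "print(sum(range(10)))"'
eval "$cmd" > run1.log 2>&1
eval "$cmd" > run2.log 2>&1
diff run1.log run2.log && echo "outputs identical"
```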
## Benchmarks
```bash
# Standard GPU benchmark
python bench.py
# CPU offload benchmark
python bench_offload.py
# vLLM comparison benchmark
python bench_vllm.py
```
## Quick Verification
```bash
# Import test
python -c "from nanovllm import LLM"
# Run offload benchmark (tests CPU-primary ring buffer mode)
python bench_offload.py
```