[WIP] NEED refactor nanovllm mechenism.

This commit is contained in:
Zijie Tian
2025-12-22 23:52:56 +08:00
parent 1907b625b6
commit 4dcef16c13
10 changed files with 223 additions and 1099 deletions

View File

@@ -1,20 +1,98 @@
# Testing
## Chunked Attention Test
## Test File Guidelines
```bash
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2
# Args: num_gpu_blocks input_len output_len num_prefetch_blocks
### Naming Convention
- All test files must be named `test_*.py`
- Example: `test_offload_engine.py`, `test_ring_buffer.py`
### Purpose
Tests are **educational scripts** for understanding module behavior, NOT traditional unit tests:
- Focus on demonstrating how modules work
- Show the flow and interaction between components
- Help developers understand implementation details
### Code Style
1. **Script-based structure**: Write tests as executable scripts, not pytest-style functions
2. **Utility functions**: Extract reusable steps as helper functions at the top of the file
3. **Main flow as script**: The actual test/demonstration logic runs as top-level script code
```python
# Example structure:
import torch
from nanovllm.kvcache import SomeModule
# ============================================================
# Utility Functions
# ============================================================
def verify(tensor, expected, name):
actual = tensor.mean().item()
assert abs(actual - expected) < 0.01, f"{name}: {actual} != {expected}"
# ============================================================
# Main Test Script
# ============================================================
# 1. Initialize
module = SomeModule(param=value)
# 2. Test feature X
result = module.do_something()
assert result == expected_value
# 3. Test feature Y
...
print("test_xxx: PASSED")
```
## CPU Offload Testing
### Comments
- Keep comments concise and clear
- Only add comments where the code isn't self-explanatory
- Use section headers (`# === Section ===`) to organize logical blocks
### Output
- **Minimize print statements** - the code should be self-explanatory
- Only print a final "PASSED" message at the end
- Use `assert` for verification instead of printing results
- If the user needs explanation, they will ask
## Running Tests
```bash
# Basic test with limited GPU blocks to trigger offload
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2
# Run a specific test
python tests/test_offload_engine.py
# Verify consistency (run multiple times, output should be identical)
for i in 1 2 3; do
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 32 2 2>&1 | tail -3
done
# Run with specific GPU
CUDA_VISIBLE_DEVICES=0 python tests/test_ring_buffer.py
```
## Benchmarks
```bash
# Standard GPU benchmark
python bench.py
# CPU offload benchmark
python bench_offload.py
# vLLM comparison benchmark
python bench_vllm.py
```
## Quick Verification
```bash
# Import test
python -c "from nanovllm import LLM"
# Run offload benchmark (tests CPU-primary ring buffer mode)
python bench_offload.py
```