Files
nano-vllm/.claude/rules/commands.md
2025-12-15 00:20:54 +08:00

27 lines
432 B
Markdown

# Commands
## Installation
```bash
pip install -e .
```
## Running
```bash
# Run example
python example.py
# Run benchmarks
python bench.py # Standard benchmark
python bench_offload.py # CPU offload benchmark
```
## Config Defaults
- `max_num_batched_tokens`: 16384
- `max_num_seqs`: 512
- `kvcache_block_size`: 256
- `gpu_memory_utilization`: 0.9
- `enforce_eager`: False (enables CUDA graphs)