nano-vllm/.claude/rules/commands.md
2025-12-15 21:43:33 +08:00

Commands

Installation

pip install -e .

Running

# Run example
python example.py

# Run benchmarks
python bench.py                    # Standard benchmark
python bench_offload.py            # CPU offload benchmark

Config Defaults

  • max_num_batched_tokens: 16384
  • max_num_seqs: 512
  • kvcache_block_size: 4096
  • gpu_memory_utilization: 0.9
  • enforce_eager: False (CUDA graphs enabled; set to True to run in eager mode without them)
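
The defaults above can be pictured as a small config object. The sketch below is illustrative only: `SketchConfig`, its validation checks, and the `uses_cuda_graphs` property are hypothetical names introduced here, not the project's actual config class. Only the field names and default values come from the list above.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the documented defaults (not the real class)."""

    max_num_batched_tokens: int = 16384
    max_num_seqs: int = 512
    kvcache_block_size: int = 4096
    gpu_memory_utilization: float = 0.9
    enforce_eager: bool = False

    def __post_init__(self) -> None:
        # Assumed sanity checks; the source does not state what is validated.
        if not 0.0 < self.gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")
        for name in ("max_num_batched_tokens", "max_num_seqs", "kvcache_block_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

    @property
    def uses_cuda_graphs(self) -> bool:
        # Per the list above, enforce_eager=False means CUDA graphs are enabled.
        return not self.enforce_eager


defaults = SketchConfig()                   # documented defaults
debug = SketchConfig(enforce_eager=True)    # eager mode, e.g. while debugging
```

Overriding a single field, as in `debug`, leaves the remaining defaults untouched, which makes it easy to see which settings differ from the documented baseline.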