Commands
Installation
pip install -e .
Running
# Run example
python example.py
# Run benchmarks
python bench.py # Standard benchmark
python bench_offload.py # CPU offload benchmark
Config Defaults
max_num_batched_tokens: 16384
max_num_seqs: 512
kvcache_block_size: 256
gpu_memory_utilization: 0.9
enforce_eager: False (enables CUDA graphs)
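For reference, the defaults listed above could be written as a small dataclass. This is only a sketch: the class name `EngineConfig`, the `use_cuda_graphs` property, and the range checks are assumptions made for illustration and may not match how the project actually defines or validates its config.

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Hypothetical container mirroring the documented defaults."""

    max_num_batched_tokens: int = 16384
    max_num_seqs: int = 512
    kvcache_block_size: int = 256
    gpu_memory_utilization: float = 0.9
    enforce_eager: bool = False  # False enables CUDA graphs

    def __post_init__(self) -> None:
        # Sanity checks are an assumption, not taken from the project.
        if not 0.0 < self.gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")
        for name in ("max_num_batched_tokens", "max_num_seqs", "kvcache_block_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

    @property
    def use_cuda_graphs(self) -> bool:
        # CUDA graphs are used only when eager execution is not enforced.
        return not self.enforce_eager
```

Overriding one field, for example `EngineConfig(enforce_eager=True)`, leaves the other defaults unchanged and makes `use_cuda_graphs` return `False`.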