[WIP] remove num_prefetch_blocks variable.

Zijie Tian
2025-12-24 18:22:26 +08:00
parent b264de903d
commit 782437c486
10 changed files with 465 additions and 18 deletions


@@ -237,7 +237,6 @@ Warmup uses a reasonable sequence length (`block_size * 2`) instead of `max_mode
| `max_num_seqs` | 512 | Max concurrent sequences |
| `gpu_memory_utilization` | 0.9 | GPU memory fraction for KV cache |
| `enforce_eager` | False | Disable CUDA graphs if True |
-| `num_prefetch_blocks` | 2 | Ring buffer pipeline depth (deprecated, uses num_gpu_blocks) |
## Benchmarking