Zijie Tian
|
3956a30b14
|
🔧 chore: add --use-v1 flag to bench_vllm.py
Allow switching between the vLLM V0/V1 engines via a command-line flag.
Default behavior now uses the legacy V0 engine (VLLM_USE_V1=0).
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
|
2026-01-27 09:14:55 +08:00 |
|
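A minimal sketch of how the `--use-v1` flag from the commit above might be wired up in bench_vllm.py. The flag name and the `VLLM_USE_V1` environment variable come from the commit message; the function name and structure here are illustrative assumptions, not the actual implementation.

```python
# Hypothetical wiring for the --use-v1 flag described in the commit message.
# Only the flag name and VLLM_USE_V1 env var are taken from the source;
# the rest is an illustrative sketch.
import argparse
import os

def parse_engine_args(argv=None):
    parser = argparse.ArgumentParser(description="vLLM benchmark (sketch)")
    # store_true flag: passing --use-v1 opts into the V1 engine;
    # omitting it leaves VLLM_USE_V1=0, matching the stated default.
    parser.add_argument("--use-v1", action="store_true",
                        help="use the vLLM V1 engine (sets VLLM_USE_V1=1)")
    args = parser.parse_args(argv)
    # vLLM reads this variable at import time, so it must be set
    # before `import vllm` anywhere in the benchmark script.
    os.environ["VLLM_USE_V1"] = "1" if args.use_v1 else "0"
    return args
```

Setting the environment variable before importing vllm is the important detail: engine selection happens at module load, so a flag parsed after the import would have no effect.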
Zijie Tian
|
59473fa432
|
🔧 chore: add configurable arguments to bench_vllm.py
Add --model, --gpu-util, and --enforce-eager arguments for flexible
vLLM benchmarking comparisons.
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
|
2026-01-27 09:07:49 +08:00 |
|
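The commit above names three new arguments. A hedged sketch of their argparse shape, assuming they map onto the standard vLLM `LLM(...)` parameters (`model`, `gpu_memory_utilization`, `enforce_eager`); the defaults shown are placeholders, not values from the actual script.

```python
# Illustrative argument parser for the flags named in the commit message.
# Defaults and help text are assumptions; only the flag names are from
# the source.
import argparse

def build_bench_parser():
    p = argparse.ArgumentParser(description="bench_vllm.py arguments (sketch)")
    p.add_argument("--model", type=str, required=True,
                   help="HF model id or local path passed to vLLM")
    p.add_argument("--gpu-util", type=float, default=0.9,
                   help="fraction forwarded to vLLM's gpu_memory_utilization")
    p.add_argument("--enforce-eager", action="store_true",
                   help="disable CUDA graph capture (vLLM's enforce_eager)")
    return p
```

Exposing these as flags lets the same script benchmark different models and memory budgets without editing the source, which is what makes side-by-side vLLM comparisons practical.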
Zijie Tian
|
aa953ecb59
|
[refactor] Aligned the benchmark scripts.
|
2026-01-07 04:25:06 +08:00 |
|
Zijie Tian
|
82ed34fc2d
|
[opt] Optimize nano-vllm performance to be comparable with vLLM.
|
2025-12-25 03:47:07 +08:00 |
|
Zijie Tian
|
08d83185ce
|
[fix] Fix the bench*.py scripts.
|
2025-12-22 19:53:50 +08:00 |
|
Zijie Tian
|
0b6f19242d
|
[feat] Added chunked prefill and KV-cache offload mechanisms.
|
2025-12-10 03:47:37 +08:00 |
|
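The chunked-prefill idea from the commit above can be sketched in a few lines: instead of running one prefill pass over the entire prompt, the prompt's token ids are split into fixed-size chunks so each forward step stays within a token budget. This is a simplified illustration of the general technique, not the nano-vllm implementation; the function name and signature are assumptions.

```python
# Simplified illustration of chunked prefill: split a long prompt into
# fixed-size chunks so each prefill step processes at most `chunk_size`
# tokens. Hypothetical helper, not nano-vllm's actual code.
def chunk_prefill(prompt_token_ids, chunk_size):
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Each slice becomes one prefill step; the KV cache produced by
    # earlier chunks is reused by later ones.
    return [prompt_token_ids[i:i + chunk_size]
            for i in range(0, len(prompt_token_ids), chunk_size)]
```

Bounding per-step prefill work this way smooths latency when long prompts and ongoing decode requests share the same batch, and it pairs naturally with KV-cache offload since completed chunks' cache blocks can be moved off-GPU.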
Zijie Tian
|
761929390e
|
[bench] Added a vLLM vs. nano-vllm benchmark.
|
2025-12-10 00:44:57 +08:00 |
|