nano-vllm/nanovllm/kvcache/offload_engine.py
Zijie Tian 18bc433f09 perf: improve NVTX profiling with colored ranges and configurable slots
- Switch from torch.cuda.nvtx to nvtx package for colored range support
- Add color coding: blue for H2D, green for D2H decode, orange for D2H prefill
- Add --num-gpu-blocks parameter to profile_offload.sh
- Include slot count in output filename for easier comparison

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-01-27 03:42:05 +08:00

34 KiB