📝 docs: add CPU offload optimization strategies guide

- Document chunk size optimization (simplest, most effective)
- Analyze CUDA Graph limitations for offload scenarios
- Cover CUDA Graph applicability for MLP/Proj layers
- Survey frontier research: InfiniGen, ShadowKV, L2 Prefetch, KVPR
- Add optimization priority recommendations

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
This commit is contained in:
Zijie Tian
2026-01-27 04:44:36 +08:00
parent 73c9dc46ff
commit 0d31b3f71f
2 changed files with 301 additions and 0 deletions

View File

@@ -28,6 +28,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
| [`docs/nsys_wrong_event_order_bug.md`](docs/nsys_wrong_event_order_bug.md) | 🐛 NSYS BUG: Ring buffer pipeline 触发 nsys 时间戳乱序问题的调试记录 |
| [`docs/cpu_scheduling_latency_analysis.md`](docs/cpu_scheduling_latency_analysis.md) | ⚡ PERF: CPU 调度延迟分析kernel 间隙来源GPU 利用率优化方向 |
| [`docs/bench_offload_results.md`](docs/bench_offload_results.md) | 📊 BENCH: CPU offload 性能测试结果Full vs XAttention 对比 (32K/128K) |
| [`docs/cpu_offload_optimization_strategies.md`](docs/cpu_offload_optimization_strategies.md) | 🚀 OPT: CPU offload 优化策略chunk size、CUDA Graph、前沿研究(InfiniGen/ShadowKV) |
## Rules Index