Zijie Tian
|
0d31b3f71f
|
📝 docs: add CPU offload optimization strategies guide
- Document chunk size optimization (simplest, most effective)
- Analyze CUDA Graph limitations for offload scenarios
- Cover CUDA Graph applicability for MLP/Proj layers
- Survey frontier research: InfiniGen, ShadowKV, L2 Prefetch, KVPR
- Add optimization priority recommendations
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
|
2026-01-27 04:44:36 +08:00 |
|