Commit Graph

1 Commit

Author: Zijie Tian
SHA1: cfb188c34a
Message: docs: add chunked prefill analysis for ultra-long sequences
Add comprehensive analysis document covering:
- MLP activation memory bottlenecks with SwiGLU architecture
- Chunked MLP strategy (98% memory reduction)
- Chunked prefill for single layers (78% memory reduction)
- Streaming Chunked Prefill (optimal approach): GPU memory becomes constant
- Memory formulas and implementation guidance
- Theoretical maximum: 4M tokens on 24GB GPU (128× improvement)

Co-Authored-By: Claude <noreply@anthropic.com>
Date: 2026-01-16 10:38:02 +08:00
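The chunked-MLP strategy named in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code added by the commit: the names (`swiglu_mlp`, `chunked_mlp`, `d_model`, `d_ff`, `chunk`) and dimensions are all hypothetical. The idea is that running a SwiGLU MLP over all `T` tokens at once materializes activations of size `T × d_ff`, while processing the sequence in token chunks caps peak activation memory at `chunk × d_ff` without changing the result, since each token's MLP output is independent of the others.

```python
# Hypothetical sketch of chunked MLP execution (not the commit's actual code).
# Peak activation memory drops from ~T*d_ff to ~chunk*d_ff; outputs match
# the unchunked computation because the MLP acts on each token independently.
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: (silu(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = x @ w_gate
    gate = gate * (1.0 / (1.0 + np.exp(-gate)))  # SiLU activation
    return (gate * (x @ w_up)) @ w_down

def chunked_mlp(x, w_gate, w_up, w_down, chunk=4):
    """Run the same MLP over token chunks to bound peak activation size."""
    out = np.empty_like(x)
    for i in range(0, x.shape[0], chunk):
        out[i:i + chunk] = swiglu_mlp(x[i:i + chunk], w_gate, w_up, w_down)
    return out

# Toy sizes for illustration only.
rng = np.random.default_rng(0)
T, d_model, d_ff = 16, 8, 32
x = rng.standard_normal((T, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))

full = swiglu_mlp(x, w_gate, w_up, w_down)
chunked = chunked_mlp(x, w_gate, w_up, w_down, chunk=4)
print(np.allclose(full, chunked))  # chunking changes memory use, not the math
```

The same per-token independence is what makes the chunked-prefill variants in the commit possible; attention is the part that needs extra care, since each chunk must still attend to earlier tokens' KV cache.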