docs: add chunked prefill analysis for ultra-long sequences

Add comprehensive analysis document covering:
- MLP activation-memory bottlenecks in the SwiGLU architecture
- Chunked MLP strategy (98% memory reduction)
- Chunked prefill for single layers (78% memory reduction)
- Streaming Chunked Prefill (optimal approach): GPU activation memory stays constant regardless of sequence length
- Memory formulas and implementation guidance
- Theoretical maximum: 4M tokens on 24GB GPU (128× improvement)
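A back-of-envelope sketch of the activation arithmetic behind these savings (the shapes, the fp16 element size, and the 256K-token prefill below are illustrative assumptions, not values taken from the added document):

```python
# Illustrative sketch only: rough SwiGLU MLP activation-memory math
# behind the chunked-prefill savings. intermediate=14336 and fp16 are
# assumed example values, not figures from the docs.

BYTES = 2  # fp16 / bf16 element size

def mlp_activation_bytes(tokens: int, inter: int = 14336) -> int:
    # Live SwiGLU activations per pass: gate projection + up projection
    # + their elementwise product = 3 * inter values per token.
    return tokens * 3 * inter * BYTES

def chunked_mlp_activation_bytes(tokens: int, chunk: int = 1024,
                                 inter: int = 14336) -> int:
    # Chunking over the token dimension keeps only one chunk's
    # activations live at a time.
    return mlp_activation_bytes(min(tokens, chunk), inter)

full = mlp_activation_bytes(256_000)
chunked = chunked_mlp_activation_bytes(256_000)
print(f"full: {full / 2**30:.1f} GiB, chunked: {chunked / 2**30:.3f} GiB, "
      f"saved: {1 - chunked / full:.1%}")
```

The chunked peak depends only on the chunk size, which is why the per-layer reduction grows with sequence length.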

Co-Authored-By: Claude <noreply@anthropic.com>
Zijie Tian
2026-01-16 10:38:02 +08:00
parent 2826a649de
commit cfb188c34a
2 changed files with 1056 additions and 0 deletions