nano-vllm

Files

Zijie Tian c51a640a29 🐛 fix: remove torch.compile from add_rms_forward to avoid recompilation

The add_rms_forward method processes two input tensors (x and residual),
which causes torch.compile recompilation issues. Keep @torch.compile only
on rms_forward, which processes a single input.

This prevents unnecessary recompilation overhead during inference.

Co-Authored-By: Claude <noreply@anthropic.com>

2026-01-14 07:02:02 +08:00

comm

[WIP] Added sgDMA operator for scatter kvcache communication.

2025-12-24 23:48:52 +08:00

debug

[refactor] Refactor the kvcache offload.

2026-01-04 19:37:03 +08:00

engine

♻️ refactor: chunked LayerNorm/QKV/MLP for 64k memory optimization

2026-01-14 07:01:57 +08:00

kvcache

[claudesquad] update from 'multi-request-2' on 13 Jan 26 02:01 CST