nano-vllm/nanovllm
Zijie Tian c51a640a29 🐛 fix: remove torch.compile from add_rms_forward to avoid recompilation
The add_rms_forward method takes two input tensors (x and residual),
which causes torch.compile recompilation issues. Keep @torch.compile only
on rms_forward, which processes a single input.

This prevents unnecessary recompilation overhead during inference.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-14 07:02:02 +08:00
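The commit distinguishes a single-input path (rms_forward) from a two-input path (add_rms_forward) that adds a residual before normalizing. The sketch below shows what those two methods compute. It is dependency-free and works on plain Python lists, not tensors. It assumes the common RMSNorm definition (x / sqrt(mean(x^2) + eps) * weight) and assumes that the fused path returns both the normalized output and the updated residual. The class name RMSNormSketch, the eps default, and the return order are hypothetical and do not come from the repository. The comments mark where, according to the commit, @torch.compile stays and where it was removed in the real PyTorch module.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union


class RMSNormSketch:
    """Hypothetical, dependency-free stand-in for an RMSNorm layer.

    Operates on a single 1-D vector (a list of floats) instead of a tensor.
    """

    def __init__(self, weight: Sequence[float], eps: float = 1e-6) -> None:
        self.weight = list(weight)  # per-channel scale (assumed learnable in a real layer)
        self.eps = eps              # assumed default; avoids division by zero

    # Single-input path. Per the commit, the PyTorch version keeps
    # @torch.compile on this method only.
    def rms_forward(self, x: Sequence[float]) -> List[float]:
        # Root mean square over the last (here: only) dimension.
        mean_sq = sum(v * v for v in x) / len(x)
        scale = 1.0 / math.sqrt(mean_sq + self.eps)
        return [v * scale * w for v, w in zip(x, self.weight)]

    # Two-input path: residual add fused with normalization. Per the commit,
    # this method is no longer wrapped in @torch.compile.
    def add_rms_forward(
        self, x: Sequence[float], residual: Sequence[float]
    ) -> Tuple[List[float], List[float]]:
        summed = [a + b for a, b in zip(x, residual)]
        # Return the normalized output and the updated residual stream.
        return self.rms_forward(summed), summed

    def forward(
        self, x: Sequence[float], residual: Optional[Sequence[float]] = None
    ) -> Union[List[float], Tuple[List[float], List[float]]]:
        # Dispatch on whether a residual was supplied.
        if residual is None:
            return self.rms_forward(x)
        return self.add_rms_forward(x, residual)
```

Callers that pass a residual receive a pair, and callers that do not receive a single vector. Only the single-argument entry point is described as compiled after this change.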