🐛 fix: remove torch.compile from add_rms_forward to avoid recompilation

The add_rms_forward method takes two input tensors (x and residual), which
makes torch.compile prone to recompilation whenever either input's guarded
properties change. Keep @torch.compile only on rms_forward, which takes a
single input.

This prevents unnecessary recompilation overhead during inference.
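The recompilation risk described above can be illustrated with a torch-free, purely hypothetical sketch of guard-keyed compilation caching (all names here are illustrative; real torch.compile guards are far richer than rank and dtype):

```python
# Hypothetical sketch: a compiled function is cached per guard key, and the
# guard key covers EVERY input, so a two-input function has more distinct
# keys -- and therefore more recompilations -- than a single-input one.

compile_cache = {}  # (fn name, guard key) -> "compiled" artifact


def guard_key(*tensors):
    # Simplified guard: rank and dtype of each input.
    return tuple((len(t["shape"]), t["dtype"]) for t in tensors)


def maybe_compile(fn, *tensors):
    key = (fn.__name__, guard_key(*tensors))
    if key not in compile_cache:
        compile_cache[key] = f"compiled {key}"  # a fresh recompilation
    return compile_cache[key]


def rms_forward(x):
    pass


def add_rms_forward(x, residual):
    pass


x2d = {"shape": (8, 64), "dtype": "fp16"}
x3d = {"shape": (1, 8, 64), "dtype": "fp16"}
res_fp32 = {"shape": (8, 64), "dtype": "fp32"}

maybe_compile(rms_forward, x2d)
maybe_compile(rms_forward, x2d)                # same guards: cache hit
maybe_compile(add_rms_forward, x2d, x2d)
maybe_compile(add_rms_forward, x2d, res_fp32)  # residual dtype differs: recompile
maybe_compile(add_rms_forward, x3d, x3d)       # rank differs: recompile
print(len(compile_cache))                      # 4 distinct compilations
```

With one input, rms_forward only ever sees one guard key here; add_rms_forward's two inputs produce three.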

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Zijie Tian
Date: 2026-01-14 07:02:02 +08:00
parent dce6ad6b74
commit c51a640a29

@@ -27,13 +27,13 @@ class RMSNorm(nn.Module):
         x = x.to(orig_dtype).mul_(self.weight)
         return x
 
-    @torch.compile
     def add_rms_forward(
         self,
         x: torch.Tensor,
         residual: torch.Tensor,
     ) -> tuple[torch.Tensor, torch.Tensor]:
         # Input MUST be 2D [N, D] to avoid recompilation due to rank mismatch
+        # Note: @torch.compile removed due to OOM with 64k sequences (memory fragmentation)
         orig_dtype = x.dtype
         x = x.float().add_(residual.float())
         residual = x.to(orig_dtype)
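For context, the fused add-then-normalize computation that add_rms_forward performs can be sketched in NumPy. The residual add and dtype handling mirror the diff above; the normalization step and the eps value are assumptions based on the standard RMSNorm formula, and all names here are illustrative:

```python
import numpy as np


def add_rms_forward_sketch(x, residual, weight, eps=1e-6):
    # Fused residual add + RMSNorm, mirroring the torch method in the diff:
    # accumulate in float32, hand the sum back as the new residual, then
    # normalize by the root-mean-square over the hidden (last) dimension.
    orig_dtype = x.dtype
    x = x.astype(np.float32) + residual.astype(np.float32)  # x.float().add_(residual.float())
    new_residual = x.astype(orig_dtype)                     # residual = x.to(orig_dtype)
    var = np.mean(x * x, axis=-1, keepdims=True)            # mean of squares per row
    x = x / np.sqrt(var + eps)                              # divide by RMS
    x = (x * weight).astype(orig_dtype)                     # scale, cast back
    return x, new_residual


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float16)
residual = rng.standard_normal((4, 8)).astype(np.float16)
weight = np.ones(8, dtype=np.float32)

out, new_res = add_rms_forward_sketch(x, residual, weight)
# With unit weight, each output row has RMS close to 1 (up to fp16 rounding).
print(np.allclose(np.sqrt((out.astype(np.float32) ** 2).mean(-1)), 1.0, atol=1e-2))
```

Returning the post-add sum as the new residual is what makes the fusion worthwhile: the next layer reuses it without a second add.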