🐛 fix: remove torch.compile from add_rms_forward to avoid recompilation
The add_rms_forward method processes two input tensors (x and residual),
which causes torch.compile recompilation issues. Keep @torch.compile only
on rms_forward which processes a single input. This prevents unnecessary
recompilation overhead during inference.

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -27,13 +27,13 @@ class RMSNorm(nn.Module):
         x = x.to(orig_dtype).mul_(self.weight)
         return x
 
-    @torch.compile
     def add_rms_forward(
         self,
         x: torch.Tensor,
         residual: torch.Tensor,
     ) -> tuple[torch.Tensor, torch.Tensor]:
         # Input MUST be 2D [N, D] to avoid recompilation due to rank mismatch
+        # Note: @torch.compile removed due to OOM with 64k sequences (memory fragmentation)
         orig_dtype = x.dtype
         x = x.float().add_(residual.float())
         residual = x.to(orig_dtype)
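For readers who want to see the shape of the two code paths, here is a minimal sketch of a fused residual-add plus RMSNorm that runs eagerly. It is not the repository's implementation: `RMSNormSketch`, the `_normalize` helper, the `eps` default, and the variance computation are assumptions (the diff shows only the first and last lines of the methods), and the sketch uses out-of-place ops instead of the in-place `add_`/`mul_` calls in the diff so that inputs are never mutated.

```python
import torch
from torch import nn


class RMSNormSketch(nn.Module):
    """Hypothetical stand-in for the RMSNorm class in the diff."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.eps = eps  # assumed default; the diff does not show it
        self.weight = nn.Parameter(torch.ones(hidden_size))

    def _normalize(self, h: torch.Tensor, orig_dtype: torch.dtype) -> torch.Tensor:
        # Standard RMSNorm computed in float32: scale each row by
        # 1 / sqrt(mean(h^2) + eps), cast back, then apply the learned weight.
        var = h.pow(2).mean(dim=-1, keepdim=True)
        normed = (h * torch.rsqrt(var + self.eps)).to(orig_dtype)
        return normed * self.weight.to(orig_dtype)

    def rms_forward(self, x: torch.Tensor) -> torch.Tensor:
        # Single input. The commit keeps @torch.compile on this method; the
        # decorator is omitted here so the sketch runs without a compiler.
        return self._normalize(x.float(), x.dtype)

    def add_rms_forward(
        self,
        x: torch.Tensor,
        residual: torch.Tensor,
    ) -> tuple[torch.Tensor, torch.Tensor]:
        # Two inputs. Left eager (no @torch.compile), matching the commit.
        orig_dtype = x.dtype
        h = x.float() + residual.float()
        new_residual = h.to(orig_dtype)
        return self._normalize(h, orig_dtype), new_residual


norm = RMSNormSketch(hidden_size=8)
hidden = torch.randn(2, 3, 8)   # [batch, seq, D]
x = hidden.view(-1, 8)          # flatten to 2D [N, D], per the comment in the diff
residual = torch.randn_like(x)
with torch.no_grad():
    out, new_residual = norm.add_rms_forward(x, residual)
```

The returned `new_residual` is the pre-normalization sum, which the caller would pass to the next layer as its `residual` argument.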