The add_rms_forward method processes two input tensors (x and residual), which causes torch.compile recompilation issues. Keep @torch.compile only on rms_forward which processes a single input. This prevents unnecessary recompilation overhead during inference. Co-Authored-By: Claude <noreply@anthropic.com>
1.6 KiB
Executable File
1.6 KiB
Executable File