feat: add Qwen2/2.5 model support

Separate the Qwen2 implementation from Qwen3:
- Qwen2: uses QKV bias, no QK norm
- Qwen3: applies QK norm only when QKV bias is disabled
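
The split above can be pictured as a small decision table. This is only a minimal sketch, not the code in this commit: `AttentionOptions`, `attention_options`, and the `attention_bias` parameter are hypothetical names chosen for illustration, and it assumes that Qwen3 enables QK norm exactly when bias is off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionOptions:
    """Hypothetical container for the two attention flags that differ."""
    qkv_bias: bool  # add a bias term to the fused QKV projection
    qk_norm: bool   # normalize Q and K per head before attention


def attention_options(architecture: str, attention_bias: bool = False) -> AttentionOptions:
    """Return the assumed attention flags for a given architecture name."""
    if architecture == "Qwen2ForCausalLM":
        # Qwen2/2.5: always QKV bias, never QK norm.
        return AttentionOptions(qkv_bias=True, qk_norm=False)
    if architecture == "Qwen3ForCausalLM":
        # Qwen3 (assumption): QK norm is used only when there is no bias.
        return AttentionOptions(qkv_bias=attention_bias, qk_norm=not attention_bias)
    raise ValueError(f"unsupported architecture: {architecture}")


if __name__ == "__main__":
    print(attention_options("Qwen2ForCausalLM"))
    print(attention_options("Qwen3ForCausalLM"))
```

Keeping the two architectures in separate classes avoids threading flags like these through one shared attention module.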

Tested with Qwen2.5-7B-Instruct-1M; RULER niah_single_1 passes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Zijie Tian
2026-01-28 13:44:32 +08:00
parent a239bfb40d
commit e09a2a5b10
3 changed files with 209 additions and 1 deletions

@@ -187,7 +187,7 @@ class Qwen3Model(nn.Module):
return hidden_states
-@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")
+@register_model("Qwen3ForCausalLM")
class Qwen3ForCausalLM(nn.Module):
packed_modules_mapping = {
"q_proj": ("qkv_proj", "q"),