✨ feat: add Qwen2/2.5 model support

Separate Qwen2 from Qwen3 implementation:
- Qwen2: Uses QKV bias, no QK norm
- Qwen3: Has optional QK norm when no bias

Tested with Qwen2.5-7B-Instruct-1M, RULER niah_single_1 passed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
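
The Qwen2/Qwen3 distinction in the commit message can be illustrated with a minimal, dependency-free sketch. It is not the code from this commit: `AttentionVariant`, `attention_variant`, the `attention_bias` parameter, and the weight-free `rms_norm` are hypothetical names chosen for illustration, and the rule "QK norm only when there is no bias" is taken directly from the message above.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AttentionVariant:
    """Hypothetical summary of the per-architecture attention settings."""
    qkv_bias: bool  # add a bias term to the Q, K and V projections
    qk_norm: bool   # normalize Q and K per head before attention


def attention_variant(architecture: str, attention_bias: bool = False) -> AttentionVariant:
    """Pick the settings described in the commit message (illustrative only)."""
    if architecture == "Qwen2ForCausalLM":
        # Qwen2: uses QKV bias, no QK norm.
        return AttentionVariant(qkv_bias=True, qk_norm=False)
    if architecture == "Qwen3ForCausalLM":
        # Qwen3: QK norm is applied only when there is no bias.
        return AttentionVariant(qkv_bias=attention_bias, qk_norm=not attention_bias)
    raise ValueError(f"unsupported architecture: {architecture}")


def rms_norm(x: List[float], eps: float = 1e-6) -> List[float]:
    """RMS-normalize one head vector; the learned scale is omitted (assumption)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]
```

Under these assumptions, `attention_variant("Qwen2ForCausalLM")` always enables the bias and skips the norm, while the Qwen3 branch toggles the norm off whenever a bias is requested.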
@@ -187,7 +187,7 @@ class Qwen3Model(nn.Module):
         return hidden_states
 
 
-@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")
+@register_model("Qwen3ForCausalLM")
 class Qwen3ForCausalLM(nn.Module):
     packed_modules_mapping = {
         "q_proj": ("qkv_proj", "q"),
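
The implementation of `register_model` is not part of this diff. As a rough sketch of how a decorator that accepts one or more architecture names might work, and why dropping `"Qwen2ForCausalLM"` from the Qwen3 class frees that name for a separate class, consider the following. `_MODEL_REGISTRY`, `get_model_class`, and the empty placeholder classes are hypothetical and stand in for the real code.

```python
from typing import Callable, Dict

# Hypothetical registry mapping architecture names to model classes.
_MODEL_REGISTRY: Dict[str, type] = {}


def register_model(*architectures: str) -> Callable[[type], type]:
    """Register a class under each given architecture name (illustrative)."""
    def decorator(cls: type) -> type:
        for name in architectures:
            if name in _MODEL_REGISTRY:
                # Two classes cannot claim the same architecture name.
                raise ValueError(f"{name} is already registered")
            _MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def get_model_class(architecture: str) -> type:
    """Look up the class registered for an architecture name."""
    return _MODEL_REGISTRY[architecture]


@register_model("Qwen3ForCausalLM")
class Qwen3ForCausalLM:  # placeholder; the real class subclasses nn.Module
    pass


@register_model("Qwen2ForCausalLM")
class Qwen2ForCausalLM:  # placeholder for the separate Qwen2 implementation
    pass
```

With both names registered on one class, as before this change, a lookup of `"Qwen2ForCausalLM"` would return the Qwen3 implementation. After the split, each name resolves to its own class.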