11 Commits

Author SHA1 Message Date
Zijie Tian
e09a2a5b10 feat: add Qwen2/2.5 model support
Separate Qwen2 from Qwen3 implementation:
- Qwen2: Uses QKV bias, no QK norm
- Qwen3: Has optional QK norm when no bias

Tested with Qwen2.5-7B-Instruct-1M, RULER niah_single_1 passed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 13:44:32 +08:00
Zijie Tian
03a8c033cb [claudesquad] update from 'add-llama-1' on 10 Jan 26 21:03 CST 2026-01-10 21:03:45 +08:00
Zijie Tian
babfa17354 [refactor] Translate into english, void Chinese due to claude. 2025-12-11 00:30:24 +08:00
GeeeekExplorer
2f21442653 support qwen2 2025-11-04 01:44:42 +08:00
GeeeekExplorer
df99418f7d simplify 2025-08-31 20:02:51 +08:00
GeeeekExplorer
38baf0bbe4 remove assert shape 2025-06-27 23:00:30 +08:00
GeeeekExplorer
cde3fc22c2 simplify 2025-06-21 17:19:15 +08:00
cheunglei
53b3ef2e32 support tensor parallel 2025-06-15 01:31:24 +08:00
GeeeekExplorer
08c84ec08d multi file loader 2025-06-12 01:00:09 +08:00
GeeeekExplorer
b98e1ca305 fix 2025-06-10 21:25:54 +08:00
GeeeekExplorer
a5a4909e6a init commit 2025-06-10 00:27:01 +08:00