[claudesquad] update from 'add-llama-1' on 10 Jan 26 21:03 CST

2026-01-10 21:03:45 +08:00
parent 6575099a06
commit 03a8c033cb
10 changed files with 858 additions and 7 deletions
--- a/progress.md
+++ b/progress.md
@@ -0,0 +1,76 @@
+# Progress Log: Multi-Model Support
+
+## Session: 2026-01-10
+
+### Initial Analysis Complete
+
+**Time**: Session start
+
+**Actions:**
+1. Read `nanovllm/engine/model_runner.py` - confirmed where the model is hard-coded (line 35)
+2. Read `nanovllm/models/qwen3.py` - understood the Qwen3 model structure
+3. Read `nanovllm/utils/loader.py` - understood the weight-loading mechanism
+4. Read `nanovllm/layers/rotary_embedding.py` - found the RoPE scaling limitation
+5. Read `/home/zijie/models/Llama-3.1-8B-Instruct/config.json` - understood the Llama configuration
+
+**Key Findings:**
+- Model loading is hard-coded to Qwen3 at `model_runner.py:35`
+- RoPE does not currently support scaling (`assert rope_scaling is None`)
+- Llama 3.1 requires RoPE scaling of type "llama3"
+- Llama has no q_norm/k_norm and no attention bias
+
+**Created:**
+- `task_plan.md` - 6-phase implementation plan
+- `findings.md` - technical analysis and findings
+
+---
+
+### Phase Status
+
+| Phase | Status | Notes |
+|-------|--------|-------|
+| 1. Model Registry | **COMPLETED** | `registry.py`, `__init__.py` |
+| 2. Llama3 RoPE | **COMPLETED** | `rotary_embedding.py` |
+| 3. Llama Model | **COMPLETED** | `llama.py` |
+| 4. ModelRunner | **COMPLETED** | Dynamic loading |
+| 5. Qwen3 Register | **COMPLETED** | `@register_model` decorator |
+| 6. Testing | **COMPLETED** | Both Llama & Qwen3 pass |
+
+---
+
+## Test Results
+
+### Llama 3.1-8B-Instruct (32K needle, GPU 0, offload)
+```
+Input: 32768 tokens
+Expected: 7492
+Output: 7492
+Status: PASSED
+Prefill: 1644 tok/s
+```
+
+### Qwen3-4B (8K needle, GPU 1, offload) - Regression Test
+```
+Input: 8192 tokens
+Expected: 7492
+Output: 7492
+Status: PASSED
+Prefill: 3295 tok/s
+```
+
+---
+
+## Files Modified This Session
+
+| File | Action | Description |
+|------|--------|-------------|
+| `nanovllm/models/registry.py` | created | Model registry with `@register_model` decorator |
+| `nanovllm/models/__init__.py` | created | Export registry functions, import models |
+| `nanovllm/models/llama.py` | created | Llama model implementation |
+| `nanovllm/models/qwen3.py` | modified | Added `@register_model` decorator |
+| `nanovllm/layers/rotary_embedding.py` | modified | Added Llama3 RoPE scaling |
+| `nanovllm/engine/model_runner.py` | modified | Dynamic model loading via registry |
+| `.claude/rules/gpu-testing.md` | created | GPU testing rules |
+| `task_plan.md` | created | Implementation plan |
+| `findings.md` | created | Technical findings |
+| `progress.md` | created | Progress tracking |
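
The registry-based dynamic loading listed in the table above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of `nanovllm/models/registry.py`: only the decorator name `register_model` comes from the log. The `_MODEL_REGISTRY` dict, the `get_model_class` helper, the use of the config's `architectures` strings as lookup keys, and the stub model classes are all hypothetical.

```python
from types import SimpleNamespace
from typing import Callable, Dict

# Hypothetical registry: architecture name -> model class.
_MODEL_REGISTRY: Dict[str, type] = {}


def register_model(*architectures: str) -> Callable[[type], type]:
    """Class decorator that records a model class under one or more names."""
    def decorator(cls: type) -> type:
        for arch in architectures:
            if arch in _MODEL_REGISTRY:
                raise ValueError(f"architecture {arch!r} is already registered")
            _MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Return the first registered class named in `hf_config.architectures`.

    Assumption: the config object exposes an `architectures` list, as a
    Hugging Face style `config.json` does.
    """
    architectures = getattr(hf_config, "architectures", None) or []
    for arch in architectures:
        if arch in _MODEL_REGISTRY:
            return _MODEL_REGISTRY[arch]
    raise ValueError(
        f"unsupported architectures {architectures}; "
        f"registered: {sorted(_MODEL_REGISTRY)}"
    )


# Stub classes standing in for the real model implementations.
@register_model("Qwen3ForCausalLM")
class Qwen3ForCausalLM:
    pass


@register_model("LlamaForCausalLM")
class LlamaForCausalLM:
    pass


# A runner would resolve the class from the config instead of hard-coding it.
llama_config = SimpleNamespace(architectures=["LlamaForCausalLM"])
model_cls = get_model_class(llama_config)
print(model_cls.__name__)
```

With a lookup of this shape, supporting another model family would only require a new decorated class, and the hard-coded Qwen3 reference in the runner becomes a single registry call.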