[claudesquad] update from 'add-llama-1' on 10 Jan 26 21:03 CST

This commit is contained in:
Zijie Tian
2026-01-10 21:03:45 +08:00
parent 6575099a06
commit 03a8c033cb
10 changed files with 858 additions and 7 deletions

task_plan.md Normal file

@@ -0,0 +1,144 @@
# Task Plan: Multi-Model Support for nanovllm
## Goal
Extend the nanovllm framework to support multiple models (it currently supports only Qwen3), in particular adding Llama-3.1-8B-Instruct support, and establish an extensible pattern for adding new models.
## Current State Analysis
### Hardcoded Model Selection
- `nanovllm/engine/model_runner.py:35`: directly instantiates `Qwen3ForCausalLM(hf_config)`
- `nanovllm/engine/model_runner.py:9`: hardcoded import `from nanovllm.models.qwen3 import Qwen3ForCausalLM`
### Qwen3 vs. Llama 3.1 Architecture Differences
| Feature | Qwen3 | Llama 3.1 |
|---------|-------|-----------|
| Config Class | Qwen3Config | LlamaConfig |
| attention_bias | True (configurable) | False |
| q_norm/k_norm | Present (when bias=False) | Absent |
| mlp_bias | N/A | False |
| RoPE Scaling | None (currently) | llama3 type |
| RoPE theta | 1000000 | 500000 |
| hidden_act | silu | silu |
| tie_word_embeddings | True | False |
### Key Limitation
- `rotary_embedding.py:59`: `assert rope_scaling is None` rejects any RoPE scaling configuration
---
## Phases
### Phase 1: Create Model Registry Pattern [pending]
**Files to modify:**
- `nanovllm/models/__init__.py` (new)
- `nanovllm/models/registry.py` (new)
**Tasks:**
1. Create the model registry mechanism
2. Define a `@register_model` registration decorator
3. Implement `get_model_class(hf_config)`, which selects the model class automatically from the config's `architectures` field
**Design:**
```python
MODEL_REGISTRY: dict[str, type] = {}


def register_model(*architectures):
    """Decorator to register a model class for given architecture names."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Get model class based on HF config architectures."""
    for arch in hf_config.architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {hf_config.architectures}")
```
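A brief usage sketch (the placeholder class and fake config below are illustrative only; note that registration only takes effect once `nanovllm/models/__init__.py` imports each model module so the decorators actually run):
```python
from nanovllm.models.registry import get_model_class, register_model


@register_model("LlamaForCausalLM")
class LlamaForCausalLM:  # placeholder; the real class subclasses nn.Module
    def __init__(self, hf_config):
        self.config = hf_config


class FakeConfig:  # stands in for a transformers config object
    architectures = ["LlamaForCausalLM"]


assert get_model_class(FakeConfig()) is LlamaForCausalLM
```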
### Phase 2: Add Llama3 RoPE Scaling Support [pending]
**Files to modify:**
- `nanovllm/layers/rotary_embedding.py`
**Tasks:**
1. Implement a `Llama3RotaryEmbedding` class that supports the llama3 rope_type
2. Modify `get_rope()` to select an implementation based on the rope_scaling type
3. Keep backward compatibility (with rope_scaling=None, the existing implementation is used)
**Llama3 RoPE Scaling Formula:**
```python
# From the transformers reference: rope_scaling supplies factor,
# low_freq_factor, high_freq_factor, and original_max_position_embeddings;
# inverse frequencies are adjusted based on wavelength thresholds (see the sketch below).
```
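A sketch of that adjustment, following the transformers reference implementation of the `llama3` rope_type; the function name `apply_llama3_scaling` is ours, while the dict keys come from the `rope_scaling` entry of the Llama 3.1 config:
```python
import math

import torch


def apply_llama3_scaling(inv_freq: torch.Tensor, rope_scaling: dict) -> torch.Tensor:
    """Rescale RoPE inverse frequencies per the llama3 rope_type."""
    factor = rope_scaling["factor"]                                     # 8.0 for Llama 3.1
    low_freq_factor = rope_scaling["low_freq_factor"]                   # 1.0
    high_freq_factor = rope_scaling["high_freq_factor"]                 # 4.0
    old_context_len = rope_scaling["original_max_position_embeddings"]  # 8192

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq

    # Long wavelengths (low frequencies): divide by `factor`; short ones: keep as-is.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Medium wavelengths: interpolate smoothly between scaled and original.
    smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
    smoothed = (1 - smooth) / factor * inv_freq + smooth * inv_freq
    is_medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(is_medium, smoothed, scaled)
```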
### Phase 3: Implement Llama Model [pending]
**Files to create:**
- `nanovllm/models/llama.py`
**Tasks:**
1. Create a `LlamaAttention` class (no q_norm/k_norm, no QKV bias)
2. Create a `LlamaMLP` class (similar to Qwen3MLP, but without bias; see the sketch after this list)
3. Create `LlamaDecoderLayer`
4. Create `LlamaModel` and `LlamaForCausalLM`
5. Add `packed_modules_mapping` to support weight loading
6. Register the class with `@register_model("LlamaForCausalLM")`
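A sketch of `LlamaMLP` and the weight-loading mapping, assuming the same fused-linear and activation layers the Qwen3 implementation uses (`MergedColumnParallelLinear`, `RowParallelLinear`, `SiluAndMul`); exact import paths may differ:
```python
import torch
from torch import nn

# Assumed imports, mirroring what qwen3.py uses; adjust paths to match the repo.
from nanovllm.layers.activation import SiluAndMul
from nanovllm.layers.linear import MergedColumnParallelLinear, RowParallelLinear


class LlamaMLP(nn.Module):
    """SwiGLU MLP; same shape as Qwen3MLP, with bias=False per LlamaConfig.mlp_bias."""

    def __init__(self, hidden_size: int, intermediate_size: int) -> None:
        super().__init__()
        # gate_proj and up_proj fused into a single matmul.
        self.gate_up_proj = MergedColumnParallelLinear(
            hidden_size, [intermediate_size] * 2, bias=False
        )
        self.down_proj = RowParallelLinear(intermediate_size, hidden_size, bias=False)
        self.act_fn = SiluAndMul()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act_fn(self.gate_up_proj(x)))


# Weight-loading hints: maps per-projection HF checkpoint names onto the fused
# modules above (keys/format assumed to match the existing Qwen3 mapping).
packed_modules_mapping = {
    "q_proj": ("qkv_proj", "q"),
    "k_proj": ("qkv_proj", "k"),
    "v_proj": ("qkv_proj", "v"),
    "gate_proj": ("gate_up_proj", 0),
    "up_proj": ("gate_up_proj", 1),
}
```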
### Phase 4: Modify ModelRunner for Dynamic Loading [pending]
**Files to modify:**
- `nanovllm/engine/model_runner.py`
**Tasks:**
1. Remove the hardcoded import `from nanovllm.models.qwen3 import Qwen3ForCausalLM`
2. Import `from nanovllm.models import get_model_class`
3. Replace `self.model = Qwen3ForCausalLM(hf_config)` with:
```python
model_class = get_model_class(hf_config)
self.model = model_class(hf_config)
```
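In context, the relevant lines of `ModelRunner.__init__` would look roughly like this (a sketch; the surrounding init code is elided and the `config.hf_config` attribute is an assumption):
```python
from nanovllm.models import get_model_class


class ModelRunner:
    def __init__(self, config):
        hf_config = config.hf_config  # assumed: engine config carries the HF config
        model_class = get_model_class(hf_config)  # resolves via hf_config.architectures
        self.model = model_class(hf_config)
```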
### Phase 5: Register Qwen3 Model [pending]
**Files to modify:**
- `nanovllm/models/qwen3.py`
**Tasks:**
1. Import `from nanovllm.models.registry import register_model`
2. Add the `@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")` decorator
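The change amounts to a decorator on the existing class (a sketch; the class body stays as-is):
```python
# nanovllm/models/qwen3.py (sketch)
from torch import nn

from nanovllm.models.registry import register_model


@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")
class Qwen3ForCausalLM(nn.Module):
    ...  # existing implementation unchanged
```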
### Phase 6: Test with Llama-3.1-8B-Instruct [pending]
**Files:**
- `tests/test_needle.py` (existing, use for validation)
**Tasks:**
1. Run the needle test: `python tests/test_needle.py --model ~/models/Llama-3.1-8B-Instruct`
2. Verify the model loads correctly
3. Verify the inference output is correct
---
## Errors Encountered
| Error | Attempt | Resolution |
|-------|---------|------------|
| (none yet) | | |
---
## Success Criteria
- [x] Analysis complete: current architecture and required changes understood
- [ ] Phase 1: model registry implemented
- [ ] Phase 2: Llama3 RoPE scaling supported
- [ ] Phase 3: Llama model implemented
- [ ] Phase 4: ModelRunner dynamic loading in place
- [ ] Phase 5: Qwen3 model registered
- [ ] Phase 6: Llama needle test passes
---
## Notes
- Keep existing Qwen3 functionality unchanged
- Follow the existing code style
- Reuse existing layer components (Linear, RMSNorm, Embedding, etc.)
- Add only the code that is needed; do not over-engineer