# Task Plan: Multi-Model Support for nanovllm

## Goal

Extend the nanovllm framework to support multiple model families (it currently supports only Qwen3), starting with Llama-3.1-8B-Instruct, and establish an extensible pattern for adding further models.

## Current State Analysis

### Hard-Coded Locations

- `nanovllm/engine/model_runner.py:35`: instantiates `Qwen3ForCausalLM(hf_config)` directly
- `nanovllm/engine/model_runner.py:9`: hard-coded import `from nanovllm.models.qwen3 import Qwen3ForCausalLM`

### Qwen3 vs Llama 3.1 Architecture Differences

| Feature | Qwen3 | Llama 3.1 |
|---------|-------|-----------|
| Config Class | Qwen3Config | LlamaConfig |
| attention_bias | True (configurable) | False |
| q_norm/k_norm | Yes (when bias=False) | No |
| mlp_bias | N/A | False |
| RoPE Scaling | None (currently) | llama3 type |
| RoPE theta | 1000000 | 500000 |
| hidden_act | silu | silu |
| tie_word_embeddings | True | False |

### Key Limitation

- `rotary_embedding.py:59`: `assert rope_scaling is None`, so RoPE scaling is not supported

---

## Phases

### Phase 1: Create Model Registry Pattern [pending]

**Files to modify:**
- `nanovllm/models/__init__.py` (new)
- `nanovllm/models/registry.py` (new)

**Tasks:**
1. Create the model registry mechanism
2. Define the model registration decorator `@register_model`
3. Implement `get_model_class(hf_config)`, which selects the model class automatically from the `architectures` field

**Design:**
```python
# Maps a Hugging Face architecture name to the class that implements it.
MODEL_REGISTRY: dict[str, type] = {}


def register_model(*architectures):
    """Decorator to register a model class for given architecture names."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Get model class based on HF config architectures."""
    for arch in hf_config.architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {hf_config.architectures}")
```

### Phase 2: Add Llama3 RoPE Scaling Support [pending]

**Files to modify:**
- `nanovllm/layers/rotary_embedding.py`

**Tasks:**
1. Implement a `Llama3RotaryEmbedding` class that supports the llama3 rope_type
2. Modify `get_rope()` to choose the implementation based on the rope_scaling type
3. Stay backward compatible (rope_scaling=None keeps using the original implementation)

**Llama3 RoPE Scaling Formula:**
```python
# From transformers:
# low_freq_factor, high_freq_factor, original_max_position_embeddings
# Adjust frequencies based on wavelength thresholds
```

### Phase 3: Implement Llama Model [pending]

**Files to create:**
- `nanovllm/models/llama.py`

**Tasks:**
1. Create the `LlamaAttention` class (no q_norm/k_norm, no QKV bias)
2. Create the `LlamaMLP` class (similar to Qwen3MLP, no bias)
3. Create the `LlamaDecoderLayer` class
4. Create the `LlamaModel` and `LlamaForCausalLM` classes
5. Add `packed_modules_mapping` to support weight loading
6. Register with `@register_model("LlamaForCausalLM")`

### Phase 4: Modify ModelRunner for Dynamic Loading [pending]

**Files to modify:**
- `nanovllm/engine/model_runner.py`

**Tasks:**
1. Remove the hard-coded `from nanovllm.models.qwen3 import Qwen3ForCausalLM`
2. Import `from nanovllm.models import get_model_class`
3. Replace `self.model = Qwen3ForCausalLM(hf_config)` with:
   ```python
   model_class = get_model_class(hf_config)
   self.model = model_class(hf_config)
   ```

### Phase 5: Register Qwen3 Model [pending]

**Files to modify:**
- `nanovllm/models/qwen3.py`

**Tasks:**
1. Import `from nanovllm.models.registry import register_model`
2. Add the `@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")` decorator

### Phase 6: Test with Llama-3.1-8B-Instruct [pending]

**Files:**
- `tests/test_needle.py` (existing, use for validation)

**Tasks:**
1. Run the needle test: `python tests/test_needle.py --model ~/models/Llama-3.1-8B-Instruct`
2. Verify that the model loads correctly
3. Verify that the inference output is correct

---

## Errors Encountered

| Error | Attempt | Resolution |
|-------|---------|------------|
| (none yet) | | |

---

## Success Criteria

- [x] Analysis complete: current architecture and required changes are understood
- [ ] Phase 1: model registry implemented
- [ ] Phase 2: Llama3 RoPE scaling supported
- [ ] Phase 3: Llama model implemented
- [ ] Phase 4: ModelRunner loads models dynamically
- [ ] Phase 5: Qwen3 model registered
- [ ] Phase 6: Llama needle test passes

---

## Notes

- Keep existing Qwen3 behavior unchanged
- Follow the existing code style
- Reuse existing layers components (Linear, RMSNorm, Embedding, etc.)
- Add only the code that is needed; do not over-engineer
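
As a reference for Phase 2, the llama3 frequency adjustment can be sketched as below. This is a minimal, dependency-free sketch that follows the wavelength-threshold logic in transformers, written with plain Python lists so it runs on its own; the real `Llama3RotaryEmbedding` would presumably do the same arithmetic on torch tensors. The function name `llama3_scaled_inv_freq` is hypothetical, and the default values (`factor=8.0`, `low_freq_factor=1.0`, `high_freq_factor=4.0`, `original_max_position=8192`) are assumptions that should be read from `hf_config.rope_scaling` rather than hard-coded.

```python
import math


def llama3_scaled_inv_freq(
    head_dim: int,
    base: float = 500000.0,             # RoPE theta from the comparison table
    factor: float = 8.0,                # assumed; read from rope_scaling
    low_freq_factor: float = 1.0,       # assumed; read from rope_scaling
    high_freq_factor: float = 4.0,      # assumed; read from rope_scaling
    original_max_position: int = 8192,  # assumed; read from rope_scaling
) -> list[float]:
    """Return llama3-scaled inverse frequencies for one attention head."""
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]

    low_freq_wavelen = original_max_position / low_freq_factor
    high_freq_wavelen = original_max_position / high_freq_factor

    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency band: left untouched.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: divided by the scaling factor.
            scaled.append(freq / factor)
        else:
            # Medium band: interpolate smoothly between the two regimes.
            smooth = (original_max_position / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled


scaled_freqs = llama3_scaled_inv_freq(head_dim=128)
```

With `rope_scaling=None` this function is simply not called, which matches the backward-compatibility task in Phase 2.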
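
For Phases 1, 3 and 5, the registry lookup can be exercised without loading any weights. The sketch below repeats the registry from the Design block so that it runs standalone, and uses stand-in classes (`FakeQwen3`, `FakeLlama`) and a `SimpleNamespace` in place of a real HF config; all three are hypothetical placeholders, not part of nanovllm.

```python
from types import SimpleNamespace

MODEL_REGISTRY: dict[str, type] = {}


def register_model(*architectures):
    """Register a class under one or more architecture names."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Return the first registered class matching hf_config.architectures."""
    for arch in hf_config.architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {hf_config.architectures}")


# Stand-ins for the real model classes (hypothetical, for illustration only).
@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")
class FakeQwen3:
    pass


@register_model("LlamaForCausalLM")
class FakeLlama:
    pass


# A minimal object exposing only the field the lookup reads.
llama_config = SimpleNamespace(architectures=["LlamaForCausalLM"])
qwen2_config = SimpleNamespace(architectures=["Qwen2ForCausalLM"])

resolved_llama = get_model_class(llama_config)
resolved_qwen2 = get_model_class(qwen2_config)
```

Because registration happens when a decorated class is defined, `nanovllm/models/__init__.py` would presumably need to import each model module (for example `qwen3` and `llama`) before `get_model_class` is called. That is an assumption about the final layout, not something this plan states.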