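# Task Plan: Multi-Model Support for nanovllm

## Goal

Extend the nanovllm framework to support multiple models (it currently supports only Qwen3), in particular by adding Llama-3.1-8B-Instruct support, and establish an extensible pattern for adding further models.
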
## Current State Analysis

### Hard-Coded Model References

- `nanovllm/engine/model_runner.py:35`: directly instantiates `Qwen3ForCausalLM(hf_config)`
- `nanovllm/engine/model_runner.py:9`: hard-coded import `from nanovllm.models.qwen3 import Qwen3ForCausalLM`

### Qwen3 vs Llama 3.1 Architecture Differences

| Feature | Qwen3 | Llama 3.1 |
|---------|-------|-----------|
| Config Class | Qwen3Config | LlamaConfig |
| attention_bias | True (configurable) | False |
| q_norm/k_norm | Yes (when bias=False) | No |
| mlp_bias | N/A | False |
| RoPE Scaling | None (currently) | `llama3` type |
| RoPE theta | 1000000 | 500000 |
| hidden_act | silu | silu |
| tie_word_embeddings | True | False |

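When comparing checkpoints, it can help to dump just the fields from the table above. The following is a minimal sketch, assuming the config object exposes these fields as attributes (as Hugging Face config objects generally do); `summarize_arch_fields` is a hypothetical helper name, and the `SimpleNamespace` stands in for a real config so the snippet runs without any model files.

```python
from types import SimpleNamespace


def summarize_arch_fields(hf_config) -> dict:
    """Collect the config fields from the comparison table (None if absent)."""
    rope_scaling = getattr(hf_config, "rope_scaling", None)
    rope_type = None
    if rope_scaling:
        # Assumption: the scaling kind is stored under "rope_type",
        # with "type" as a legacy fallback key.
        rope_type = rope_scaling.get("rope_type") or rope_scaling.get("type")
    return {
        "attention_bias": getattr(hf_config, "attention_bias", None),
        "mlp_bias": getattr(hf_config, "mlp_bias", None),
        "rope_theta": getattr(hf_config, "rope_theta", None),
        "rope_type": rope_type,
        "tie_word_embeddings": getattr(hf_config, "tie_word_embeddings", None),
    }


# Stand-in config using the Llama 3.1 column of the table.
llama_like = SimpleNamespace(
    attention_bias=False,
    mlp_bias=False,
    rope_theta=500000.0,
    rope_scaling={"rope_type": "llama3"},
    tie_word_embeddings=False,
)
summary = summarize_arch_fields(llama_like)
print(summary)
```
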
### Key Limitation

- `rotary_embedding.py:59`: `assert rope_scaling is None`, so RoPE scaling is not supported

---

## Phases

### Phase 1: Create Model Registry Pattern [pending]
**Files to modify:**
- `nanovllm/models/__init__.py` (new)
- `nanovllm/models/registry.py` (new)

**Tasks:**
1. Create the model registry mechanism
2. Define the model registration decorator `@register_model`
3. Implement `get_model_class(hf_config)`, which selects the model automatically based on the `architectures` field

**Design:**
```python
MODEL_REGISTRY: dict[str, type] = {}

def register_model(*architectures):
    """Decorator to register a model class for given architecture names."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator

def get_model_class(hf_config) -> type:
    """Get model class based on HF config architectures."""
    for arch in hf_config.architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {hf_config.architectures}")
```

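To show how the registry design above would be exercised, here is a small self-contained sketch. `DummyQwen3` and `DummyLlama` are hypothetical stand-ins for the real model classes, `SimpleNamespace` mimics the single config attribute the lookup reads, and the `None` handling in `get_model_class` is an optional hardening assumption rather than part of the original design.

```python
from types import SimpleNamespace

MODEL_REGISTRY: dict[str, type] = {}


def register_model(*architectures):
    """Same decorator as in the design, repeated so this sketch runs on its own."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    # Assumption: `architectures` may be missing or None on some configs;
    # treating that as "no match" yields a clear ValueError.
    architectures = getattr(hf_config, "architectures", None) or ()
    for arch in architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {architectures}")


# Hypothetical stand-ins for the real model classes.
@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")
class DummyQwen3:
    def __init__(self, hf_config):
        self.config = hf_config


@register_model("LlamaForCausalLM")
class DummyLlama:
    def __init__(self, hf_config):
        self.config = hf_config


llama_config = SimpleNamespace(architectures=["LlamaForCausalLM"])
model = get_model_class(llama_config)(llama_config)
print(type(model).__name__)
```

Note that a class is only registered once the module defining it has been imported, because the decorator runs at import time. In this plan that presumably means `nanovllm/models/__init__.py` has to import each model module before `get_model_class` is called.
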
### Phase 2: Add Llama3 RoPE Scaling Support [pending]
**Files to modify:**
- `nanovllm/layers/rotary_embedding.py`

**Tasks:**
1. Implement a `Llama3RotaryEmbedding` class that supports the `llama3` rope_type
2. Modify `get_rope()` to select the implementation based on the rope_scaling type
3. Keep backward compatibility (`rope_scaling=None` uses the original implementation)

**Llama3 RoPE Scaling Formula:**
```python
# From transformers:
# low_freq_factor, high_freq_factor, original_max_position_embeddings
# Adjust frequencies based on wavelength thresholds
```

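The wavelength-threshold adjustment can be sketched in plain Python as follows. This is an illustrative version written from my understanding of the `llama3` rope type in transformers, not the planned `Llama3RotaryEmbedding` itself; the function names are hypothetical, and the default parameter values are assumptions that should be read from the checkpoint's `rope_scaling` dict in a real implementation. A real implementation would operate on tensors rather than lists.

```python
import math


def default_inv_freq(head_dim: int, base: float) -> list[float]:
    """Unscaled RoPE inverse frequencies: base ** (-i / head_dim) for even i."""
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


def llama3_scale_inv_freq(
    inv_freq: list[float],
    factor: float = 8.0,               # assumed default
    low_freq_factor: float = 1.0,      # assumed default
    high_freq_factor: float = 4.0,     # assumed default
    original_max_position_embeddings: int = 8192,  # assumed default
) -> list[float]:
    """Apply llama3-style scaling to each inverse frequency."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor
    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency band: left unchanged.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: divided by the scaling factor.
            scaled.append(freq / factor)
        else:
            # Middle band: interpolate smoothly between the two regimes.
            smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled


# head_dim=128 is an assumption; base matches the RoPE theta listed for Llama 3.1.
inv_freq = default_inv_freq(head_dim=128, base=500000.0)
scaled = llama3_scale_inv_freq(inv_freq)
```

With `rope_scaling=None`, `get_rope()` would skip this step and use the unscaled frequencies, which keeps the existing Qwen3 path unchanged.
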
### Phase 3: Implement Llama Model [pending]
**Files to create:**
- `nanovllm/models/llama.py`

**Tasks:**
1. Create a `LlamaAttention` class (no q_norm/k_norm, no QKV bias)
2. Create a `LlamaMLP` class (similar to Qwen3MLP, no bias)
3. Create a `LlamaDecoderLayer` class
4. Create the `LlamaModel` and `LlamaForCausalLM` classes
5. Add `packed_modules_mapping` to support weight loading
6. Register with `@register_model("LlamaForCausalLM")`

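For task 5, the sketch below illustrates what a `packed_modules_mapping` might look like and how a loader could use it to route separate checkpoint weights into fused parameters. The mapping shape (checkpoint sub-module name to packed parameter name plus shard id) is an assumption and should mirror whatever the existing Qwen3 model declares; `resolve_weight_name` is a hypothetical helper, not the project's loader.

```python
# Assumed shape: checkpoint sub-module name -> (packed parameter name, shard id).
packed_modules_mapping = {
    "q_proj": ("qkv_proj", "q"),
    "k_proj": ("qkv_proj", "k"),
    "v_proj": ("qkv_proj", "v"),
    "gate_proj": ("gate_up_proj", 0),
    "up_proj": ("gate_up_proj", 1),
}


def resolve_weight_name(weight_name: str, mapping: dict) -> tuple:
    """Hypothetical helper: map a checkpoint weight name to (param_name, shard_id).

    shard_id is None when the weight does not belong to a packed module.
    """
    for source, (target, shard_id) in mapping.items():
        if source in weight_name:
            return weight_name.replace(source, target), shard_id
    return weight_name, None


packed = resolve_weight_name("model.layers.0.self_attn.k_proj.weight", packed_modules_mapping)
plain = resolve_weight_name("model.layers.0.mlp.down_proj.weight", packed_modules_mapping)
print(packed)
print(plain)
```
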
### Phase 4: Modify ModelRunner for Dynamic Loading [pending]
**Files to modify:**
- `nanovllm/engine/model_runner.py`

**Tasks:**
1. Remove the hard-coded `from nanovllm.models.qwen3 import Qwen3ForCausalLM`
2. Import `from nanovllm.models import get_model_class`
3. Replace `self.model = Qwen3ForCausalLM(hf_config)` with:
   ```python
   model_class = get_model_class(hf_config)
   self.model = model_class(hf_config)
   ```

### Phase 5: Register Qwen3 Model [pending]
**Files to modify:**
- `nanovllm/models/qwen3.py`

**Tasks:**
1. Import `from nanovllm.models.registry import register_model`
2. Add the `@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")` decorator

### Phase 6: Test with Llama-3.1-8B-Instruct [pending]
**Files:**
- `tests/test_needle.py` (existing, use for validation)

**Tasks:**
1. Run the needle test: `python tests/test_needle.py --model ~/models/Llama-3.1-8B-Instruct`
2. Verify that the model loads correctly
3. Verify that the inference output is correct

---

## Errors Encountered
| Error | Attempt | Resolution |
|-------|---------|------------|
| (none yet) | | |

---

## Success Criteria
- [x] Analysis complete: current architecture and required changes are understood
- [ ] Phase 1: model registry implemented
- [ ] Phase 2: Llama3 RoPE scaling supported
- [ ] Phase 3: Llama model implemented
- [ ] Phase 4: ModelRunner loads models dynamically
- [ ] Phase 5: Qwen3 model registered
- [ ] Phase 6: Llama needle test passes

---

## Notes
- Keep existing Qwen3 functionality unchanged
- Follow the existing code style
- Reuse existing layer components (Linear, RMSNorm, Embedding, etc.)
- Add only the code that is needed; avoid over-engineering