# Task Plan: Multi-Model Support for nanovllm
## Goal
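Extend the nanovllm framework to support multiple models (only Qwen3 is supported today), in particular by adding Llama-3.1-8B-Instruct support, and establish an extensible pattern for adding further models.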
## Current State Analysis
### Hard-coded locations
- `nanovllm/engine/model_runner.py:35`: instantiates `Qwen3ForCausalLM(hf_config)` directly
- `nanovllm/engine/model_runner.py:9`: hard-coded import `from nanovllm.models.qwen3 import Qwen3ForCausalLM`
### Qwen3 vs Llama 3.1 architecture differences
| Feature | Qwen3 | Llama 3.1 |
|---------|-------|-----------|
| Config Class | Qwen3Config | LlamaConfig |
| attention_bias | True (configurable) | False |
| q_norm/k_norm | Yes (when bias=False) | No |
| mlp_bias | N/A | False |
| RoPE Scaling | None (currently) | llama3 type |
| RoPE theta | 1000000 | 500000 |
| hidden_act | silu | silu |
| tie_word_embeddings | True | False |
### Key limitation
- `rotary_embedding.py:59`: `assert rope_scaling is None` - RoPE scaling is not supported
---
## Phases
### Phase 1: Create Model Registry Pattern [pending]
**Files to modify:**
- `nanovllm/models/__init__.py` (new)
- `nanovllm/models/registry.py` (new)
**Tasks:**
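1. Create a model registry mechanism
2. Define the model registration decorator `@register_model`
3. Implement `get_model_class(hf_config)`, which selects the model automatically based on the `architectures` field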
**Design:**
```python
MODEL_REGISTRY: dict[str, type] = {}


def register_model(*architectures):
    """Decorator to register a model class for given architecture names."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Get model class based on HF config architectures."""
    for arch in hf_config.architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {hf_config.architectures}")
```
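To make the lookup concrete, here is a minimal, self-contained sketch of how the registry might be exercised. `DummyLlama` and the `SimpleNamespace` stand-in for an HF config are hypothetical and exist only for illustration; the real model classes would live under `nanovllm/models/`.

```python
from types import SimpleNamespace

MODEL_REGISTRY: dict[str, type] = {}


def register_model(*architectures):
    """Register the decorated class under each architecture name."""
    def decorator(cls):
        for arch in architectures:
            MODEL_REGISTRY[arch] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Return the first registered class matching hf_config.architectures."""
    for arch in hf_config.architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architecture: {hf_config.architectures}")


# Hypothetical placeholder class, standing in for a real model implementation.
@register_model("LlamaForCausalLM")
class DummyLlama:
    def __init__(self, hf_config):
        self.config = hf_config


# Stand-in for a Hugging Face config object; only `architectures` is read.
config = SimpleNamespace(architectures=["LlamaForCausalLM"])
model = get_model_class(config)(config)
print(type(model).__name__)
```

Registration happens when the decorator runs, so the module defining each model has to be imported before `get_model_class` is called. One option (an assumption, not something this plan specifies) is to import the model modules from `nanovllm/models/__init__.py`.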
### Phase 2: Add Llama3 RoPE Scaling Support [pending]
**Files to modify:**
- `nanovllm/layers/rotary_embedding.py`
**Tasks:**
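1. Implement a `Llama3RotaryEmbedding` class that supports the llama3 rope_type
2. Modify `get_rope()` to choose the implementation based on the rope_scaling type
3. Keep backward compatibility (rope_scaling=None uses the original implementation)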
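The dispatch described in tasks 2 and 3 could look roughly like the sketch below. `select_rope_type` is a hypothetical helper, not an existing nanovllm function, and the key names (`rope_type`, with `type` as a legacy fallback) are an assumption about how the HF config stores the variant; verify them against the actual `config.json`.

```python
from typing import Optional


def select_rope_type(rope_scaling: Optional[dict]) -> str:
    """Return the name of the RoPE variant to build for a rope_scaling entry."""
    if rope_scaling is None:
        # Backward-compatible path: the original, unscaled rotary embedding.
        return "default"
    # Assumption: the variant name is under "rope_type" ("type" in older configs).
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type == "llama3":
        return "llama3"
    # Fail loudly instead of silently falling back to unscaled RoPE.
    raise NotImplementedError(f"Unsupported rope_scaling type: {rope_type!r}")
```

Raising on unknown types keeps the strictness of the current `assert rope_scaling is None` while letting the two supported cases through.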
**Llama3 RoPE Scaling Formula:**
```python
# From transformers:
# low_freq_factor, high_freq_factor, original_max_position_embeddings
# Adjust frequencies based on wavelength thresholds
```
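For reference, the wavelength-based adjustment outlined in the comments above can be written out as a pure-Python sketch. It follows the approach used in Hugging Face transformers as I understand it; the function name is hypothetical, and the default values are assumptions that should be read from the model's `rope_scaling` config instead of being hard-coded. A real implementation would operate on torch tensors.

```python
import math


def llama3_scaled_inv_freq(
    head_dim: int,
    base: float = 500000.0,            # rope_theta (assumed default)
    factor: float = 8.0,               # assumed default
    low_freq_factor: float = 1.0,      # assumed default
    high_freq_factor: float = 4.0,     # assumed default
    original_max_position_embeddings: int = 8192,  # assumed default
) -> list[float]:
    """Compute llama3-style scaled inverse frequencies for one attention head."""
    old_len = original_max_position_embeddings
    low_freq_wavelen = old_len / low_freq_factor
    high_freq_wavelen = old_len / high_freq_factor

    scaled = []
    for i in range(0, head_dim, 2):
        inv_freq = 1.0 / (base ** (i / head_dim))
        wavelen = 2 * math.pi / inv_freq
        if wavelen < high_freq_wavelen:
            # High-frequency band: keep unchanged.
            scaled.append(inv_freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: divide by the scaling factor.
            scaled.append(inv_freq / factor)
        else:
            # Middle band: interpolate smoothly between the two regimes.
            smooth = (old_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * inv_freq / factor + smooth * inv_freq)
    return scaled
```

The resulting list would replace the unscaled `inv_freq` when building the cos/sin cache; the rest of the rotary embedding stays the same.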
### Phase 3: Implement Llama Model [pending]
**Files to create:**
- `nanovllm/models/llama.py`
**Tasks:**
1. Create a `LlamaAttention` class (no q_norm/k_norm, no QKV bias)
2. Create a `LlamaMLP` class (similar to Qwen3MLP, no bias)
3. Create `LlamaDecoderLayer`
4. Create `LlamaModel` and `LlamaForCausalLM`
5. Add `packed_modules_mapping` to support weight loading
6. Register with `@register_model("LlamaForCausalLM")`
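As an illustration of task 5, the sketch below shows one way a `packed_modules_mapping` could translate per-projection checkpoint names into fused parameters. It assumes fused `qkv_proj` and `gate_up_proj` layers and a `(fused name, shard id)` value shape; both are assumptions to check against the existing Qwen3 model and the weight loader. `resolve_weight_name` is a hypothetical helper.

```python
# Assumed mapping: checkpoint sub-name -> (fused parameter name, shard id).
PACKED_MODULES_MAPPING = {
    "q_proj": ("qkv_proj", "q"),
    "k_proj": ("qkv_proj", "k"),
    "v_proj": ("qkv_proj", "v"),
    "gate_proj": ("gate_up_proj", 0),
    "up_proj": ("gate_up_proj", 1),
}


def resolve_weight_name(weight_name: str, mapping: dict = PACKED_MODULES_MAPPING):
    """Map a checkpoint weight name to (parameter name, shard id or None)."""
    for src, (dst, shard_id) in mapping.items():
        if src in weight_name:
            # This weight is one shard of a fused parameter.
            return weight_name.replace(src, dst), shard_id
    # Not packed: load under its own name, with no shard id.
    return weight_name, None


print(resolve_weight_name("model.layers.0.self_attn.q_proj.weight"))
print(resolve_weight_name("model.layers.0.self_attn.o_proj.weight"))
```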
### Phase 4: Modify ModelRunner for Dynamic Loading [pending]
**Files to modify:**
- `nanovllm/engine/model_runner.py`
**Tasks:**
1. Remove the hard-coded `from nanovllm.models.qwen3 import Qwen3ForCausalLM`
2. Import `from nanovllm.models import get_model_class`
3. Replace `self.model = Qwen3ForCausalLM(hf_config)` with:
```python
model_class = get_model_class(hf_config)
self.model = model_class(hf_config)
```
### Phase 5: Register Qwen3 Model [pending]
**Files to modify:**
- `nanovllm/models/qwen3.py`
**Tasks:**
1. Import `from nanovllm.models.registry import register_model`
2. Add the `@register_model("Qwen3ForCausalLM", "Qwen2ForCausalLM")` decorator
### Phase 6: Test with Llama-3.1-8B-Instruct [pending]
**Files:**
- `tests/test_needle.py` (existing, use for validation)
**Tasks:**
1. Run the needle test: `python tests/test_needle.py --model ~/models/Llama-3.1-8B-Instruct`
2. Verify that the model loads correctly
3. Verify that the inference output is correct
---
## Errors Encountered
| Error | Attempt | Resolution |
|-------|---------|------------|
| (none yet) | | |
---
## Success Criteria
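- [x] Analysis complete: current architecture and required changes understood
- [ ] Phase 1: Model registry implemented
- [ ] Phase 2: Llama3 RoPE scaling supported
- [ ] Phase 3: Llama model implemented
- [ ] Phase 4: ModelRunner loads models dynamically
- [ ] Phase 5: Qwen3 model registered
- [ ] Phase 6: Llama needle test passes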
---
## Notes
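- Keep existing Qwen3 functionality unchanged
- Follow the existing code style
- Reuse existing layers components (Linear, RMSNorm, Embedding, etc.)
- Add only the code that is necessary; do not over-engineer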