# Progress Log: Multi-Model Support

## Session: 2026-01-10

### Initial Analysis Complete

**Time**: Session start

**Actions:**

1. Read `nanovllm/engine/model_runner.py` - located the hardcoded model selection (line 35)
2. Read `nanovllm/models/qwen3.py` - understood the Qwen3 model structure
3. Read `nanovllm/utils/loader.py` - understood the weight-loading mechanism
4. Read `nanovllm/layers/rotary_embedding.py` - found the RoPE scaling limitation
5. Read `/home/zijie/models/Llama-3.1-8B-Instruct/config.json` - understood the Llama configuration
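
Item 3 concerns weight loading. A pattern common in compact inference engines is to keep fused parameters (`qkv_proj`, `gate_up_proj`) in the model and remap the separate checkpoint tensors onto them while loading. The sketch below illustrates that pattern under stated assumptions: `PACKED_MODULES` and `remap_weight_name` are hypothetical names, not taken from `loader.py`. Hugging Face Llama and Qwen3 checkpoints name these projections identically (`q_proj`, `k_proj`, `v_proj`, `gate_proj`, `up_proj`), so one mapping of this shape could serve both models.

```python
from __future__ import annotations

# Hypothetical mapping: checkpoint sub-module name -> (fused parameter name, shard id).
# The shard id tells a fused layer which slice of its weight the tensor fills.
PACKED_MODULES: dict[str, tuple[str, str | int]] = {
    "q_proj": ("qkv_proj", "q"),
    "k_proj": ("qkv_proj", "k"),
    "v_proj": ("qkv_proj", "v"),
    "gate_proj": ("gate_up_proj", 0),
    "up_proj": ("gate_up_proj", 1),
}


def remap_weight_name(ckpt_name: str) -> tuple[str, str | int | None]:
    """Map a checkpoint tensor name to (model parameter name, shard id).

    Tensors that load one-to-one (o_proj, down_proj, norms, ...) keep their
    name and get shard id None.
    """
    for src, (dst, shard_id) in PACKED_MODULES.items():
        # Match whole dotted segments so a name never matches inside a longer one.
        if f".{src}." in ckpt_name:
            return ckpt_name.replace(f".{src}.", f".{dst}."), shard_id
    return ckpt_name, None


if __name__ == "__main__":
    for name in [
        "model.layers.0.self_attn.k_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
        "model.layers.0.self_attn.o_proj.weight",
    ]:
        print(name, "->", remap_weight_name(name))
```

A loader built this way would look up the fused parameter by the returned name and pass the shard id to that parameter's weight-loading routine; unpacked tensors are copied directly.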

**Key Findings:**

- Model loading is hardcoded to Qwen3 at `model_runner.py:35`.
- RoPE currently does not support scaling (`assert rope_scaling is None`).
- Llama 3.1 requires RoPE scaling of type `"llama3"`.
- Llama has no `q_norm`/`k_norm` and no attention bias.
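
The `"llama3"` scaling type rescales RoPE inverse frequencies by wavelength band, as in Meta's reference code and Hugging Face Transformers: wavelengths shorter than `original_max_position_embeddings / high_freq_factor` are kept, those longer than `original_max_position_embeddings / low_freq_factor` are divided by `factor`, and the band in between is interpolated. Below is a minimal pure-Python sketch, assuming the `rope_scaling` dict uses the field names from Llama 3.1's `config.json`; `compute_inv_freq` and its helpers are hypothetical names, not the functions in `rotary_embedding.py`. It also shows one way to replace the `assert rope_scaling is None` with a dispatch on `rope_type`.

```python
from __future__ import annotations

import math


def default_inv_freq(head_dim: int, base: float) -> list[float]:
    """Unscaled RoPE inverse frequencies: base ** (-2i / head_dim) for i < head_dim / 2."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def llama3_scale(inv_freq: list[float], rope_scaling: dict) -> list[float]:
    """Apply "llama3"-type scaling to each frequency according to its wavelength."""
    factor = rope_scaling["factor"]
    low_freq_factor = rope_scaling["low_freq_factor"]
    high_freq_factor = rope_scaling["high_freq_factor"]
    old_context_len = rope_scaling["original_max_position_embeddings"]

    low_freq_wavelen = old_context_len / low_freq_factor    # longer wavelengths: divide by factor
    high_freq_wavelen = old_context_len / high_freq_factor  # shorter wavelengths: keep as is

    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            scaled.append(f)
        elif wavelen > low_freq_wavelen:
            scaled.append(f / factor)
        else:
            # Middle band: blend from f / factor (smooth = 0) up to f (smooth = 1).
            smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled


def compute_inv_freq(head_dim: int, base: float, rope_scaling: dict | None) -> list[float]:
    """Dispatch on rope_type instead of asserting that rope_scaling is None."""
    inv_freq = default_inv_freq(head_dim, base)
    if rope_scaling is None:
        return inv_freq
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type == "llama3":
        return llama3_scale(inv_freq, rope_scaling)
    raise NotImplementedError(f"unsupported rope_type: {rope_type!r}")


# Values from Llama 3.1's config.json (head_dim = 4096 hidden / 32 heads = 128).
LLAMA31_ROPE_SCALING = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
inv_freq = compute_inv_freq(128, 500000.0, LLAMA31_ROPE_SCALING)
```

With these values, dimensions whose wavelength is under 2048 positions keep their frequency, those over 8192 positions are slowed by 8x, and the band in between is blended.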

**Created:**

- `task_plan.md` - 6-phase implementation plan
- `findings.md` - technical analysis and findings

---

### Phase Status

| Phase | Status | Notes |
|-------|--------|-------|
| 1. Model Registry | **COMPLETED** | `registry.py`, `__init__.py` |
| 2. Llama3 RoPE | **COMPLETED** | `rotary_embedding.py` |
| 3. Llama Model | **COMPLETED** | `llama.py` |
| 4. ModelRunner | **COMPLETED** | Dynamic loading |
| 5. Qwen3 Register | **COMPLETED** | `@register_model` decorator |
| 6. Testing | **COMPLETED** | Both Llama & Qwen3 pass |
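
Phases 1, 4 and 5 fit together as a decorator-based registry: each model module registers its class under an architecture name, and `ModelRunner` looks the class up from the Hugging Face config instead of hardcoding Qwen3. A minimal sketch, assuming the lookup key is the `architectures` entry in `config.json` (e.g. `"LlamaForCausalLM"`, `"Qwen3ForCausalLM"`); `_MODEL_REGISTRY`, `get_model_class`, and the decorator's signature are hypothetical, and the real `registry.py` may differ.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Callable, TypeVar

T = TypeVar("T", bound=type)

# Hypothetical global registry: architecture name -> model class.
_MODEL_REGISTRY: dict[str, type] = {}


def register_model(*arch_names: str) -> Callable[[T], T]:
    """Class decorator that registers a model class under one or more names."""
    def decorator(cls: T) -> T:
        for name in arch_names:
            if name in _MODEL_REGISTRY:
                raise ValueError(f"architecture {name!r} already registered")
            _MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def get_model_class(hf_config) -> type:
    """Return the first registered class matching hf_config.architectures."""
    archs = getattr(hf_config, "architectures", None) or []
    for arch in archs:
        if arch in _MODEL_REGISTRY:
            return _MODEL_REGISTRY[arch]
    raise ValueError(f"no registered model for {archs}; known: {sorted(_MODEL_REGISTRY)}")


# Stand-in classes; the real ones would be nn.Module subclasses in qwen3.py / llama.py.
@register_model("Qwen3ForCausalLM")
class Qwen3ForCausalLM:
    def __init__(self, config):
        self.config = config


@register_model("LlamaForCausalLM")
class LlamaForCausalLM:
    def __init__(self, config):
        self.config = config


if __name__ == "__main__":
    cfg = SimpleNamespace(architectures=["LlamaForCausalLM"])
    model = get_model_class(cfg)(cfg)  # instead of a hardcoded Qwen3ForCausalLM(cfg)
    print(type(model).__name__)
```

In this design the decorators run when the model modules are imported, which is why importing the models from `nanovllm/models/__init__.py` populates the registry before `ModelRunner` performs the lookup.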

---

## Test Results

### Llama 3.1-8B-Instruct (32K needle, GPU 0, offload)

```
Input: 32768 tokens
Expected: 7492
Output: 7492
Status: PASSED
Prefill: 1644 tok/s
```

### Qwen3-4B (8K needle, GPU 1, offload) - Regression Test

```
Input: 8192 tokens
Expected: 7492
Output: 7492
Status: PASSED
Prefill: 3295 tok/s
```
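
The reported prefill throughputs convert to approximate prefill wall-clock time as input tokens divided by tokens per second. The two runs differ in model, context length and GPU, so the figures are not a like-for-like comparison; `prefill_seconds` is an illustrative helper.

```python
# Figures copied from the two test runs above: (input tokens, prefill tok/s).
runs = {
    "Llama-3.1-8B-Instruct, 32K": (32768, 1644),
    "Qwen3-4B, 8K": (8192, 3295),
}


def prefill_seconds(tokens: int, tok_per_s: float) -> float:
    """Approximate prefill time implied by a throughput figure."""
    return tokens / tok_per_s


for name, (tokens, tps) in runs.items():
    print(f"{name}: ~{prefill_seconds(tokens, tps):.1f} s")
```

This works out to roughly 19.9 s of prefill for the 32K Llama run and 2.5 s for the 8K Qwen3 run.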

---

## Files Modified This Session

| File | Action | Description |
|------|--------|-------------|
| `nanovllm/models/registry.py` | created | Model registry with `@register_model` decorator |
| `nanovllm/models/__init__.py` | created | Export registry functions, import models |
| `nanovllm/models/llama.py` | created | Llama model implementation |
| `nanovllm/models/qwen3.py` | modified | Added `@register_model` decorator |
| `nanovllm/layers/rotary_embedding.py` | modified | Added Llama3 RoPE scaling |
| `nanovllm/engine/model_runner.py` | modified | Dynamic model loading via registry |
| `.claude/rules/gpu-testing.md` | created | GPU testing rules |
| `task_plan.md` | created | Implementation plan |
| `findings.md` | created | Technical findings |
| `progress.md` | created | Progress tracking |