zijie-tian/nano-vllm

Zijie Tian 03a8c033cb [claudesquad] update from 'add-llama-1' on 10 Jan 26 21:03 CST

Progress Log: Multi-Model Support

Session: 2026-01-10

Initial Analysis Complete

Time: Session start

Actions:

Read nanovllm/engine/model_runner.py - confirmed where the model is hardcoded (line 35)
Read nanovllm/models/qwen3.py - understood the Qwen3 model structure
Read nanovllm/utils/loader.py - understood the weight-loading mechanism
Read nanovllm/layers/rotary_embedding.py - found the RoPE scaling limitation
Read /home/zijie/models/Llama-3.1-8B-Instruct/config.json - understood the Llama configuration

Key Findings:

Model loading is hardcoded to Qwen3 at model_runner.py:35
RoPE does not currently support scaling (assert rope_scaling is None)
Llama 3.1 requires RoPE scaling of type "llama3"
Llama has no q_norm/k_norm and no attention bias
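
To make the "llama3" RoPE scaling finding concrete, here is a minimal sketch of the frequency rescaling as it is commonly described for Llama 3.1. It is not the code in `rotary_embedding.py`; the function name `llama3_scaled_inv_freq` is hypothetical, and the default parameter values (`base`, `factor`, `low_freq_factor`, `high_freq_factor`, `original_max_position`) are assumptions based on the publicly distributed Llama 3.1 configuration, so check them against the local `config.json`.

```python
import math


def llama3_scaled_inv_freq(
    head_dim,
    base=500000.0,               # assumed rope_theta
    factor=8.0,                  # assumed rope_scaling["factor"]
    low_freq_factor=1.0,         # assumed rope_scaling["low_freq_factor"]
    high_freq_factor=4.0,        # assumed rope_scaling["high_freq_factor"]
    original_max_position=8192,  # assumed original context length
):
    """Return per-pair inverse frequencies after llama3-style scaling."""
    # Standard (unscaled) RoPE inverse frequencies, one per rotated pair.
    inv_freq = [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]

    low_freq_wavelen = original_max_position / low_freq_factor
    high_freq_wavelen = original_max_position / high_freq_factor

    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            # High-frequency band: left unchanged.
            scaled.append(f)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: slowed down by the full factor.
            scaled.append(f / factor)
        else:
            # Middle band: interpolate smoothly between the two regimes.
            smooth = (original_max_position / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled
```

Under these assumptions, the highest-frequency pairs keep their original rotation speed while the lowest-frequency pairs rotate `factor` times more slowly, which is what allows positions beyond the original context length to be encoded.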

Created:

task_plan.md - 6-phase implementation plan
findings.md - technical analysis and findings

Phase Status

| Phase | Status | Notes |
|-------|--------|-------|
| 1. Model Registry | COMPLETED | `registry.py`, `__init__.py` |
| 2. Llama3 RoPE | COMPLETED | `rotary_embedding.py` |
| 3. Llama Model | COMPLETED | `llama.py` |
| 4. ModelRunner | COMPLETED | Dynamic loading |
| 5. Qwen3 Register | COMPLETED | `@register_model` decorator |
| 6. Testing | COMPLETED | Both Llama & Qwen3 pass |
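
The registry and dynamic-loading phases can be illustrated with a small sketch. Only the decorator name `register_model` comes from this log; the decorator's signature, the `_MODEL_REGISTRY` dict, the `get_model_class` helper, and the placeholder classes are assumptions made for illustration and may not match `registry.py`.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: maps an architecture name to a model class.
_MODEL_REGISTRY: Dict[str, type] = {}


def register_model(*architectures: str) -> Callable[[type], type]:
    """Class decorator that records a model class under one or more names."""

    def decorator(cls: type) -> type:
        for name in architectures:
            if name in _MODEL_REGISTRY:
                raise ValueError(f"architecture {name!r} is already registered")
            _MODEL_REGISTRY[name] = cls
        return cls

    return decorator


def get_model_class(architectures: Iterable[str]) -> type:
    """Return the first registered class matching the given architecture names."""
    for name in architectures:
        if name in _MODEL_REGISTRY:
            return _MODEL_REGISTRY[name]
    raise ValueError(f"no registered model for architectures {list(architectures)}")


# Placeholder classes standing in for the real implementations.
@register_model("Qwen3ForCausalLM")
class Qwen3ForCausalLM:
    pass


@register_model("LlamaForCausalLM")
class LlamaForCausalLM:
    pass
```

With a lookup like this, a model runner could pass the `architectures` list from a checkpoint's `config.json` to `get_model_class` instead of instantiating Qwen3 directly, so adding a model only requires decorating its class.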

Test Results

Llama 3.1-8B-Instruct (32K needle, GPU 0, offload)

Input: 32768 tokens
Expected: 7492
Output: 7492
Status: PASSED
Prefill: 1644 tok/s

Qwen3-4B (8K needle, GPU 1, offload) - Regression Test

Input: 8192 tokens
Expected: 7492
Output: 7492
Status: PASSED
Prefill: 3295 tok/s
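
As a rough sanity check on the prefill figures, the snippet below converts throughput into wall-clock prefill time. It assumes the reported tok/s is simply input tokens divided by prefill time; the helper name `prefill_seconds` is hypothetical, and the inputs are the numbers reported above.

```python
def prefill_seconds(input_tokens, tokens_per_second):
    """Estimate prefill time, assuming throughput = tokens / elapsed seconds."""
    return input_tokens / tokens_per_second


# (input tokens, prefill tok/s) taken from the test results above.
runs = {
    "Llama 3.1-8B-Instruct (32K)": (32768, 1644),
    "Qwen3-4B (8K)": (8192, 3295),
}

for name, (tokens, rate) in runs.items():
    print(f"{name}: ~{prefill_seconds(tokens, rate):.1f} s prefill")
```

Under that assumption, the 32K Llama run spends roughly 20 s in prefill and the 8K Qwen3 run roughly 2.5 s.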

Files Modified This Session

| File | Action | Description |
|------|--------|-------------|
| `nanovllm/models/registry.py` | created | Model registry with `@register_model` decorator |
| `nanovllm/models/__init__.py` | created | Export registry functions, import models |
| `nanovllm/models/llama.py` | created | Llama model implementation |
| `nanovllm/models/qwen3.py` | modified | Added `@register_model` decorator |
| `nanovllm/layers/rotary_embedding.py` | modified | Added Llama3 RoPE scaling |
| `nanovllm/engine/model_runner.py` | modified | Dynamic model loading via registry |
| `.claude/rules/gpu-testing.md` | created | GPU testing rules |
| `task_plan.md` | created | Implementation plan |
| `findings.md` | created | Technical findings |
| `progress.md` | created | Progress tracking |