Merge branch 'zijie/fix-dist-3': Fix distributed port conflict

- Auto port allocation with _find_free_port() in model_runner.py
- Resource management refactor with close() + context manager in llm_engine.py
- Add tests/test_port_conflict.py and tests/run_parallel_niah.sh
- Remove docs/torch_distributed_port_issue.md (issue fixed)
- Ignore tests/data/ directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit 64971c8e8a (parent de6f36bdb2)
Author: Zijie Tian
Date: 2026-01-12 16:20:44 +08:00
10 changed files with 784 additions and 792 deletions

# Progress Log: Fix Torch Distributed Port Conflict
## Status: COMPLETED & CLEANED UP
## Session: 2026-01-12
### Task Overview
Fix the EADDRINUSE port conflict that occurs when multiple LLM instances are created sequentially in the same Python process, and support launching multiple independent processes simultaneously in a multi-GPU environment.
---
### Phase Status
| Phase | Description | Status |
|-------|-------------|--------|
| Phase 1 | Dynamic port allocation in ModelRunner | COMPLETED |
| Phase 2 | LLMEngine `close()` and context manager | COMPLETED |
| Phase 3 | Test verification (GPU 4,5) | COMPLETED |
| Phase 4 | Documentation update | COMPLETED |
---
### Implementation Summary
#### Phase 1: Dynamic Port Allocation
**File**: `nanovllm/engine/model_runner.py`
- Added `_find_free_port()` function using socket binding
- Modified port selection logic: use `NANOVLLM_DIST_PORT` if set, otherwise auto-assign a free port
- Added logging for auto-assigned ports
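
In essence (a minimal sketch; the actual helper in `model_runner.py` may differ in details such as logging and error handling):

```python
import os
import socket

def _find_free_port() -> int:
    # Bind to port 0 so the OS picks an unused port, then release it;
    # torch.distributed rebinds it moments later (a small race window
    # exists, but it is acceptable in practice).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

# Use the pinned port if the env var is set, otherwise auto-assign.
port = os.environ.get("NANOVLLM_DIST_PORT")
port = int(port) if port else _find_free_port()
```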
#### Phase 2: Resource Cleanup Enhancement
**File**: `nanovllm/engine/llm_engine.py`
- Added `_closed` flag for idempotent cleanup
- Added `close()` method for explicit resource release
- Added `__del__()` for GC fallback
- Added `__enter__()` and `__exit__()` for context manager support
- Modified atexit registration to use `_atexit_handler`
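
A sketch of the resulting pattern, with `_shutdown_workers()` as a hypothetical stand-in for the engine's real teardown code:

```python
import atexit

class LLMEngine:
    def __init__(self):
        self._closed = False
        # atexit fallback: __del__ alone is unreliable at interpreter exit.
        atexit.register(self._atexit_handler)

    def _shutdown_workers(self):
        # Hypothetical stand-in: destroy the torch.distributed process
        # group, free GPU memory, join worker processes, etc.
        pass

    def close(self):
        # Idempotent: safe to call from user code, __del__, and atexit.
        if self._closed:
            return
        self._closed = True
        self._shutdown_workers()

    def _atexit_handler(self):
        self.close()

    def __del__(self):
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
        return False
```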
#### Phase 3: Testing (GPU 4,5)
**File**: `tests/test_port_conflict.py`
- Created a test script covering sequential creation, context-manager cleanup, and parallel processes
**Test Results**:
| Test | Status | Notes |
|------|--------|-------|
| Sequential creation (3 instances) | PASSED | Ports: 50405, 47835, 53011 |
| Context manager | PASSED | Auto-cleanup works |
| Parallel processes (GPU 4,5) | PASSED | Ports: 34631, 56097 |
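
The sequential case presumably boils down to something like this (model path, prompt, and sampling settings are placeholders, not the test's actual values):

```python
from nanovllm import LLM, SamplingParams

MODEL = "/path/to/model"  # placeholder

# Three engines back to back in one process: each should bind its own
# auto-assigned port instead of hitting EADDRINUSE on the second one.
for i in range(3):
    llm = LLM(MODEL)
    out = llm.generate(["ping"], SamplingParams(max_tokens=8))
    assert out, f"instance {i} produced no output"
    llm.close()  # explicit release so the next instance starts cleanly
```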
#### Phase 4: Documentation
**File**: `docs/torch_distributed_port_issue.md`
- Updated status to RESOLVED
- Documented solution details
- Added usage examples
---
### Files Modified
| File | Action | Description |
|------|--------|-------------|
| `nanovllm/engine/model_runner.py` | Modified | Added `_find_free_port()`, dynamic port logic |
| `nanovllm/engine/llm_engine.py` | Modified | Added `close()`, `__del__`, context manager |
| `tests/test_port_conflict.py` | Created | Test script for port conflict fix |
| `docs/torch_distributed_port_issue.md` | Deleted | Issue resolved, doc removed |
| `CLAUDE.md` | Modified | Removed port conflict warnings, updated doc index |
---
### Key Features After Fix
1. **Multi-GPU Parallel Testing**
```bash
CUDA_VISIBLE_DEVICES=0 python test1.py &
CUDA_VISIBLE_DEVICES=1 python test2.py &
# Both run with different auto-assigned ports
```
2. **Sequential LLM Creation**
```python
for i in range(3):
    with LLM(model_path) as llm:
        outputs = llm.generate(prompts, params)
    # Automatically cleaned up when the with block exits
```
3. **Backward Compatible**
- `NANOVLLM_DIST_PORT` env var still works
- `llm.exit()` still works (alias for `close()`)
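
For example, pinning the port explicitly still behaves as before (29500 is just an illustrative value):

```bash
NANOVLLM_DIST_PORT=29500 python test1.py
```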