nano-vllm/task_plan.md
Zijie Tian 64971c8e8a Merge branch 'zijie/fix-dist-3': Fix distributed port conflict
- Auto port allocation with _find_free_port() in model_runner.py
- Resource management refactor with close() + context manager in llm_engine.py
- Add tests/test_port_conflict.py and tests/run_parallel_niah.sh
- Remove docs/torch_distributed_port_issue.md (issue fixed)
- Ignore tests/data/ directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 16:27:25 +08:00

# Task Plan: Fix Torch Distributed Port Conflict
## Goal
Support launching multiple independent nanovllm processes simultaneously for testing in a multi-GPU environment, without manual port management.
## Problem Analysis
### Core Problem
```
Current: all nanovllm instances default to port 2333
└── multiple independent processes running at the same time will conflict!
CUDA_VISIBLE_DEVICES=0 python test1.py  # binds port 2333 ✓
CUDA_VISIBLE_DEVICES=1 python test2.py  # tries to bind 2333 → EADDRINUSE ❌
```
### Root Cause
- A port is a system-wide resource, unrelated to GPUs
- Even when using different GPUs, the port still conflicts
- The default port `2333` is currently hard-coded
---
## Solution: Dynamic Port Allocation
### Core Approach
```python
import os
import socket

def _find_free_port() -> int:
    """Let the OS allocate a free port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

# Prefer the environment variable; otherwise auto-allocate
port = os.environ.get("NANOVLLM_DIST_PORT")
if port is None:
    port = _find_free_port()
else:
    port = int(port)
```
### Effect
```bash
# No need to specify ports manually; multiple tests can run side by side
CUDA_VISIBLE_DEVICES=0 python test1.py &  # auto port 54321
CUDA_VISIBLE_DEVICES=1 python test2.py &  # auto port 54322
CUDA_VISIBLE_DEVICES=2 python test3.py &  # auto port 54323
# Manual override is still supported (backward compatible)
NANOVLLM_DIST_PORT=2333 python test.py
```
---
## Implementation Phases
### Phase 1: Dynamic Port in ModelRunner [pending]
**File**: `nanovllm/engine/model_runner.py`
```python
import os
import socket

def _find_free_port() -> int:
    """Find a free port for distributed communication."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR must be set before bind() to take effect
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", 0))
        return s.getsockname()[1]

class ModelRunner:
    def __init__(self, config: Config, rank: int, event: Event | list[Event]):
        # ... existing code ...
        port = os.environ.get("NANOVLLM_DIST_PORT")
        if port is None:
            port = _find_free_port()
            logger.info(f"Auto-assigned distributed port: {port}")
        else:
            port = int(port)
        dist.init_process_group("nccl", init_method=f"tcp://localhost:{port}", ...)
```
### Phase 2: LLMEngine Resource Cleanup [pending]
**File**: `nanovllm/engine/llm_engine.py`
Add a `close()` method and context-manager support to ensure resources are released correctly:
```python
import atexit

class LLMEngine:
    def __init__(self, model, **kwargs):
        # ... existing code ...
        self._closed = False
        atexit.register(self._atexit_handler)

    def _atexit_handler(self):
        if not self._closed:
            self.close()

    def close(self):
        """Explicitly close the engine and release all resources."""
        if self._closed:
            return
        self._closed = True
        try:
            atexit.unregister(self._atexit_handler)
        except Exception:
            pass
        self.model_runner.call("exit")
        del self.model_runner
        for p in self.ps:
            p.join()

    def exit(self):
        """Alias for close() - backward compatibility."""
        self.close()

    def __del__(self):
        try:
            self.close()
        except Exception:
            pass

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
        return False
```
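The close-once pattern can be exercised with a toy stand-in (a sketch; `FakeEngine` is hypothetical and needs no GPU): `close()` must run its cleanup exactly once, whether triggered explicitly, by the context manager, or by the atexit hook.

```python
import atexit

class FakeEngine:
    """Toy stand-in for LLMEngine's lifecycle handling."""
    def __init__(self):
        self.close_calls = 0          # counts actual cleanup runs
        self._closed = False
        atexit.register(self._atexit_handler)

    def _atexit_handler(self):
        if not self._closed:
            self.close()

    def close(self):
        if self._closed:              # idempotent: later calls are no-ops
            return
        self._closed = True
        try:
            atexit.unregister(self._atexit_handler)
        except Exception:
            pass
        self.close_calls += 1         # real engine would join workers here

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
        return False

with FakeEngine() as e:
    pass                              # __exit__ closes the engine
e.close()                             # explicit second call is a no-op
print(e.close_calls)  # → 1
```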
### Phase 3: Test Verification [pending]
**File**: `tests/test_multiple_processes.py` (new)
```python
"""Test multiple independent nanovllm processes."""
import os
import subprocess
import sys
import time

def test_parallel_processes():
    """Test running multiple nanovllm processes in parallel."""
    script = '''
import sys
sys.path.insert(0, ".")
from nanovllm import LLM, SamplingParams
import os

gpu = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
print(f"[GPU {gpu}] Starting LLM")
llm = LLM("path/to/model", enable_cpu_offload=True)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=10))
print(f"[GPU {gpu}] Output: {outputs[0]['text'][:50]}")
llm.close()
print(f"[GPU {gpu}] Done")
'''
    # Start 2 processes on different GPUs
    procs = []
    for gpu in [0, 1]:
        env = {"CUDA_VISIBLE_DEVICES": str(gpu)}
        p = subprocess.Popen(
            [sys.executable, "-c", script],
            env={**os.environ, **env},
        )
        procs.append(p)
        time.sleep(1)  # Stagger start slightly
    # Wait for all
    for p in procs:
        assert p.wait() == 0, f"Process failed with code {p.returncode}"
    print("PASSED: test_parallel_processes")

if __name__ == "__main__":
    test_parallel_processes()
```
### Phase 4: Documentation Update [pending]
**File**: `docs/torch_distributed_port_issue.md`
Update the document to note that the issue has been resolved by dynamic port allocation.
---
## Usage After Fix
### Scenario 1: Parallel Multi-Process Testing (primary scenario)
```bash
# No extra configuration needed; just run
CUDA_VISIBLE_DEVICES=0 python test_group1.py &
CUDA_VISIBLE_DEVICES=1 python test_group2.py &
CUDA_VISIBLE_DEVICES=2 python test_group3.py &
wait
```
### Scenario 2: Sequential Creation in One Process (also supported)
```python
for i in range(3):
    with LLM(model_path) as llm:
        outputs = llm.generate(prompts, params)
    # Automatic cleanup; the next instance can use a fresh random port
```
### Scenario 3: Manually Specified Port (backward compatible)
```bash
NANOVLLM_DIST_PORT=2333 python test.py
```
---
## Success Criteria
- [ ] Multiple independent processes can run simultaneously (on different GPUs)
- [ ] No manual port configuration required
- [ ] Backward compatible (the environment variable still works)
- [ ] Sequential creation within one process also works
- [ ] Resources are cleaned up correctly
---
## Files to Modify
| File | Action | Status |
|------|--------|--------|
| `nanovllm/engine/model_runner.py` | Add `_find_free_port()` | pending |
| `nanovllm/engine/llm_engine.py` | Add `close()`, context manager | pending |
| `tests/test_multiple_processes.py` | Create | pending |
| `docs/torch_distributed_port_issue.md` | Update | pending |