[WIP] Before debug plan.
# Progress Log: nanovllm State Leakage Debug

## Session: 2026-01-20
### Entry 1: Initial Analysis Complete
**Time**: Start

**Completed**:
- [x] Read `docs/ruler_32k_chunked_offload_issue.md` to understand the problem description
- [x] Read `nanovllm/kvcache/offload_engine.py` to analyze the `reset()` implementation
- [x] Read `nanovllm/kvcache/hybrid_manager.py` to analyze the `deallocate()` implementation
- [x] Read `nanovllm/engine/llm_engine.py` to analyze the request-processing flow
- [x] Create the planning files (`task_plan.md`, `findings.md`, `progress.md`)

**Key Finding**:
`OffloadEngine.reset()` clears the GPU buffers but **does not clear the CPU cache**. This is the most likely source of the state leakage.

**Next Steps**:
1. Verify the CPU cache hypothesis by adding CPU cache zeroing to `reset()`
2. Run comparison tests to confirm the effect of the fix
3. Check for other possible state-leakage points
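
Step 1 amounts to making `reset()` clear the CPU-side cache as well as the GPU buffers. The sketch below models that idea under stated assumptions: `ToyOffloadEngine`, its `gpu_buffers`/`cpu_cache` attributes, `offload()`, the `clear_cpu_cache` flag and `cpu_cache_is_clean()` are all hypothetical, and plain Python lists stand in for the torch tensors the real `OffloadEngine` presumably holds (there the change would likely be an in-place `zero_()` on each CPU cache tensor, whose names this log does not give). The flag exists only so one object can show both the current and the proposed behaviour; if batch accuracy recovers once the CPU cache is cleared, that supports the hypothesis step 1 sets out to verify.

```python
from typing import List


class ToyOffloadEngine:
    """Hypothetical stand-in for OffloadEngine; lists of floats replace torch tensors."""

    def __init__(self, num_gpu_slots: int, num_cpu_blocks: int, block_size: int) -> None:
        self.gpu_buffers: List[List[float]] = [[0.0] * block_size for _ in range(num_gpu_slots)]
        self.cpu_cache: List[List[float]] = [[0.0] * block_size for _ in range(num_cpu_blocks)]

    def offload(self, gpu_slot: int, cpu_block: int) -> None:
        # Copy one GPU slot's values into a CPU block, as a chunked-offload write would.
        self.cpu_cache[cpu_block][:] = self.gpu_buffers[gpu_slot]

    def reset(self, clear_cpu_cache: bool = True) -> None:
        # Behaviour described in the Key Finding: only the GPU buffers are cleared.
        for buf in self.gpu_buffers:
            buf[:] = [0.0] * len(buf)
        # Proposed change (step 1): zero the CPU cache as well.
        if clear_cpu_cache:
            for block in self.cpu_cache:
                block[:] = [0.0] * len(block)


def cpu_cache_is_clean(engine: ToyOffloadEngine) -> bool:
    """True if no CPU block still holds data from an earlier request."""
    return all(v == 0.0 for block in engine.cpu_cache for v in block)


if __name__ == "__main__":
    engine = ToyOffloadEngine(num_gpu_slots=2, num_cpu_blocks=4, block_size=3)
    engine.gpu_buffers[0][:] = [1.0, 2.0, 3.0]  # request A writes KV data
    engine.offload(gpu_slot=0, cpu_block=1)

    engine.reset(clear_cpu_cache=False)  # current behaviour
    print(cpu_cache_is_clean(engine))  # False: request A's data is still in the CPU cache

    engine.reset()  # with the proposed change
    print(cpu_cache_is_clean(engine))  # True
```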
---
### Entry 2: (to be filled in)
**Time**:

**Completed**:

**Issues**:

**Next Steps**:

---
## Test Results Summary
| Test | Before Fix | After Fix | Notes |
|------|------------|-----------|-------|
| niah_single_1 (fresh-llm) | 100% | - | Baseline |
| niah_single_1 (batch) | ~80% | - | State leakage |
| multikey_1 | ~94% | - | |
| multikey_2 | ~94% | - | |
| multikey_3 | ~56% | - | |
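
The fresh-llm and batch rows presumably differ only in whether each sample gets a newly constructed engine or shares one engine across samples. A minimal sketch of that comparison follows, assuming deterministic (e.g. greedy) decoding so outputs can be compared exactly. `LeakyToyEngine`, `compare_fresh_vs_batch` and `leftover_state` are hypothetical names; the toy engine deliberately reproduces the suspected bug, since its `reset()` clears the GPU-side buffer but not the CPU-side cache.

```python
from typing import Callable, List, Sequence


class LeakyToyEngine:
    """Hypothetical engine that reproduces the suspected bug: reset() misses cpu_cache."""

    def __init__(self) -> None:
        self.cpu_cache: List[int] = []   # survives reset(), like the CPU cache in Entry 1
        self.gpu_buffer: List[int] = []  # cleared by reset()

    def reset(self) -> None:
        self.gpu_buffer.clear()  # cpu_cache is deliberately NOT cleared

    def generate(self, prompt: str) -> str:
        self.gpu_buffer.append(len(prompt))
        self.cpu_cache.append(len(prompt))
        # The output reads every cached entry, so leftovers from earlier
        # requests change the answer.
        return f"{prompt}:{sum(self.cpu_cache)}"


def compare_fresh_vs_batch(
    make_engine: Callable[[], LeakyToyEngine], prompts: Sequence[str]
) -> List[int]:
    """Return indices of prompts whose shared-engine output differs from a fresh engine's."""
    shared = make_engine()
    mismatches: List[int] = []
    for i, prompt in enumerate(prompts):
        fresh_out = make_engine().generate(prompt)  # "fresh-llm": new engine per sample
        shared.reset()                               # "batch": one engine, reset between samples
        batch_out = shared.generate(prompt)
        if fresh_out != batch_out:
            mismatches.append(i)
    return mismatches


def leftover_state(engine: object) -> List[str]:
    """Names of list attributes that still hold data (candidate leak points)."""
    return [name for name, value in vars(engine).items() if isinstance(value, list) and value]


if __name__ == "__main__":
    prompts = ["needle", "haystack", "key"]
    print(compare_fresh_vs_batch(LeakyToyEngine, prompts))  # [1, 2]

    engine = LeakyToyEngine()
    engine.generate("abc")
    engine.reset()
    print(leftover_state(engine))  # ['cpu_cache']
```

The first sample matches because the shared engine has no history yet; every later sample diverges, which mirrors the batch-only accuracy drop in the table. `leftover_state()` is one possible way to approach step 3 of Entry 1: any attribute still holding data right after `reset()` is a candidate leak point worth inspecting in the real engine.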
## Files Modified
| File | Change | Status |
|------|--------|--------|
| (to be recorded) | | |