Zijie Tian
64971c8e8a
Merge branch 'zijie/fix-dist-3': Fix distributed port conflict
...
- Auto port allocation with _find_free_port() in model_runner.py
- Resource management refactor with close() + context manager in llm_engine.py
- Add tests/test_port_conflict.py and tests/run_parallel_niah.sh
- Remove docs/torch_distributed_port_issue.md (issue fixed)
- Ignore tests/data/ directory
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 16:27:25 +08:00
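The merge above names two techniques: automatic port allocation and a close() + context manager pattern for resource cleanup. The following is a minimal sketch of how such a combination is commonly written, not the code from model_runner.py or llm_engine.py. The names find_free_port_sketch and EngineSketch are hypothetical, and the sketch assumes the port is obtained by binding to port 0 so the operating system picks an unused one.

```python
import socket


def find_free_port_sketch(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for an unused TCP port.

    Binding to port 0 lets the kernel choose a free port. The socket is
    closed before returning, so another process could still claim the
    port before the caller rebinds it (a small race window).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


class EngineSketch:
    """Hypothetical stand-in for an engine that owns a distributed port."""

    def __init__(self) -> None:
        # Assumption: each engine instance picks its own port so that
        # parallel runs on one machine do not collide on a fixed port.
        self.port = find_free_port_sketch()
        self.closed = False

    def close(self) -> None:
        # Idempotent: calling close() twice is harmless.
        if self.closed:
            return
        # A real engine would tear down its process group and workers here.
        self.closed = True

    def __enter__(self) -> "EngineSketch":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Cleanup runs even if the body of the with-block raises.
        self.close()
```

Used as `with EngineSketch() as engine: ...`, the cleanup in close() runs when the block exits, which is the usual reason for pairing an explicit close() with a context manager.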
Zijie Tian
de6f36bdb2
[docs] Added dist port issue.
2026-01-12 15:16:39 +08:00
Zijie Tian
a6cc703d73
[tests] Added test_niah_standalone.py.
2026-01-12 00:16:37 +08:00
Zijie Tian
1425510a2e
[claudesquad] update from 'fix-bug-2' on 09 Jan 26 16:05 CST
2026-01-09 16:05:36 +08:00
Zijie Tian
ea4e904de0
[claudesquad] update from 'int-minference-1' on 08 Jan 26 23:22 CST
2026-01-08 23:22:38 +08:00
Zijie Tian
d8a87da1c3
[claudesquad] update from 'layer-prefill-1' on 08 Jan 26 03:36 CST
2026-01-08 03:36:39 +08:00
Zijie Tian
2a6e0a2c02
[feat] Added Quest Sparsity Policy.
2026-01-07 03:29:21 +08:00
Zijie Tian
0e691f2d85
[WIP] move metadata to GPU.
2026-01-06 23:32:32 +08:00
Zijie Tian
edb5273e34
[WIP] Added basic test for quest.
2026-01-06 22:30:31 +08:00
Zijie Tian
535f2037ab
[WIP] Before fix bench_offload.py.
2026-01-06 18:41:08 +08:00
Zijie Tian
e554d5482b
[refactor] Delete unnecessary test and refactor the offload prefix cache.
2026-01-05 20:31:42 +08:00
Zijie Tian
d623043a3c
[WIP] FIXED decode and prefill NEEDLE test.
2026-01-05 01:51:46 +08:00
Zijie Tian
e897380127
[test] Added test_align.py and Before change nanovllm attention.
2026-01-04 22:48:01 +08:00
Zijie Tian
24096431ed
[refactor] refactor test_align.py.
2026-01-04 20:55:40 +08:00
Zijie Tian
00ed17c640
[feat] Added debug tools.
2026-01-03 22:36:40 +08:00
Zijie Tian
8c3418725b
[refactor] Refactor needle test.
2026-01-03 19:19:37 +08:00
Zijie Tian
b3685c9190
[test] Added test_align.py
2026-01-03 18:55:58 +08:00
Zijie Tian
6927a75ac3
[refactor] refactor needle.py.
2026-01-03 18:33:48 +08:00
Zijie Tian
ff8b09cd35
[test] Added test_needle_ref.py.
2026-01-02 22:03:23 +08:00
Zijie Tian
74ee6d0895
[WIP] need to fix model to normally decode.
2026-01-01 05:18:27 +08:00
Zijie Tian
62b8a63314
[refactor] Refactor the test_chunked_prefill/decode.
2026-01-01 03:32:26 +08:00
Zijie Tian
965c8aff12
[WIP] need change flashattention to debug.
2026-01-01 00:58:22 +08:00
Zijie Tian
30462fe89a
[WIP] Before fix needle.
2025-12-31 23:35:25 +08:00
Zijie Tian
ccd1b3d4ab
[WIP] Before modify nanovllm CPU-GPU kvcache.
2025-12-31 22:41:07 +08:00
Zijie Tian
31e90a7268
[test] Added offload correct verify.
2025-12-31 20:59:53 +08:00
Zijie Tian
484d0de9f9
[feat] Added debug hook to offload_engine.py.
2025-12-31 19:44:39 +08:00
Zijie Tian
7af721c12c
[WIP] Before modify to FlashInfer.
2025-12-30 01:11:13 +08:00
Zijie Tian
89f8020d38
[WIP] fixing attention compute error.
2025-12-30 00:31:48 +08:00
Zijie Tian
82ed34fc2d
[opt] Optimize nanovllm performance to be comparable with vllm.
2025-12-25 03:47:07 +08:00
Zijie Tian
16fcf8350b
[WIP] replace merge attention with triton kernel.
2025-12-25 01:07:05 +08:00
Zijie Tian
cf5e7df093
[WIP] Added sgDMA operator for scatter kvcache communication.
2025-12-24 23:48:52 +08:00
Zijie Tian
6ec1b23982
[WIP] NEED to modify communication.
2025-12-24 21:57:51 +08:00
Zijie Tian
782437c486
[WIP] remove num_prefetch_blocks variable.
2025-12-24 18:22:26 +08:00
Zijie Tian
b264de903d
[test] Added a simple test_prefill.py.
2025-12-23 00:26:25 +08:00
Zijie Tian
4dcef16c13
[WIP] NEED to refactor nanovllm mechanism.
2025-12-22 23:52:56 +08:00
Zijie Tian
051f2295c9
[feat] Added sparse KVcache feature, NEED VERIFY.
2025-12-22 08:51:02 +08:00
Zijie Tian
1081ab51ea
[refactor] Refactor offload code to multi-chunk.
2025-12-15 01:13:58 +08:00
Zijie Tian
61edb8a344
[feat] Finished offload. Still need to optimize performance.
2025-12-12 02:27:40 +08:00
Zijie Tian
babfa17354
[refactor] Translate into English, avoid Chinese due to Claude.
2025-12-11 00:30:24 +08:00
Zijie Tian
190df5f70d
[refactor] Refactor current gpu and cpu block allocation strategy.
2025-12-10 21:23:31 +08:00
Zijie Tian
0a247ccb1b
[feat] Added num_gpu_blocks limit gpu blocks.
2025-12-10 20:17:42 +08:00
Zijie Tian
0b6f19242d
[feat] Added chunked prefill and kvcache offload mechanism.
2025-12-10 03:47:37 +08:00