Zijie Tian
|
74ee6d0895
|
[WIP] Need to fix model so that it decodes normally.
|
2026-01-01 05:18:27 +08:00 |
|
Zijie Tian
|
62b8a63314
|
[refactor] Refactor the test_chunked_prefill/decode.
|
2026-01-01 03:32:26 +08:00 |
|
Zijie Tian
|
965c8aff12
|
[WIP] Need to change FlashAttention to debug.
|
2026-01-01 00:58:22 +08:00 |
|
Zijie Tian
|
30462fe89a
|
[WIP] Before fixing needle.
|
2025-12-31 23:35:25 +08:00 |
|
Zijie Tian
|
ccd1b3d4ab
|
[WIP] Before modifying nanovllm CPU-GPU kvcache.
|
2025-12-31 22:41:07 +08:00 |
|
Zijie Tian
|
31e90a7268
|
[test] Added offload correctness verification.
|
2025-12-31 20:59:53 +08:00 |
|
Zijie Tian
|
484d0de9f9
|
[feat] Added debug hook to offload_engine.py.
|
2025-12-31 19:44:39 +08:00 |
|
Zijie Tian
|
7af721c12c
|
[WIP] Before modifying to use FlashInfer.
|
2025-12-30 01:11:13 +08:00 |
|
Zijie Tian
|
89f8020d38
|
[WIP] Fixing attention computation error.
|
2025-12-30 00:31:48 +08:00 |
|
Zijie Tian
|
82ed34fc2d
|
[opt] Optimize nanovllm performance to be comparable with vllm.
|
2025-12-25 03:47:07 +08:00 |
|
Zijie Tian
|
16fcf8350b
|
[WIP] replace merge attention with triton kernel.
|
2025-12-25 01:07:05 +08:00 |
|
Zijie Tian
|
cf5e7df093
|
[WIP] Added sgDMA operator for scatter kvcache communication.
|
2025-12-24 23:48:52 +08:00 |
|
Zijie Tian
|
6ec1b23982
|
[WIP] NEED to modify communication.
|
2025-12-24 21:57:51 +08:00 |
|
Zijie Tian
|
782437c486
|
[WIP] Remove num_prefetch_blocks variable.
|
2025-12-24 18:22:26 +08:00 |
|
Zijie Tian
|
b264de903d
|
[test] Added a simple test_prefill.py.
|
2025-12-23 00:26:25 +08:00 |
|
Zijie Tian
|
4dcef16c13
|
[WIP] NEED to refactor nanovllm mechanism.
|
2025-12-22 23:52:56 +08:00 |
|
Zijie Tian
|
051f2295c9
|
[feat] Added sparse KVcache feature, NEED TO VERIFY.
|
2025-12-22 08:51:02 +08:00 |
|
Zijie Tian
|
1081ab51ea
|
[refactor] Refactor offload code to multi-chunk.
|
2025-12-15 01:13:58 +08:00 |
|
Zijie Tian
|
61edb8a344
|
[feat] Finished offload. Still need to optimize performance.
|
2025-12-12 02:27:40 +08:00 |
|
Zijie Tian
|
babfa17354
|
[refactor] Translate into English, avoid Chinese due to Claude.
|
2025-12-11 00:30:24 +08:00 |
|
Zijie Tian
|
190df5f70d
|
[refactor] Refactor current gpu and cpu block allocation strategy.
|
2025-12-10 21:23:31 +08:00 |
|
Zijie Tian
|
0a247ccb1b
|
[feat] Added num_gpu_blocks to limit GPU blocks.
|
2025-12-10 20:17:42 +08:00 |
|
Zijie Tian
|
0b6f19242d
|
[feat] Added chunked prefill and kvcache offload mechanism.
|
2025-12-10 03:47:37 +08:00 |
|