Zijie Tian | 89f8020d38 | [WIP] Fixing attention compute error. | 2025-12-30 00:31:48 +08:00
Zijie Tian | 782437c486 | [WIP] Remove num_prefetch_blocks variable. | 2025-12-24 18:22:26 +08:00
Zijie Tian | 4dcef16c13 | [WIP] Need to refactor nanovllm mechanism. | 2025-12-22 23:52:56 +08:00
Zijie Tian | 1081ab51ea | [refactor] Refactor offload code to multi-chunk. | 2025-12-15 01:13:58 +08:00
Zijie Tian | 61edb8a344 | [feat] Finished offload. Still need to optimize performance. | 2025-12-12 02:27:40 +08:00
Zijie Tian | babfa17354 | [refactor] Translate into English, avoid Chinese due to Claude. | 2025-12-11 00:30:24 +08:00
Zijie Tian | 190df5f70d | [refactor] Refactor current GPU and CPU block allocation strategy. | 2025-12-10 21:23:31 +08:00
Zijie Tian | 0a247ccb1b | [feat] Added num_gpu_blocks to limit GPU blocks. | 2025-12-10 20:17:42 +08:00