Commit Graph

14 Commits

Author | SHA1 | Message | Date
Zijie Tian | 16fcf8350b | [WIP] replace merge attention with triton kernel. | 2025-12-25 01:07:05 +08:00
Zijie Tian | 6ec1b23982 | [WIP] NEED to modify communication. | 2025-12-24 21:57:51 +08:00
Zijie Tian | 782437c486 | [WIP] remove num_prefetch_blocks varible. | 2025-12-24 18:22:26 +08:00
Zijie Tian | 4dcef16c13 | [WIP] NEED refactor nanovllm mechenism. | 2025-12-22 23:52:56 +08:00
Zijie Tian | 91a0f09a24 | [feat] Optimized with ASYNC offload. | 2025-12-15 07:21:35 +08:00
Zijie Tian | b8b6478506 | [feat] Need to optimized with async prefetch. | 2025-12-15 06:58:40 +08:00
Zijie Tian | 1081ab51ea | [refactor] Refactor offload code to multi-chunk. | 2025-12-15 01:13:58 +08:00
Zijie Tian | 9b8165af5a | [fix] Fixed kvcache offload problem. | 2025-12-12 01:35:30 +08:00
Zijie Tian | babfa17354 | [refactor] Translate into english, void Chinese due to claude. | 2025-12-11 00:30:24 +08:00
Zijie Tian | e85c2b4776 | [fix] Fixed kvcache offload bugs. | 2025-12-10 22:34:00 +08:00
Zijie Tian | 190df5f70d | [refactor] Refactor current gpu and cpu block allocation strategy. | 2025-12-10 21:23:31 +08:00
Zijie Tian | 0a247ccb1b | [feat] Added num_gpu_blocks limit gpu blocks. | 2025-12-10 20:17:42 +08:00
Zijie Tian | 87055cc5ce | [refactor] Implement real chunked prefill mechenism. | 2025-12-10 18:34:01 +08:00
Zijie Tian | 0b6f19242d | [feat] Added chunked prefill and kvcache offload mechenism. | 2025-12-10 03:47:37 +08:00