Commit Graph

35 Commits

Author SHA1 Message Date
Zijie Tian
30462fe89a [WIP] Before fix needle. 2025-12-31 23:35:25 +08:00
Zijie Tian
89f8020d38 [WIP] fixing attention compute error. 2025-12-30 00:31:48 +08:00
Zijie Tian
1907b625b6 [refactor] Remove legacy mode path. 2025-12-22 20:17:56 +08:00
Zijie Tian
051f2295c9 [feat] Added sparse KVcache feature, NEED VERIFY. 2025-12-22 08:51:02 +08:00
Zijie Tian
dc7807a211 [feat] Fixed warmup memory overhead. 2025-12-15 21:39:14 +08:00
Zijie Tian
b8b6478506 [feat] Need to optimized with async prefetch. 2025-12-15 06:58:40 +08:00
Zijie Tian
1081ab51ea [refactor] Refactor offload code to multi-chunk. 2025-12-15 01:13:58 +08:00
Zijie Tian
60d24f7c12 [feat] Added bench_offload.py and GreedySampler. 2025-12-12 00:24:08 +08:00
Zijie Tian
babfa17354 [refactor] Translate into english, void Chinese due to claude. 2025-12-11 00:30:24 +08:00
Zijie Tian
e85c2b4776 [fix] Fixed kvcache offload bugs. 2025-12-10 22:34:00 +08:00
Zijie Tian
190df5f70d [refactor] Refactor current gpu and cpu block allocation strategy. 2025-12-10 21:23:31 +08:00
Zijie Tian
0a247ccb1b [feat] Added num_gpu_blocks limit gpu blocks. 2025-12-10 20:17:42 +08:00
Zijie Tian
87055cc5ce [refactor] Implement real chunked prefill mechenism. 2025-12-10 18:34:01 +08:00
Zijie Tian
0b6f19242d [feat] Added chunked prefill and kvcache offload mechenism. 2025-12-10 03:47:37 +08:00
GeeeekExplorer
2f21442653 support qwen2 2025-11-04 01:44:42 +08:00
GeeeekExplorer
df99418f7d simplify 2025-08-31 20:02:51 +08:00
PeterDing
f5b4840276 fix(model_runner): correct position indexing to be 0-based
- Change position calculation from len(seq) to len(seq) - 1
2025-07-04 14:29:12 +08:00
GeeeekExplorer
cb0b3dec3f remove rng state 2025-06-27 22:50:33 +08:00
GeeeekExplorer
1caeec8dfa same as vllm 2025-06-27 18:50:56 +08:00
GeeeekExplorer
658520b788 warmup and allocate 2025-06-27 01:51:57 +08:00
GeeeekExplorer
03cfc13bb3 faster pickle 2025-06-23 00:51:52 +08:00
GeeeekExplorer
cde3fc22c2 simplify 2025-06-21 17:19:15 +08:00
jinghuan-Chen
ffafaeb133 Release CUDA Graphs resource before exit. 2025-06-18 16:17:31 +08:00
GeeeekExplorer
bc0ad5a116 better 2025-06-17 23:33:38 +08:00
GeeeekExplorer
7e42fa6f63 fix 2025-06-15 13:28:29 +08:00
GeeeekExplorer
fc778a4da9 better 2025-06-15 10:36:45 +08:00
cheunglei
53b3ef2e32 support tensor parallel 2025-06-15 01:31:24 +08:00
GeeeekExplorer
b6136383c9 support fast pickle 2025-06-14 13:36:57 +08:00
GeeeekExplorer
4a8aa090a7 fix 2025-06-14 00:56:07 +08:00
GeeeekExplorer
98a1551a7d support CUDA_VISIBLE_DEVICES 2025-06-12 23:14:01 +08:00
GeeeekExplorer
fee58d44e4 fix 2025-06-12 01:00:31 +08:00
GeeeekExplorer
08c84ec08d multi file loader 2025-06-12 01:00:09 +08:00
GeeeekExplorer
386290d69e refactor 2025-06-11 21:12:57 +08:00
GeeeekExplorer
b98e1ca305 fix 2025-06-10 21:25:54 +08:00
GeeeekExplorer
a5a4909e6a init commit 2025-06-10 00:27:01 +08:00