70 Commits

Author                        SHA1        Message  Date
Zijie Tian                    08d83185ce  [fix] fix bench*.py.  2025-12-22 19:53:50 +08:00
Zijie Tian                    051f2295c9  [feat] Added sparse KVcache feature, NEED VERIFY.  2025-12-22 08:51:02 +08:00
Zijie Tian                    8df0c7517b  [docs] refactor CLAUDE.md.  2025-12-15 21:43:33 +08:00
Zijie Tian                    dc7807a211  [feat] Fixed warmup memory overhead.  2025-12-15 21:39:14 +08:00
Zijie Tian                    91a0f09a24  [feat] Optimized with ASYNC offload.  2025-12-15 07:21:35 +08:00
Zijie Tian                    b8b6478506  [feat] Need to optimized with async prefetch.  2025-12-15 06:58:40 +08:00
Zijie Tian                    1081ab51ea  [refactor] Refactor offload code to multi-chunk.  2025-12-15 01:13:58 +08:00
Zijie Tian                    5949537faf  [docs] Start ues CLAUDE rules.  2025-12-15 00:20:54 +08:00
Zijie Tian                    a37f07943c  [docs] Update the CLAUDE.md.  2025-12-15 00:13:27 +08:00
Zijie Tian                    61edb8a344  [feat] Finished offload. Still need optimize performance.  2025-12-12 02:27:40 +08:00
Zijie Tian                    9b8165af5a  [fix] Fixed kvcache offload problem.  2025-12-12 01:35:30 +08:00
Zijie Tian                    60d24f7c12  [feat] Added bench_offload.py and GreedySampler.  2025-12-12 00:24:08 +08:00
Zijie Tian                    0bd7ba7536  [fix] Fixed chunked_attention.py implement.  2025-12-11 22:39:50 +08:00
Zijie Tian                    b9ed77cbbb  [fix] Fix import error.  2025-12-11 05:31:06 +08:00
Zijie Tian                    babfa17354  [refactor] Translate into english, void Chinese due to claude.  2025-12-11 00:30:24 +08:00
Zijie Tian                    e85c2b4776  [fix] Fixed kvcache offload bugs.  2025-12-10 22:34:00 +08:00
Zijie Tian                    190df5f70d  [refactor] Refactor current gpu and cpu block allocation strategy.  2025-12-10 21:23:31 +08:00
Zijie Tian                    0a247ccb1b  [feat] Added num_gpu_blocks limit gpu blocks.  2025-12-10 20:17:42 +08:00
Zijie Tian                    01f19ee4a6  [feat] Added logger into nanovllm.  2025-12-10 19:53:38 +08:00
Zijie Tian                    87055cc5ce  [refactor] Implement real chunked prefill mechenism.  2025-12-10 18:34:01 +08:00
Zijie Tian                    0b6f19242d  [feat] Added chunked prefill and kvcache offload mechenism.  2025-12-10 03:47:37 +08:00
Zijie Tian                    204fe2b38f  [feat] Added metric into tqdm bar.  2025-12-10 00:52:13 +08:00
Zijie Tian                    761929390e  [bench] Added vllm vs nano-vllm bench.  2025-12-10 00:44:57 +08:00
GeeeekExplorer                2f21442653  support qwen2  2025-11-04 01:44:42 +08:00
GeeeekExplorer                db1b49dce4  add logo and trendshift  2025-11-04 00:45:10 +08:00
GeeeekExplorer                6ef2a4f630  compile random sampling  2025-08-31 22:55:34 +08:00
GeeeekExplorer                df99418f7d  simplify  2025-08-31 20:02:51 +08:00
Xingkai Yu                    6a6d217de7  Merge pull request #67 from PeterDing/fix/decoding-positions  2025-08-31 18:05:45 +08:00
                                          fix(model_runner): correct position indexing to be 0-based
PeterDing                     f5b4840276  fix(model_runner): correct position indexing to be 0-based  2025-07-04 14:29:12 +08:00
                                          - Change position calculation from len(seq) to len(seq) - 1
GeeeekExplorer                38baf0bbe4  remove assert shape  2025-06-27 23:00:30 +08:00
Xingkai Yu                    2de882a395  Merge pull request #60 from GeeeekExplorer/warmup  2025-06-27 22:52:11 +08:00
GeeeekExplorer                cb0b3dec3f  remove rng state  2025-06-27 22:50:33 +08:00
Xingkai Yu                    6802cb2f42  Merge pull request #54 from TonyLianLong/patch-1  2025-06-27 22:44:38 +08:00
GeeeekExplorer                1caeec8dfa  same as vllm  2025-06-27 18:50:56 +08:00
GeeeekExplorer                658520b788  warmup and allocate  2025-06-27 01:51:57 +08:00
Long(Tony) Lian               c2ee8b8dff  Update pyproject.toml to fix missing files  2025-06-25 17:57:38 -07:00
papadopoulos Aggelos-Michael  cfc4cb6710  docs: add manual download instructions  2025-06-24 23:38:28 +08:00
Xingkai Yu                    37eb91f890  Merge pull request #39 from xiaohajiayou/main  2025-06-24 22:51:58 +08:00
xiaohajiayou                  054aec852d  Fix: Division-by-Zero Risk and Typo  2025-06-24 02:02:33 +08:00
GeeeekExplorer                03cfc13bb3  faster pickle  2025-06-23 00:51:52 +08:00
Xingkai Yu                    8162578b60  star history  2025-06-22 15:13:04 +08:00
GeeeekExplorer                cde3fc22c2  simplify  2025-06-21 17:19:15 +08:00
Xingkai Yu                    ad4e95fbdc  update .gitignore  2025-06-21 07:28:40 +08:00
GeeeekExplorer                801365a611  update bench  2025-06-19 23:28:11 +08:00
Xingkai Yu                    fa0078174e  Merge pull request #24 from jinghuan-Chen/fix/Release-CUDA-Graphs-resource-before-exit  2025-06-18 17:15:28 +08:00
jinghuan-Chen                 ffafaeb133  Release CUDA Graphs resource before exit.  2025-06-18 16:17:31 +08:00
Xingkai Yu                    4fc764f175  Merge pull request #22 from cheunglei/use_spawn  2025-06-17 23:53:59 +08:00
cheunglei                     b5ace32982  use spawn  2025-06-17 23:49:15 +08:00
GeeeekExplorer                bc0ad5a116  better  2025-06-17 23:33:38 +08:00
GeeeekExplorer                7e42fa6f63  fix  2025-06-15 13:28:29 +08:00