Zijie Tian
|
82ed34fc2d
|
[opt] Optimize nanovllm performance to be comparable with vllm.
|
2025-12-25 03:47:07 +08:00 |
|
Zijie Tian
|
16fcf8350b
|
[WIP] replace merge attention with triton kernel.
|
2025-12-25 01:07:05 +08:00 |
|
Zijie Tian
|
cf5e7df093
|
[WIP] Added sgDMA operator for scatter kvcache communication.
|
2025-12-24 23:48:52 +08:00 |
|
Zijie Tian
|
6ec1b23982
|
[WIP] NEED to modify communication.
|
2025-12-24 21:57:51 +08:00 |
|
Zijie Tian
|
782437c486
|
[WIP] remove num_prefetch_blocks variable.
|
2025-12-24 18:22:26 +08:00 |
|
Zijie Tian
|
b264de903d
|
[test] Added a simple test_prefill.py.
|
2025-12-23 00:26:25 +08:00 |
|
Zijie Tian
|
4dcef16c13
|
[WIP] NEED to refactor nanovllm mechanism.
|
2025-12-22 23:52:56 +08:00 |
|
Zijie Tian
|
1907b625b6
|
[refactor] Remove legacy mode path.
|
2025-12-22 20:17:56 +08:00 |
|
Zijie Tian
|
08d83185ce
|
[fix] fix bench*.py.
|
2025-12-22 19:53:50 +08:00 |
|
Zijie Tian
|
051f2295c9
|
[feat] Added sparse KVcache feature, NEED VERIFY.
|
2025-12-22 08:51:02 +08:00 |
|
Zijie Tian
|
8df0c7517b
|
[docs] refactor CLAUDE.md.
|
2025-12-15 21:43:33 +08:00 |
|
Zijie Tian
|
dc7807a211
|
[feat] Fixed warmup memory overhead.
|
2025-12-15 21:39:14 +08:00 |
|
Zijie Tian
|
91a0f09a24
|
[feat] Optimized with ASYNC offload.
|
2025-12-15 07:21:35 +08:00 |
|
Zijie Tian
|
b8b6478506
|
[feat] Need to optimize with async prefetch.
|
2025-12-15 06:58:40 +08:00 |
|
Zijie Tian
|
1081ab51ea
|
[refactor] Refactor offload code to multi-chunk.
|
2025-12-15 01:13:58 +08:00 |
|
Zijie Tian
|
5949537faf
|
[docs] Start using CLAUDE rules.
|
2025-12-15 00:20:54 +08:00 |
|
Zijie Tian
|
a37f07943c
|
[docs] Update the CLAUDE.md.
|
2025-12-15 00:13:27 +08:00 |
|
Zijie Tian
|
61edb8a344
|
[feat] Finished offload. Still need to optimize performance.
|
2025-12-12 02:27:40 +08:00 |
|
Zijie Tian
|
9b8165af5a
|
[fix] Fixed kvcache offload problem.
|
2025-12-12 01:35:30 +08:00 |
|
Zijie Tian
|
60d24f7c12
|
[feat] Added bench_offload.py and GreedySampler.
|
2025-12-12 00:24:08 +08:00 |
|
Zijie Tian
|
0bd7ba7536
|
[fix] Fixed chunked_attention.py implementation.
|
2025-12-11 22:39:50 +08:00 |
|
Zijie Tian
|
b9ed77cbbb
|
[fix] Fix import error.
|
2025-12-11 05:31:06 +08:00 |
|
Zijie Tian
|
babfa17354
|
[refactor] Translate into English, avoid Chinese due to Claude.
|
2025-12-11 00:30:24 +08:00 |
|
Zijie Tian
|
e85c2b4776
|
[fix] Fixed kvcache offload bugs.
|
2025-12-10 22:34:00 +08:00 |
|
Zijie Tian
|
190df5f70d
|
[refactor] Refactor current gpu and cpu block allocation strategy.
|
2025-12-10 21:23:31 +08:00 |
|
Zijie Tian
|
0a247ccb1b
|
[feat] Added num_gpu_blocks to limit GPU blocks.
|
2025-12-10 20:17:42 +08:00 |
|
Zijie Tian
|
01f19ee4a6
|
[feat] Added logger into nanovllm.
|
2025-12-10 19:53:38 +08:00 |
|
Zijie Tian
|
87055cc5ce
|
[refactor] Implement real chunked prefill mechanism.
|
2025-12-10 18:34:01 +08:00 |
|
Zijie Tian
|
0b6f19242d
|
[feat] Added chunked prefill and kvcache offload mechanism.
|
2025-12-10 03:47:37 +08:00 |
|
Zijie Tian
|
204fe2b38f
|
[feat] Added metric into tqdm bar.
|
2025-12-10 00:52:13 +08:00 |
|
Zijie Tian
|
761929390e
|
[bench] Added vllm vs nano-vllm bench.
|
2025-12-10 00:44:57 +08:00 |
|
GeeeekExplorer
|
2f21442653
|
support qwen2
|
2025-11-04 01:44:42 +08:00 |
|
GeeeekExplorer
|
db1b49dce4
|
add logo and trendshift
|
2025-11-04 00:45:10 +08:00 |
|
GeeeekExplorer
|
6ef2a4f630
|
compile random sampling
|
2025-08-31 22:55:34 +08:00 |
|
GeeeekExplorer
|
df99418f7d
|
simplify
|
2025-08-31 20:02:51 +08:00 |
|
Xingkai Yu
|
6a6d217de7
|
Merge pull request #67 from PeterDing/fix/decoding-positions
fix(model_runner): correct position indexing to be 0-based
|
2025-08-31 18:05:45 +08:00 |
|
PeterDing
|
f5b4840276
|
fix(model_runner): correct position indexing to be 0-based
- Change position calculation from len(seq) to len(seq) - 1
|
2025-07-04 14:29:12 +08:00 |
|
GeeeekExplorer
|
38baf0bbe4
|
remove assert shape
|
2025-06-27 23:00:30 +08:00 |
|
Xingkai Yu
|
2de882a395
|
Merge pull request #60 from GeeeekExplorer/warmup
|
2025-06-27 22:52:11 +08:00 |
|
GeeeekExplorer
|
cb0b3dec3f
|
remove rng state
|
2025-06-27 22:50:33 +08:00 |
|
Xingkai Yu
|
6802cb2f42
|
Merge pull request #54 from TonyLianLong/patch-1
|
2025-06-27 22:44:38 +08:00 |
|
GeeeekExplorer
|
1caeec8dfa
|
same as vllm
|
2025-06-27 18:50:56 +08:00 |
|
GeeeekExplorer
|
658520b788
|
warmup and allocate
|
2025-06-27 01:51:57 +08:00 |
|
Long(Tony) Lian
|
c2ee8b8dff
|
Update pyproject.toml to fix missing files
|
2025-06-25 17:57:38 -07:00 |
|
papadopoulos Aggelos-Michael
|
cfc4cb6710
|
docs: add manual download instructions
|
2025-06-24 23:38:28 +08:00 |
|
Xingkai Yu
|
37eb91f890
|
Merge pull request #39 from xiaohajiayou/main
|
2025-06-24 22:51:58 +08:00 |
|
xiaohajiayou
|
054aec852d
|
Fix: Division-by-Zero Risk and Typo
|
2025-06-24 02:02:33 +08:00 |
|
GeeeekExplorer
|
03cfc13bb3
|
faster pickle
|
2025-06-23 00:51:52 +08:00 |
|
Xingkai Yu
|
8162578b60
|
star history
|
2025-06-22 15:13:04 +08:00 |
|
GeeeekExplorer
|
cde3fc22c2
|
simplify
|
2025-06-21 17:19:15 +08:00 |
|