Commit Graph

77 Commits

Author SHA1 Message Date
Zijie Tian
2a6e0a2c02 [feat] Added Quest Sparsity Policy. 2026-01-07 03:29:21 +08:00
Zijie Tian
c99a6f3d3f [WIP] Before add Quest policy. 2026-01-07 02:32:30 +08:00
Zijie Tian
0e691f2d85 [WIP] move metadata to GPU. 2026-01-06 23:32:32 +08:00
Zijie Tian
690492e074 [WIP] Before refactor policies. 2026-01-06 20:47:55 +08:00
Zijie Tian
7cc8a394a5 [fix] Fixed bench_offload.py, BUT performance DEGRAD. 2026-01-06 18:46:48 +08:00
Zijie Tian
535f2037ab [WIP] Before fix bench_offload.py. 2026-01-06 18:41:08 +08:00
Zijie Tian
c7ac39dfbd [refactor] Before add sprae policy. 2026-01-05 21:19:24 +08:00
Zijie Tian
e554d5482b [refactor] Delete unnesscessory test, and refacrtor the offload prefix cache. 2026-01-05 20:31:42 +08:00
Zijie Tian
247c5312d9 [fix] Fixed decode misalign. 2026-01-05 19:00:44 +08:00
Zijie Tian
054aaff403 [fix] Fixed needle test bug. 2026-01-05 18:34:09 +08:00
Zijie Tian
d623043a3c [WIP] FIXED decode and prefill NEEDLE test. 2026-01-05 01:51:46 +08:00
Zijie Tian
e897380127 [test] Added test_align.py and Before change nanovllm attention. 2026-01-04 22:48:01 +08:00
Zijie Tian
772313db8f [refactor] Refactor the kvcache offload. 2026-01-04 19:37:03 +08:00
Zijie Tian
00ed17c640 [feat] Added debug tools. 2026-01-03 22:36:40 +08:00
Zijie Tian
74ee6d0895 [WIP] need to fix model to normally decode. 2026-01-01 05:18:27 +08:00
Zijie Tian
965c8aff12 [WIP] need change flashattention to debug. 2026-01-01 00:58:22 +08:00
Zijie Tian
30462fe89a [WIP] Before fix needle. 2025-12-31 23:35:25 +08:00
Zijie Tian
ccd1b3d4ab [WIP] Before modify nanovllm CPU-GPU kvcache. 2025-12-31 22:41:07 +08:00
Zijie Tian
484d0de9f9 [feat] Added debug hook to offload_engine.py. 2025-12-31 19:44:39 +08:00
Zijie Tian
89f8020d38 [WIP] fixing attention compute error. 2025-12-30 00:31:48 +08:00
Zijie Tian
82ed34fc2d [opt] optimize nanovllm performance compareable with vllm. 2025-12-25 03:47:07 +08:00
Zijie Tian
16fcf8350b [WIP] replace merge attention with triton kernel. 2025-12-25 01:07:05 +08:00
Zijie Tian
cf5e7df093 [WIP] Added sgDMA operator for scatter kvcache communication. 2025-12-24 23:48:52 +08:00
Zijie Tian
6ec1b23982 [WIP] NEED to modify communication. 2025-12-24 21:57:51 +08:00
Zijie Tian
782437c486 [WIP] remove num_prefetch_blocks varible. 2025-12-24 18:22:26 +08:00
Zijie Tian
4dcef16c13 [WIP] NEED refactor nanovllm mechenism. 2025-12-22 23:52:56 +08:00
Zijie Tian
1907b625b6 [refactor] Remove legacy mode path. 2025-12-22 20:17:56 +08:00
Zijie Tian
051f2295c9 [feat] Added sparse KVcache feature, NEED VERIFY. 2025-12-22 08:51:02 +08:00
Zijie Tian
dc7807a211 [feat] Fixed warmup memory overhead. 2025-12-15 21:39:14 +08:00
Zijie Tian
91a0f09a24 [feat] Optimized with ASYNC offload. 2025-12-15 07:21:35 +08:00
Zijie Tian
b8b6478506 [feat] Need to optimized with async prefetch. 2025-12-15 06:58:40 +08:00
Zijie Tian
1081ab51ea [refactor] Refactor offload code to multi-chunk. 2025-12-15 01:13:58 +08:00
Zijie Tian
61edb8a344 [feat] Finished offload. Still need optimize performance. 2025-12-12 02:27:40 +08:00
Zijie Tian
9b8165af5a [fix] Fixed kvcache offload problem. 2025-12-12 01:35:30 +08:00
Zijie Tian
60d24f7c12 [feat] Added bench_offload.py and GreedySampler. 2025-12-12 00:24:08 +08:00
Zijie Tian
0bd7ba7536 [fix] Fixed chunked_attention.py implement. 2025-12-11 22:39:50 +08:00
Zijie Tian
b9ed77cbbb [fix] Fix import error. 2025-12-11 05:31:06 +08:00
Zijie Tian
babfa17354 [refactor] Translate into english, void Chinese due to claude. 2025-12-11 00:30:24 +08:00
Zijie Tian
e85c2b4776 [fix] Fixed kvcache offload bugs. 2025-12-10 22:34:00 +08:00
Zijie Tian
190df5f70d [refactor] Refactor current gpu and cpu block allocation strategy. 2025-12-10 21:23:31 +08:00
Zijie Tian
0a247ccb1b [feat] Added num_gpu_blocks limit gpu blocks. 2025-12-10 20:17:42 +08:00
Zijie Tian
01f19ee4a6 [feat] Added logger into nanovllm. 2025-12-10 19:53:38 +08:00
Zijie Tian
87055cc5ce [refactor] Implement real chunked prefill mechenism. 2025-12-10 18:34:01 +08:00
Zijie Tian
0b6f19242d [feat] Added chunked prefill and kvcache offload mechenism. 2025-12-10 03:47:37 +08:00
Zijie Tian
204fe2b38f [feat] Added metric into tqdm bar. 2025-12-10 00:52:13 +08:00
Zijie Tian
761929390e [bench] Added vllm vs nano-vllm bench. 2025-12-10 00:44:57 +08:00
GeeeekExplorer
2f21442653 support qwen2 2025-11-04 01:44:42 +08:00
GeeeekExplorer
6ef2a4f630 compile random sampling 2025-08-31 22:55:34 +08:00
GeeeekExplorer
df99418f7d simplify 2025-08-31 20:02:51 +08:00
PeterDing
f5b4840276 fix(model_runner): correct position indexing to be 0-based 2025-07-04 14:29:12 +08:00
- Change position calculation from len(seq) to len(seq) - 1