Commit Graph

  • ff8b09cd35 [test] Added test_needle_ref.py. tzj/minference Zijie Tian 2026-01-02 22:03:23 +08:00
  • 74ee6d0895 [WIP] need to fix model to normally decode. Zijie Tian 2026-01-01 05:18:27 +08:00
  • 62b8a63314 [refactor] Refactor the test_chunked_prefill/decode. Zijie Tian 2026-01-01 03:32:26 +08:00
  • 965c8aff12 [WIP] need change flashattention to debug. Zijie Tian 2026-01-01 00:58:22 +08:00
  • 30462fe89a [WIP] Before fix needle. Zijie Tian 2025-12-31 23:35:25 +08:00
  • ccd1b3d4ab [WIP] Before modify nanovllm CPU-GPU kvcache. Zijie Tian 2025-12-31 22:41:07 +08:00
  • 31e90a7268 [test] Added offload correct verify. Zijie Tian 2025-12-31 20:59:53 +08:00
  • 484d0de9f9 [feat] Added debug hook to offload_engine.py. Zijie Tian 2025-12-31 19:44:39 +08:00
  • 7af721c12c [WIP] Before modify to FlashInfer. Zijie Tian 2025-12-30 01:11:13 +08:00
  • 89f8020d38 [WIP] fixing attention compute error. Zijie Tian 2025-12-30 00:31:48 +08:00
  • bf4c63c7ec [docs] Added Sparse Attn. Zijie Tian 2025-12-29 19:56:54 +08:00
  • 600af0f59c [fix] Fixed compile problem. Zijie Tian 2025-12-26 21:02:43 +08:00
  • 82ed34fc2d [opt] optimize nanovllm performance comparable with vllm. Zijie Tian 2025-12-25 03:47:07 +08:00
  • 16fcf8350b [WIP] replace merge attention with triton kernel. Zijie Tian 2025-12-25 01:07:05 +08:00
  • cf5e7df093 [WIP] Added sgDMA operator for scatter kvcache communication. Zijie Tian 2025-12-24 23:48:52 +08:00
  • 6ec1b23982 [WIP] NEED to modify communication. Zijie Tian 2025-12-24 21:57:51 +08:00
  • 782437c486 [WIP] remove num_prefetch_blocks variable. Zijie Tian 2025-12-24 18:22:26 +08:00
  • b264de903d [test] Added a simple test_prefill.py. Zijie Tian 2025-12-23 00:26:25 +08:00
  • 4dcef16c13 [WIP] NEED refactor nanovllm mechanism. Zijie Tian 2025-12-22 23:52:56 +08:00
  • 1907b625b6 [refactor] Remove legacy mode path. Zijie Tian 2025-12-22 20:17:56 +08:00
  • 08d83185ce [fix] fix bench*.py. Zijie Tian 2025-12-22 19:53:50 +08:00
  • 051f2295c9 [feat] Added sparse KVcache feature, NEED VERIFY. Zijie Tian 2025-12-22 08:51:02 +08:00
  • 8df0c7517b [docs] refactor CLAUDE.md. Zijie Tian 2025-12-15 21:43:33 +08:00
  • dc7807a211 [feat] Fixed warmup memory overhead. Zijie Tian 2025-12-15 21:39:14 +08:00
  • 91a0f09a24 [feat] Optimized with ASYNC offload. Zijie Tian 2025-12-15 07:21:35 +08:00
  • b8b6478506 [feat] Need to optimized with async prefetch. Zijie Tian 2025-12-15 06:58:40 +08:00
  • 1081ab51ea [refactor] Refactor offload code to multi-chunk. Zijie Tian 2025-12-15 01:13:58 +08:00
  • 5949537faf [docs] Start use CLAUDE rules. Zijie Tian 2025-12-15 00:20:54 +08:00
  • a37f07943c [docs] Update the CLAUDE.md. Zijie Tian 2025-12-15 00:13:27 +08:00
  • 61edb8a344 [feat] Finished offload. Still need optimize performance. Zijie Tian 2025-12-12 02:27:40 +08:00
  • 9b8165af5a [fix] Fixed kvcache offload problem. Zijie Tian 2025-12-12 01:35:30 +08:00
  • 60d24f7c12 [feat] Added bench_offload.py and GreedySampler. Zijie Tian 2025-12-12 00:24:08 +08:00
  • 0bd7ba7536 [fix] Fixed chunked_attention.py implement. Zijie Tian 2025-12-11 22:39:50 +08:00
  • b9ed77cbbb [fix] Fix import error. Zijie Tian 2025-12-11 05:31:06 +08:00
  • babfa17354 [refactor] Translate into english, avoid Chinese due to claude. Zijie Tian 2025-12-11 00:30:24 +08:00
  • e85c2b4776 [fix] Fixed kvcache offload bugs. Zijie Tian 2025-12-10 22:34:00 +08:00
  • 190df5f70d [refactor] Refactor current gpu and cpu block allocation strategy. Zijie Tian 2025-12-10 21:23:31 +08:00
  • 0a247ccb1b [feat] Added num_gpu_blocks limit gpu blocks. Zijie Tian 2025-12-10 20:17:42 +08:00
  • 01f19ee4a6 [feat] Added logger into nanovllm. Zijie Tian 2025-12-10 19:53:38 +08:00
  • 87055cc5ce [refactor] Implement real chunked prefill mechanism. Zijie Tian 2025-12-10 18:34:01 +08:00
  • 0b6f19242d [feat] Added chunked prefill and kvcache offload mechanism. Zijie Tian 2025-12-10 03:47:37 +08:00
  • 204fe2b38f [feat] Added metric into tqdm bar. Zijie Tian 2025-12-10 00:52:13 +08:00
  • 761929390e [bench] Added vllm vs nano-vllm bench. Zijie Tian 2025-12-10 00:44:57 +08:00
  • 2f21442653 support qwen2 GeeeekExplorer 2025-11-04 01:44:09 +08:00
  • db1b49dce4 add logo and trendshift GeeeekExplorer 2025-11-04 00:35:12 +08:00
  • 6ef2a4f630 compile random sampling GeeeekExplorer 2025-08-31 22:55:34 +08:00
  • df99418f7d simplify GeeeekExplorer 2025-08-31 19:44:57 +08:00
  • 6a6d217de7 Merge pull request #67 from PeterDing/fix/decoding-positions Xingkai Yu 2025-08-31 18:05:45 +08:00
  • f5b4840276 fix(model_runner): correct position indexing to be 0-based PeterDing 2025-07-04 14:29:12 +08:00
  • 38baf0bbe4 remove assert shape GeeeekExplorer 2025-06-27 23:00:30 +08:00
  • 2de882a395 Merge pull request #60 from GeeeekExplorer/warmup Xingkai Yu 2025-06-27 22:52:11 +08:00
  • cb0b3dec3f remove rng state GeeeekExplorer 2025-06-27 22:50:33 +08:00
  • 6802cb2f42 Merge pull request #54 from TonyLianLong/patch-1 Xingkai Yu 2025-06-27 22:44:38 +08:00
  • 1caeec8dfa same as vllm GeeeekExplorer 2025-06-27 18:50:56 +08:00
  • 658520b788 warmup and allocate GeeeekExplorer 2025-06-27 01:51:57 +08:00
  • c2ee8b8dff Update pyproject.toml to fix missing files Long(Tony) Lian 2025-06-25 17:57:38 -07:00
  • cfc4cb6710 docs: add manual download instructions papadopoulos Aggelos-Michael 2025-06-24 18:38:28 +03:00
  • 37eb91f890 Merge pull request #39 from xiaohajiayou/main Xingkai Yu 2025-06-24 22:51:58 +08:00
  • 054aec852d Fix: Division-by-Zero Risk and Typo xiaohajiayou 2025-06-24 02:02:33 +08:00
  • 03cfc13bb3 faster pickle GeeeekExplorer 2025-06-23 00:51:52 +08:00
  • 8162578b60 star history Xingkai Yu 2025-06-22 15:13:04 +08:00
  • cde3fc22c2 simplify GeeeekExplorer 2025-06-21 17:04:53 +08:00
  • ad4e95fbdc update .gitignore Xingkai Yu 2025-06-21 07:28:40 +08:00
  • 801365a611 update bench GeeeekExplorer 2025-06-19 23:24:43 +08:00
  • fa0078174e Merge pull request #24 from jinghuan-Chen/fix/Release-CUDA-Graphs-resource-before-exit Xingkai Yu 2025-06-18 17:15:28 +08:00
  • ffafaeb133 Release CUDA Graphs resource before exit. jinghuan-Chen 2025-06-18 16:17:31 +08:00
  • 4fc764f175 Merge pull request #22 from cheunglei/use_spawn Xingkai Yu 2025-06-17 23:53:59 +08:00
  • b5ace32982 use spawn cheunglei 2025-06-17 22:48:44 +08:00
  • bc0ad5a116 better GeeeekExplorer 2025-06-17 23:15:02 +08:00
  • 7e42fa6f63 fix GeeeekExplorer 2025-06-15 13:09:05 +08:00
  • 326b121fad Merge pull request #10 from MARD1NO/refine_return_hint_in_schedule Xingkai Yu 2025-06-15 10:39:51 +08:00
  • ba96387043 Merge pull request #11 from GeeeekExplorer/tp_dev Xingkai Yu 2025-06-15 10:37:21 +08:00
  • fc778a4da9 better GeeeekExplorer 2025-06-15 10:31:48 +08:00
  • c1fd4ea3c2 Merge pull request #9 from cheunglei/tp_dev Xingkai Yu 2025-06-15 10:22:18 +08:00
  • 98bbbefb68 schedule return bool args MARD1NO 2025-06-15 10:15:05 +08:00
  • 53b3ef2e32 support tensor parallel cheunglei 2025-06-15 01:31:24 +08:00
  • b6136383c9 support fast pickle GeeeekExplorer 2025-06-14 13:36:57 +08:00
  • 4a8aa090a7 fix GeeeekExplorer 2025-06-14 00:36:32 +08:00
  • 9b59dae751 Merge pull request #4 from cheunglei/main Xingkai Yu 2025-06-13 23:46:18 +08:00
  • 0ea7414b19 require xxhash cheunglei 2025-06-13 23:40:07 +08:00
  • 59aa3ff57c better GeeeekExplorer 2025-06-13 13:07:33 +08:00
  • 135d1b38a2 release GeeeekExplorer 2025-06-13 00:41:33 +08:00
  • 98a1551a7d support CUDA_VISIBLE_DEVICES GeeeekExplorer 2025-06-12 23:14:01 +08:00
  • ec3c60d96f update bench GeeeekExplorer 2025-06-12 09:47:09 +08:00
  • f16adb729e refactor GeeeekExplorer 2025-06-12 09:41:12 +08:00
  • fee58d44e4 fix GeeeekExplorer 2025-06-11 21:17:23 +08:00
  • 08c84ec08d multi file loader GeeeekExplorer 2025-06-11 22:32:48 +08:00
  • 386290d69e refactor GeeeekExplorer 2025-06-11 21:12:57 +08:00
  • b98e1ca305 fix GeeeekExplorer 2025-06-10 08:52:58 +08:00
  • a5a4909e6a init commit GeeeekExplorer 2025-06-10 00:23:23 +08:00