Zijie Tian | ccf27d3a74 | [claudesquad] update from 'perf_opt-1' on 07 Jan 26 05:58 CST | 2026-01-07 05:58:23 +08:00
Zijie Tian | 58a06501c1 | Merge branch 'zijie/debug_chunk-2' into tzj/minference | 2026-01-07 03:30:38 +08:00
Zijie Tian | 2a6e0a2c02 | [feat] Added Quest Sparsity Policy. | 2026-01-07 03:29:21 +08:00
Zijie Tian | 2fe50bab50 | [claudesquad] update from 'debug_chunk-2' on 07 Jan 26 03:27 CST | 2026-01-07 03:27:27 +08:00
Zijie Tian | c99a6f3d3f | [WIP] Before adding Quest policy. | 2026-01-07 02:32:30 +08:00
Zijie Tian | 7cc8a394a5 | [fix] Fixed bench_offload.py, BUT performance DEGRADED. | 2026-01-06 18:46:48 +08:00
Zijie Tian | 535f2037ab | [WIP] Before fixing bench_offload.py. | 2026-01-06 18:41:08 +08:00
Zijie Tian | 247c5312d9 | [fix] Fixed decode misalignment. | 2026-01-05 19:00:44 +08:00
Zijie Tian | 054aaff403 | [fix] Fixed needle test bug. | 2026-01-05 18:34:09 +08:00
Zijie Tian | 772313db8f | [refactor] Refactor the kvcache offload. | 2026-01-04 19:37:03 +08:00
Zijie Tian | 74ee6d0895 | [WIP] Need to fix model to decode normally. | 2026-01-01 05:18:27 +08:00
Zijie Tian | 965c8aff12 | [WIP] Need to change flashattention to debug. | 2026-01-01 00:58:22 +08:00
Zijie Tian | 30462fe89a | [WIP] Before fixing needle. | 2025-12-31 23:35:25 +08:00
Zijie Tian | 484d0de9f9 | [feat] Added debug hook to offload_engine.py. | 2025-12-31 19:44:39 +08:00
Zijie Tian | 89f8020d38 | [WIP] fixing attention compute error. | 2025-12-30 00:31:48 +08:00
Zijie Tian | 82ed34fc2d | [opt] Optimize nanovllm performance to be comparable with vllm. | 2025-12-25 03:47:07 +08:00
Zijie Tian | 6ec1b23982 | [WIP] NEED to modify communication. | 2025-12-24 21:57:51 +08:00
Zijie Tian | 782437c486 | [WIP] remove num_prefetch_blocks variable. | 2025-12-24 18:22:26 +08:00
Zijie Tian | 051f2295c9 | [feat] Added sparse KVcache feature, NEEDS VERIFICATION. | 2025-12-22 08:51:02 +08:00
Zijie Tian | 91a0f09a24 | [feat] Optimized with ASYNC offload. | 2025-12-15 07:21:35 +08:00
Zijie Tian | b8b6478506 | [feat] Need to optimize with async prefetch. | 2025-12-15 06:58:40 +08:00
Zijie Tian | 1081ab51ea | [refactor] Refactor offload code to multi-chunk. | 2025-12-15 01:13:58 +08:00
Zijie Tian | 61edb8a344 | [feat] Finished offload. Still need to optimize performance. | 2025-12-12 02:27:40 +08:00
Zijie Tian | 9b8165af5a | [fix] Fixed kvcache offload problem. | 2025-12-12 01:35:30 +08:00
Zijie Tian | babfa17354 | [refactor] Translate into English, avoid Chinese due to Claude. | 2025-12-11 00:30:24 +08:00
Zijie Tian | e85c2b4776 | [fix] Fixed kvcache offload bugs. | 2025-12-10 22:34:00 +08:00
Zijie Tian | 190df5f70d | [refactor] Refactor current gpu and cpu block allocation strategy. | 2025-12-10 21:23:31 +08:00
Zijie Tian | 0a247ccb1b | [feat] Added num_gpu_blocks to limit GPU blocks. | 2025-12-10 20:17:42 +08:00
Zijie Tian | 87055cc5ce | [refactor] Implement real chunked prefill mechanism. | 2025-12-10 18:34:01 +08:00
Zijie Tian | 0b6f19242d | [feat] Added chunked prefill and kvcache offload mechanism. | 2025-12-10 03:47:37 +08:00
Zijie Tian | 761929390e | [bench] Added vllm vs nano-vllm bench. | 2025-12-10 00:44:57 +08:00
GeeeekExplorer | df99418f7d | simplify | 2025-08-31 20:02:51 +08:00
GeeeekExplorer | 1caeec8dfa | same as vllm | 2025-06-27 18:50:56 +08:00
GeeeekExplorer | 658520b788 | warmup and allocate | 2025-06-27 01:51:57 +08:00
GeeeekExplorer | 386290d69e | refactor | 2025-06-11 21:12:57 +08:00
GeeeekExplorer | b98e1ca305 | fix | 2025-06-10 21:25:54 +08:00
GeeeekExplorer | a5a4909e6a | init commit | 2025-06-10 00:27:01 +08:00