zijie-tian
  • Joined on 2026-01-03
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-07 06:25:42 +08:00
6575099a06 [refactor] Cleanup unused code after perf_opt merge
8fd25d72d7 Merge perf_opt-1 and perf_opt-2 branches
ccf27d3a74 [claudesquad] update from 'perf_opt-1' on 07 Jan 26 05:58 CST
0ad86eb449 [claudesquad] update from 'perf_opt-2' on 07 Jan 26 05:58 CST
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-07 04:24:57 +08:00
aa953ecb59 [refactor] Aligned the bench.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-07 03:32:05 +08:00
362f5e575f [fix] Fixed .gitignores .
58a06501c1 Merge branch 'zijie/debug_chunk-2' into tzj/minference
2fe50bab50 [claudesquad] update from 'debug_chunk-2' on 07 Jan 26 03:27 CST
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-07 03:29:13 +08:00
2a6e0a2c02 [feat] Added Quest Sparsity Policy.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-07 02:32:21 +08:00
c99a6f3d3f [WIP] Before add Quest policy.
f240903013 [docs] Add GPU mutex instructions for multi-instance debugging
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-06 23:32:22 +08:00
0e691f2d85 [WIP] move metadata to GPU.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-06 22:30:21 +08:00
edb5273e34 [WIP] Added basic test for quest.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-06 20:47:46 +08:00
690492e074 [WIP] Before refactor policies.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-06 18:46:38 +08:00
7cc8a394a5 [fix] Fixed bench_offload.py, BUT performance DEGRAD.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-06 18:40:59 +08:00
535f2037ab [WIP] Before fix bench_offload.py.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-05 21:19:16 +08:00
c7ac39dfbd [refactor] Before add sprae policy.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-05 20:31:33 +08:00
e554d5482b [refactor] Delete unnesscessory test, and refacrtor the offload prefix cache.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-05 19:00:36 +08:00
247c5312d9 [fix] Fixed decode misalign.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-05 18:34:01 +08:00
054aaff403 [fix] Fixed needle test bug.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-05 01:51:38 +08:00
d623043a3c [WIP] FIXED decode and prefill NEEDLE test.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-04 22:47:55 +08:00
e897380127 [test] Added test_align.py and Before change nanovllm attention.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-04 20:55:32 +08:00
24096431ed [refactor] refactor test_align.py.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-04 19:36:56 +08:00
772313db8f [refactor] Refactor the kvcache offload.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-03 22:36:34 +08:00
00ed17c640 [feat] Added debug tools.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-03 20:45:53 +08:00
9b52d25866 [docs] Update CLAUDE.md.