zijie-tian
0 Followers
·
0 Following
Joined on
2026-01-03
Repositories
1
Projects
Packages
Public Activity
Starred Repositories
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-28 04:05:09 +08:00
39d12a0416
📈
feat: add MemoryObserver for GPU-CPU communication tracking
c16bfcf40f
♻️
refactor: restructure Observer as base class with InferenceObserver
Compare 2 commits »
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-28 00:57:04 +08:00
f3e4611e3b
📝
docs: add XAttention performance analysis documentation
7b5d3b34eb
📈
feat: add NVTX markers to XAttention for profiling
Compare 2 commits »
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-28 00:32:50 +08:00
b760de84c5
✨
feat: add context length and error handling to profile_offload.sh
f81b5ae8a9
✨
feat: enhance profile_offload.sh with policy, block-size parameters
Compare 2 commits »
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-27 22:38:34 +08:00
e874229adc
📝
docs: add comprehensive GPU-only vs Offload benchmark results
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-27 09:24:17 +08:00
4fe7dfb239
🔀
merge: integrate tzj/minference-exp (GPU-only sparse attention)
9177b62d7f
✨
feat: add --enforce-eager option to bench.py
3956a30b14
🔧
chore: add --use-v1 flag to bench_vllm.py
59473fa432
🔧
chore: add configurable arguments to bench_vllm.py
4467e1f654
🔧
chore: add --block-size argument to bench_offload.py
Compare 14 commits »
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-27 07:53:04 +08:00
0437311068
⚡
feat: add Phase 5 CUDA Graph optimization for chunked prefill
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-27 07:36:36 +08:00
0d31b3f71f
📝
docs: add CPU offload optimization strategies guide
73c9dc46ff
✨
feat: add XAttention BSA support to bench_offload.py
924a0d2bfa
🔧
chore: add nsys profiling rule and update gitignore
0619accd1c
📝
docs: add CPU scheduling latency analysis for chunked attention
18bc433f09
⚡
perf: improve NVTX profiling with colored ranges and configurable slots
Compare 6 commits »
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-24 04:32:52 +08:00
3100724666
📝
docs: add nsys wrong event order bug investigation
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-24 01:44:49 +08:00
78a44f3536
📝
docs: add GPU memory monitoring rule
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-23 10:35:57 +08:00
7c41032a2e
✨
feat: add configurable stride and chunk_size for XAttention BSA
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-23 09:45:36 +08:00
f28b500120
🙈
chore: uncomment planning files in gitignore
be67fa8060
🗑️
chore: remove temporary planning files
4f35526457
🔀
merge: integrate remote changes (exec-plan command, CUDA graph plan)
da5e13e2bb
📝
docs: update XAttention BSA Policy with benchmarks and memory management
dd31033732
🔧
chore: add gpu-monitor agent for memory leak debugging
Compare 14 commits »
zijie-tian
pushed to
tzj/layer-offload
at
zijie-tian/nano-vllm
2026-01-22 22:19:56 +08:00
5fb0f67295
[WIP] need refactor.
69b779e252
📝
docs: add layer offload planning notes and task plan
e313dd795a
✨
feat: add exec-plan command for automated task plan execution
9f3ee9279e
✨
feat: add nanovllm.ops module with XAttention estimation kernels
Compare 4 commits »
zijie-tian
created branch
tzj/layer-offload
in
zijie-tian/nano-vllm
2026-01-22 22:19:56 +08:00
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-22 03:15:15 +08:00
47d237bb7e
✨
feat: add exec-plan command for automated task plan execution
a5307fb124
📝
docs: add CUDA Graph optimization plan for offload mode decode
Compare 2 commits »
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-22 01:34:12 +08:00
d808970f2f
[WIP] Before implement the plan.
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-22 01:33:25 +08:00
bc92c1fdb8
feat: add xattn_estimate_chunked for chunked prefill support
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-22 01:00:07 +08:00
2866d4fd88
✨
feat: add chunk attention CUDA graph test for block sparse attention
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-21 21:56:27 +08:00
5d722968ff
[docs] Added cuda_graph_guide.md
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-21 03:30:40 +08:00
d21b40f48f
[test] Added test_cudagraph_memory.py.
zijie-tian
pushed to
tzj/minference
at
zijie-tian/nano-vllm
2026-01-21 03:27:41 +08:00
42cf124343
📝
docs: add CUDA Graph memory mechanism guide