zijie-tian
  • Joined on 2026-01-03
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 04:05:09 +08:00
39d12a0416 📈 feat: add MemoryObserver for GPU-CPU communication tracking
c16bfcf40f ♻️ refactor: restructure Observer as base class with InferenceObserver
Compare 2 commits »
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 00:57:04 +08:00
f3e4611e3b 📝 docs: add XAttention performance analysis documentation
7b5d3b34eb 📈 feat: add NVTX markers to XAttention for profiling
Compare 2 commits »
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 00:32:50 +08:00
b760de84c5 feat: add context length and error handling to profile_offload.sh
f81b5ae8a9 feat: enhance profile_offload.sh with policy, block-size parameters
Compare 2 commits »
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-27 22:38:34 +08:00
e874229adc 📝 docs: add comprehensive GPU-only vs Offload benchmark results
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-27 09:24:17 +08:00
4fe7dfb239 🔀 merge: integrate tzj/minference-exp (GPU-only sparse attention)
9177b62d7f feat: add --enforce-eager option to bench.py
3956a30b14 🔧 chore: add --use-v1 flag to bench_vllm.py
59473fa432 🔧 chore: add configurable arguments to bench_vllm.py
4467e1f654 🔧 chore: add --block-size argument to bench_offload.py
Compare 14 commits »
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-27 07:53:04 +08:00
0437311068 feat: add Phase 5 CUDA Graph optimization for chunked prefill
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-27 07:36:36 +08:00
0d31b3f71f 📝 docs: add CPU offload optimization strategies guide
73c9dc46ff feat: add XAttention BSA support to bench_offload.py
924a0d2bfa 🔧 chore: add nsys profiling rule and update gitignore
0619accd1c 📝 docs: add CPU scheduling latency analysis for chunked attention
18bc433f09 perf: improve NVTX profiling with colored ranges and configurable slots
Compare 6 commits »
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-24 04:32:52 +08:00
3100724666 📝 docs: add nsys wrong event order bug investigation
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-24 01:44:49 +08:00
78a44f3536 📝 docs: add GPU memory monitoring rule
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-23 10:35:57 +08:00
7c41032a2e feat: add configurable stride and chunk_size for XAttention BSA
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-23 09:45:36 +08:00
f28b500120 🙈 chore: uncomment planning files in gitignore
be67fa8060 🗑️ chore: remove temporary planning files
4f35526457 🔀 merge: integrate remote changes (exec-plan command, CUDA graph plan)
da5e13e2bb 📝 docs: update XAttention BSA Policy with benchmarks and memory management
dd31033732 🔧 chore: add gpu-monitor agent for memory leak debugging
Compare 14 commits »
zijie-tian pushed to tzj/layer-offload at zijie-tian/nano-vllm 2026-01-22 22:19:56 +08:00
5fb0f67295 [WIP] need refactor.
69b779e252 📝 docs: add layer offload planning notes and task plan
e313dd795a feat: add exec-plan command for automated task plan execution
9f3ee9279e feat: add nanovllm.ops module with XAttention estimation kernels
Compare 4 commits »
zijie-tian created branch tzj/layer-offload in zijie-tian/nano-vllm 2026-01-22 22:19:56 +08:00
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-22 03:15:15 +08:00
47d237bb7e feat: add exec-plan command for automated task plan execution
a5307fb124 📝 docs: add CUDA Graph optimization plan for offload mode decode
Compare 2 commits »
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-22 01:34:12 +08:00
d808970f2f [WIP] Before implement the plan.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-22 01:33:25 +08:00
bc92c1fdb8 feat: add xattn_estimate_chunked for chunked prefill support
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-22 01:00:07 +08:00
2866d4fd88 feat: add chunk attention CUDA graph test for block sparse attention
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-21 21:56:27 +08:00
5d722968ff [docs] Added cuda_graph_guide.md
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-21 03:30:40 +08:00
d21b40f48f [test] Added test_cudagraph_memory.py.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-21 03:27:41 +08:00
42cf124343 📝 docs: add CUDA Graph memory mechanism guide