zijie-tian
  • Joined on 2026-01-03
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 03:16:47 +08:00
52b12a89e3 📋 docs: add changelog for 2026-02-05
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 03:14:50 +08:00
d35dd76e09 🗑️ chore: clean up tests directory to essential files only
2b61c5ab57 🗑️ chore: remove test_needle* files
a709551072 🗑️ chore: remove redundant XAttention test files
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 02:58:33 +08:00
11a867f6fb 🐛 fix: skip GQA buffer allocation in XAttention offload mode
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 02:50:17 +08:00
af4da454ba 📊 docs: add XAttention offload profiling analysis for 32K context
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 02:47:44 +08:00
ef37d4f1a8 🐛 docs: document XAttention offload GQA buffer OOM issue
c8a5ef04c0 📝 docs: add test_ruler.py usage guide and rule
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 02:00:56 +08:00
1c36d53570 🙈 chore: add ralph-tui session file to gitignore
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 02:00:24 +08:00
54fd302fa8 📝 docs: add XAttention density alignment verification results
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-05 01:46:09 +08:00
1eb7521994 📝 docs: add XAttention density types documentation
51bd678335 📊 feat: distinguish compute density and communication density in DensityObserver
1ea5afd886 📝 docs: add XAttention offload stream sync fix documentation
829b311c02 🐛 fix: stream synchronization for XAttention estimate kernels in offload mode
dd0472aea8 [plugin] Added ralph-tui setup.
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-02 14:39:51 +08:00
a1c68a733e 📊 docs: add XAttention memory benchmark for 24GB GPUs
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-02 14:23:20 +08:00
dc51972777 📝 docs: update density alignment test with Offload mode results
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-02 11:23:39 +08:00
232fcf043e 📝 docs: add GPU-only density alignment test results
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-02 11:15:04 +08:00
aeed6ccdfb test: add GPU-only density alignment verification test
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-02 10:48:26 +08:00
6c55c4d2a3 ♻️ refactor: rewrite select_blocks with 3-stage KV chunking algorithm
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-02-02 10:05:50 +08:00
6e34efd58a 📝 docs: add storage overhead analysis and batch tests for KV chunking
5acd5558d6 feat: add KV chunking support for XAttention softmax kernels
193ef55d18 ♻️ refactor: use Q-chunked processing in xattn alignment test
f173a3f7f5 test: add xattn_estimate vs low-level kernels alignment test
8035e4db3d 📝 docs: add XAttention KV chunking density test results
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-31 14:48:29 +08:00
2e96d1d97d WIP: Enhance sparse attention with density tracking and block selection improvements
f6ac4ccdde feat: add DensityObserver for XAttention sparse attention density tracking
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-29 08:37:36 +08:00
4484a1482c [refactor] Refactor the profile_offload.sh
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 14:20:10 +08:00
e436ec861f ⚙️ config: update test_ruler.py defaults
45efcf0db1 feat: add --dtype parameter to test_ruler.py
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 13:45:12 +08:00
e09a2a5b10 feat: add Qwen2/2.5 model support
a239bfb40d 📚 docs: add new model integration guide
29e102720b 🐛 fix: support multiple EOS tokens for GLM-4
726e4b58cf feat: add GLM-4-9B-Chat-1M model support
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 10:18:37 +08:00
8d19e61446 perf: replace Triton merge with FlashInfer merge_state
4484ebbb77 📚 docs: add 1M+ context length models reference list
zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm 2026-01-28 07:09:12 +08:00
2c2383c786 perf: optimize XAttention estimate with hierarchical block sum
f049971f84 test: add hierarchical block sum estimation validation
c90dc196b2 📝 docs: add estimate block_size performance analysis
3da9b8aef2 perf: optimize XAttention estimate phase with K-only loading
a832d127b6 feat: add nsys-profiler agent for kernel performance analysis