zijie-tian

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 03:16:47 +08:00

52b12a89e3 📋 docs: add changelog for 2026-02-05

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 03:14:50 +08:00

d35dd76e09 🗑️ chore: clean up tests directory to essential files only

2b61c5ab57 🗑️ chore: remove test_needle* files

a709551072 🗑️ chore: remove redundant XAttention test files

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 02:58:33 +08:00

11a867f6fb 🐛 fix: skip GQA buffer allocation in XAttention offload mode

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 02:50:17 +08:00

af4da454ba 📊 docs: add XAttention offload profiling analysis for 32K context

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 02:47:44 +08:00

ef37d4f1a8 🐛 docs: document XAttention offload GQA buffer OOM issue

c8a5ef04c0 📝 docs: add test_ruler.py usage guide and rule

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 02:00:56 +08:00

1c36d53570 🙈 chore: add ralph-tui session file to gitignore

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 02:00:24 +08:00

54fd302fa8 📝 docs: add XAttention density alignment verification results

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-05 01:46:09 +08:00

1eb7521994 📝 docs: add XAttention density types documentation

51bd678335 📊 feat: distinguish compute density and communication density in DensityObserver

1ea5afd886 📝 docs: add XAttention offload stream sync fix documentation

829b311c02 🐛 fix: stream synchronization for XAttention estimate kernels in offload mode

dd0472aea8 [plugin] Added ralph-tui setup.

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-02 14:39:51 +08:00

a1c68a733e 📊 docs: add XAttention memory benchmark for 24GB GPUs

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-02 14:23:20 +08:00

dc51972777 📝 docs: update density alignment test with Offload mode results

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-02 11:23:39 +08:00

232fcf043e 📝 docs: add GPU-only density alignment test results

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-02 11:15:04 +08:00

aeed6ccdfb ✅ test: add GPU-only density alignment verification test

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-02 10:48:26 +08:00

6c55c4d2a3 ♻️ refactor: rewrite select_blocks with 3-stage KV chunking algorithm

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-02-02 10:05:50 +08:00

6e34efd58a 📝 docs: add storage overhead analysis and batch tests for KV chunking

5acd5558d6 feat: add KV chunking support for XAttention softmax kernels

193ef55d18 ♻️ refactor: use Q-chunked processing in xattn alignment test

f173a3f7f5 ✅ test: add xattn_estimate vs low-level kernels alignment test

8035e4db3d 📝 docs: add XAttention KV chunking density test results

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-01-31 14:48:29 +08:00

2e96d1d97d WIP: Enhance sparse attention with density tracking and block selection improvements

f6ac4ccdde ✨ feat: add DensityObserver for XAttention sparse attention density tracking

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-01-29 08:37:36 +08:00

4484a1482c [refactor] Refactor the profile_offload.sh

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-01-28 14:20:10 +08:00

e436ec861f ⚙️ config: update test_ruler.py defaults

45efcf0db1 ✨ feat: add --dtype parameter to test_ruler.py

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-01-28 13:45:12 +08:00

e09a2a5b10 ✨ feat: add Qwen2/2.5 model support

a239bfb40d 📚 docs: add new model integration guide

29e102720b 🐛 fix: support multiple EOS tokens for GLM-4

726e4b58cf ✨ feat: add GLM-4-9B-Chat-1M model support

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-01-28 10:18:37 +08:00

8d19e61446 ⚡️ perf: replace Triton merge with FlashInfer merge_state

4484ebbb77 📚 docs: add 1M+ context length models reference list

zijie-tian pushed to tzj/minference at zijie-tian/nano-vllm

2026-01-28 07:09:12 +08:00

2c2383c786 ⚡️ perf: optimize XAttention estimate with hierarchical block sum

f049971f84 ✅ test: add hierarchical block sum estimation validation

c90dc196b2 📝 docs: add estimate block_size performance analysis

3da9b8aef2 ⚡️ perf: optimize XAttention estimate phase with K-only loading

a832d127b6 ✨ feat: add nsys-profiler agent for kernel performance analysis
