Commit Graph

  • 4d8ae951c3 [WIP] Before debug plan. Zijie Tian 2026-01-21 00:01:10 +08:00
  • 1ab4676396 ♻️ refactor: consolidate RULER test files and document root cause Zijie Tian 2026-01-20 23:41:17 +08:00
  • 512e1e5401 🔧 chore: add Claude rules for agent result format and multi-GPU debugging Zijie Tian 2026-01-20 23:41:08 +08:00
  • 6180055ed8 📝 docs: add chunked attention solutions guide and update doc index Zijie Tian 2026-01-20 04:48:20 +08:00
  • 4cbd451af7 📝 docs: add BSA interface documentation and cleanup temp files Zijie Tian 2026-01-20 04:27:19 +08:00
  • 3aef6fc3a2 feat: add XAttention Triton operators for sparse attention estimation Zijie Tian 2026-01-20 04:27:07 +08:00
  • 690456dbf9 ♻️ refactor: create ops module and move chunked_attention Zijie Tian 2026-01-20 02:50:14 +08:00
  • e440c45e73 📝 docs: add XAttention algorithm guide based on COMPASS implementation Zijie Tian 2026-01-20 02:50:03 +08:00
  • 07f5220f40 Merge branch 'tzj/minference' of ssh://git.zijie-tian.site:2222/zijie-tian/nano-vllm into tzj/minference Zijie Tian 2026-01-20 02:27:10 +08:00
  • 37aecd4d52 📝 docs: add SparsePolicy implementation guide and update rules Zijie Tian 2026-01-20 02:25:46 +08:00
  • b1f292cf22 Merge branch 'tzj/minference' of ssh://git.zijie-tian.site:2222/zijie-tian/nano-vllm into tzj/minference Zijie Tian 2026-01-20 02:16:39 +08:00
  • 16fbcf9e4c docs: add RULER 32K chunked offload issue documentation Zijie Tian 2026-01-20 02:16:21 +08:00
  • fa7601f4b8 ♻️ refactor: remove cross-layer pipeline and rename compute_chunked_prefill Zijie Tian 2026-01-20 02:10:40 +08:00
  • 6080bf7554 🙈 chore: exclude planning-with-files from git tracking Zijie Tian 2026-01-20 02:06:28 +08:00
  • e5a17c832c 📝 docs: add SparsePolicy architecture documentation Zijie Tian 2026-01-20 01:36:09 +08:00
  • 4593f42ec3 ♻️ refactor: migrate chunked decode attention to SparsePolicy Zijie Tian 2026-01-20 01:32:17 +08:00
  • a36f8569fc [WIP] Before refactor. Zijie Tian 2026-01-20 01:25:46 +08:00
  • d3b41b2f64 🔧 chore: clean up claude-flow configuration Zijie Tian 2026-01-20 00:58:52 +08:00
  • baa4be7e2e ♻️ refactor: migrate chunked prefill attention to SparsePolicy Zijie Tian 2026-01-20 00:58:46 +08:00
  • 6783a45e6f 🚧 wip: update sparse policy refactoring plan to v4 Zijie Tian 2026-01-19 23:23:16 +08:00
  • 16b269d897 🚧 wip: update sparse policy refactoring plan to v4 Zijie Tian 2026-01-19 23:10:49 +08:00
  • b97b0b96a0 [WIP] Before refactor the nanovllm sparse policy. Zijie Tian 2026-01-19 22:34:44 +08:00
  • b5da802dff [WIP] Before integrate the xattn operator. Zijie Tian 2026-01-19 21:19:21 +08:00
  • 9e6fdc0650 [WIP] Before plan execute. Zijie Tian 2026-01-19 03:30:44 +08:00
  • 50520a6c3c [fix] fixed request to request error. Zijie Tian 2026-01-19 00:55:26 +08:00
  • e6e0dc5d7d feat: add comprehensive RULER benchmark testing Zijie Tian 2026-01-18 20:34:06 +08:00
  • 0550a64339 feat: add dynamic port allocation from tzj/vs_offload Zijie Tian 2026-01-18 19:51:56 +08:00
  • b8c00399af chore: sync submodule URL with tzj/minference (use HTTPS) tzj/vs_offload Zijie Tian 2026-01-18 19:32:18 +08:00
  • d9890aa2cd chore: add Block-SparseAttention submodule from tzj/vs_offload Zijie Tian 2026-01-18 19:22:40 +08:00
  • 5a837c8c83 chore: update .gitignore with tzj/vs_offload configuration Zijie Tian 2026-01-18 18:59:17 +08:00
  • d1bbb7efe2 chore: update claude configuration and rules from tzj/vs_offload Zijie Tian 2026-01-18 18:56:49 +08:00
  • 1a78ae74d5 feat: add claude-flow MCP configuration Zijie Tian 2026-01-14 09:18:09 +08:00
  • c254c8c330 chore: add planning-with-files rule configuration Zijie Tian 2026-01-14 10:09:52 +08:00
  • 13586e689b docs: add chunked prefill integration plan Zijie Tian 2026-01-18 18:49:19 +08:00
  • e72725c12b test: add OffloadedTensor unified test suite Zijie Tian 2026-01-18 10:41:40 +08:00
  • cfb188c34a docs: add chunked prefill analysis for ultra-long sequences Zijie Tian 2026-01-16 10:38:02 +08:00
  • 2826a649de docs: add XAttention integration guide Zijie Tian 2026-01-14 10:16:21 +08:00
  • 24baeb6d5a chore: add planning-with-files rule configuration Zijie Tian 2026-01-14 10:09:52 +08:00
  • 57f4e9c6e6 docs: reorganize documentation files Zijie Tian 2026-01-14 10:08:41 +08:00
  • ac1ccbceaa feat: add XAttention sparse policy integration Zijie Tian 2026-01-14 10:04:46 +08:00
  • 029894118d feat: add claude-flow MCP configuration Zijie Tian 2026-01-14 09:18:09 +08:00
  • 8d6fde3b23 docs: add Block-Sparse-Attention library reference Zijie Tian 2026-01-14 08:39:03 +08:00
  • 6a6bd75685 feat: add Block-Sparse-Attention submodule (tzj/minference branch) Zijie Tian 2026-01-14 08:07:07 +08:00
  • 86633004ca 📝 docs: add 64k memory analysis and test configuration updates Zijie Tian 2026-01-14 07:02:09 +08:00
  • c51a640a29 🐛 fix: remove torch.compile from add_rms_forward to avoid recompilation Zijie Tian 2026-01-14 07:02:02 +08:00
  • dce6ad6b74 ♻️ refactor: chunked LayerNorm/QKV/MLP for 64k memory optimization Zijie Tian 2026-01-14 07:01:57 +08:00
  • cf168fd9b9 test: add comprehensive RULER benchmark test suite Zijie Tian 2026-01-14 00:51:30 +08:00
  • 76af506956 [claudesquad] update from 'multi-request-2' on 13 Jan 26 02:01 CST Zijie Tian 2026-01-13 02:01:07 +08:00
  • 49519c7ce7 📝 docs: update offload accuracy issue with independent testing results Zijie Tian 2026-01-12 21:08:35 +08:00
  • 1424e665e7 test: add parallel multi-GPU RULER NIAH test script Zijie Tian 2026-01-12 21:08:27 +08:00
  • 64971c8e8a Merge branch 'zijie/fix-dist-3': Fix distributed port conflict Zijie Tian 2026-01-12 16:20:44 +08:00
  • de6f36bdb2 [docs] Added dist port issue. Zijie Tian 2026-01-12 15:16:39 +08:00
  • 8e0888c20c [docs] Added offload_acc issue. Zijie Tian 2026-01-12 15:05:55 +08:00
  • a6cc703d73 [tests] Added test_niah_standalone.py. Zijie Tian 2026-01-12 00:16:37 +08:00
  • 5895de0c97 [docs] Added transformers error desp. Zijie Tian 2026-01-11 18:48:50 +08:00
  • 2771312565 [docs] Add sparse prefill integration plan from int-minference analysis Zijie Tian 2026-01-10 23:33:09 +08:00
  • de6eae472d [docs] Update CLAUDE.md with multi-model support documentation Zijie Tian 2026-01-10 21:29:39 +08:00
  • e23be2e844 Merge branch 'zijie/add-llama-1': Add multi-model support Zijie Tian 2026-01-10 21:20:53 +08:00
  • 24f5ae5fc3 [claudesquad] update from 'add-llama-1' on 10 Jan 26 21:14 CST Zijie Tian 2026-01-10 21:14:32 +08:00
  • 03a8c033cb [claudesquad] update from 'add-llama-1' on 10 Jan 26 21:03 CST Zijie Tian 2026-01-10 21:03:45 +08:00
  • 9377ff63fe Merge remote-tracking branch 'origin/zijie/fix-bug-2' into tzj/vs_offload Zijie Tian 2026-01-09 16:13:38 +08:00
  • 067e36f4a2 [claudesquad] update from 'fix-bug-2' on 09 Jan 26 16:10 CST Zijie Tian 2026-01-09 16:10:28 +08:00
  • 1425510a2e [claudesquad] update from 'fix-bug-2' on 09 Jan 26 16:05 CST Zijie Tian 2026-01-09 16:05:36 +08:00
  • 335117bfca Merge remote-tracking branch 'origin/zijie/fix-bug-2' into tzj/vs_offload Zijie Tian 2026-01-09 15:21:48 +08:00
  • 5012b11291 [bench] Modify bench_vllm.py Zijie Tian 2026-01-09 15:20:37 +08:00
  • ccf04d3917 [claudesquad] update from 'fix-bug-2' on 09 Jan 26 15:16 CST Zijie Tian 2026-01-09 15:16:55 +08:00
  • 59f8970ed3 [claudesquad] update from 'fix-bug-2' on 09 Jan 26 15:12 CST Zijie Tian 2026-01-09 15:12:42 +08:00
  • 6378cb4c17 Merge remote-tracking branch 'origin/zijie/fix-ga-perf-2' into tzj/vs_offload Zijie Tian 2026-01-09 14:21:00 +08:00
  • 47e3e465f0 [claudesquad] update from 'fix-ga-perf-2' on 09 Jan 26 14:08 CST Zijie Tian 2026-01-09 14:08:12 +08:00
  • aac94c9481 [claude] Added some commands. Zijie Tian 2026-01-09 13:16:23 +08:00
  • 79c4df4a27 [claudesquad] update from 'int-minference-1' on 08 Jan 26 23:42 CST Zijie Tian 2026-01-08 23:42:30 +08:00
  • ea4e904de0 [claudesquad] update from 'int-minference-1' on 08 Jan 26 23:22 CST Zijie Tian 2026-01-08 23:22:38 +08:00
  • 0bfe1984ef [docs] Refine GPU mutex: exclusive for benchmarks, port check for tests Zijie Tian 2026-01-08 21:35:08 +08:00
  • 105201b902 [claudesquad] update from 'lw-offload-2' on 08 Jan 26 21:19 CST Zijie Tian 2026-01-08 21:19:38 +08:00
  • a8c9f0d837 [claudesquad] update from 'lw-offload-2' on 08 Jan 26 20:53 CST Zijie Tian 2026-01-08 20:53:08 +08:00
  • 85bcca3d17 [claudesquad] update from 'int-offload-1' on 08 Jan 26 19:44 CST Zijie Tian 2026-01-08 19:44:29 +08:00
  • b5c0ef3b7a [docs] Replace chunked prefill docs with layer-wise offload strategy Zijie Tian 2026-01-08 05:39:26 +08:00
  • bbbfd1e7da [docs] Simplify multi-instance development with direct PYTHONPATH Zijie Tian 2026-01-08 04:51:55 +08:00
  • c1ddb44e5d Merge branch 'zijie/layer-prefill-1' into tzj/vs_offload Zijie Tian 2026-01-08 03:40:53 +08:00
  • d8a87da1c3 [claudesquad] update from 'layer-prefill-1' on 08 Jan 26 03:36 CST Zijie Tian 2026-01-08 03:36:39 +08:00
  • ecd9ae0271 [WIP] changed to layerwise offload. Zijie Tian 2026-01-08 00:28:27 +08:00
  • 6575099a06 [refactor] Cleanup unused code after perf_opt merge Zijie Tian 2026-01-07 06:25:21 +08:00
  • 8fd25d72d7 Merge perf_opt-1 and perf_opt-2 branches Zijie Tian 2026-01-07 06:03:44 +08:00
  • ccf27d3a74 [claudesquad] update from 'perf_opt-1' on 07 Jan 26 05:58 CST Zijie Tian 2026-01-07 05:58:23 +08:00
  • 0ad86eb449 [claudesquad] update from 'perf_opt-2' on 07 Jan 26 05:58 CST Zijie Tian 2026-01-07 05:58:10 +08:00
  • aa953ecb59 [refactor] Aligned the bench. Zijie Tian 2026-01-07 04:25:06 +08:00
  • 362f5e575f [fix] Fixed .gitignores. Zijie Tian 2026-01-07 03:32:14 +08:00
  • 58a06501c1 Merge branch 'zijie/debug_chunk-2' into tzj/minference Zijie Tian 2026-01-07 03:30:38 +08:00
  • 2a6e0a2c02 [feat] Added Quest Sparsity Policy. Zijie Tian 2026-01-07 03:29:21 +08:00
  • 2fe50bab50 [claudesquad] update from 'debug_chunk-2' on 07 Jan 26 03:27 CST Zijie Tian 2026-01-07 03:27:27 +08:00
  • c99a6f3d3f [WIP] Before add Quest policy. Zijie Tian 2026-01-07 02:32:30 +08:00
  • f240903013 [docs] Add GPU mutex instructions for multi-instance debugging Zijie Tian 2026-01-07 01:42:59 +08:00
  • 0e691f2d85 [WIP] move metadata to GPU. Zijie Tian 2026-01-06 23:32:32 +08:00
  • edb5273e34 [WIP] Added basic test for quest. Zijie Tian 2026-01-06 22:30:31 +08:00
  • 690492e074 [WIP] Before refactor policies. Zijie Tian 2026-01-06 20:47:55 +08:00
  • 7cc8a394a5 [fix] Fixed bench_offload.py, BUT performance DEGRADED. Zijie Tian 2026-01-06 18:46:48 +08:00
  • 535f2037ab [WIP] Before fix bench_offload.py. Zijie Tian 2026-01-06 18:41:08 +08:00
  • c7ac39dfbd [refactor] Before add sparse policy. Zijie Tian 2026-01-05 21:19:24 +08:00
  • e554d5482b [refactor] Delete unnecessary test, and refactor the offload prefix cache. Zijie Tian 2026-01-05 20:31:42 +08:00
  • 247c5312d9 [fix] Fixed decode misalign. Zijie Tian 2026-01-05 19:00:44 +08:00