Commit Graph

11 Commits

Author SHA1 Message Date
Zijie Tian
69b779e252 📝 docs: add layer offload planning notes and task plan
Add planning documents for layer-wise offload implementation:
- notes.md: Implementation notes and findings
- task_plan.md: Detailed task breakdown and progress tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 06:04:36 +08:00
Zijie Tian
ac1ccbceaa feat: add XAttention sparse policy integration
Integrate COMPASS XAttention algorithm into nano-vllm's CPU offload
execution path. Uses FlashAttention with native GQA support for
offload mode.

New files:
- nanovllm/kvcache/sparse/utils.py: find_blocks_chunked() utility
- nanovllm/kvcache/sparse/kernels.py: Triton kernels for XAttention
- nanovllm/kvcache/sparse/xattn.py: XAttentionPolicy implementation

Modified:
- nanovllm/config.py: Add XATTN configuration parameters
- nanovllm/engine/model_runner.py: Support XATTN policy
- nanovllm/kvcache/sparse/__init__.py: Register XAttentionPolicy
- tests/test_ruler.py: Add --sparse-policy parameter

Test results (RULER, 32k):
- NIAH tasks: 12/12 (100%)
- QA/Recall tasks: 11/15 (73%)
- Overall: 23/27 (85%)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-14 10:04:46 +08:00
Zijie Tian
76af506956 [claudesquad] update from 'multi-request-2' on 13 Jan 26 02:01 CST
2026-01-13 02:01:07 +08:00
Zijie Tian
64971c8e8a Merge branch 'zijie/fix-dist-3': Fix distributed port conflict
- Auto port allocation with _find_free_port() in model_runner.py
- Resource management refactor with close() + context manager in llm_engine.py
- Add tests/test_port_conflict.py and tests/run_parallel_niah.sh
- Remove docs/torch_distributed_port_issue.md (issue fixed)
- Ignore tests/data/ directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 16:27:25 +08:00
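
The merge above (64971c8e8a) mentions auto port allocation via _find_free_port() and a close() + context manager refactor, but the log does not show the code. The following is only a minimal sketch of how such a pattern is commonly written, not the repository's actual implementation: find_free_port and the Engine class are hypothetical names introduced for illustration, and the real model_runner.py and llm_engine.py may differ.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel pick a free ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


class Engine:
    """Hypothetical stand-in for an engine that owns a distributed port."""

    def __init__(self) -> None:
        # Each instance gets its own port, so parallel runs do not collide
        # on a single hard-coded value.
        self.port = find_free_port()
        self.closed = False

    def close(self) -> None:
        # Idempotent: calling close() more than once is harmless.
        if self.closed:
            return
        # A real engine would tear down its process group and workers here.
        self.closed = True

    def __enter__(self) -> "Engine":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Release resources even if the body of the with-block raised.
        self.close()


if __name__ == "__main__":
    with Engine() as engine:
        print(f"engine would rendezvous on port {engine.port}")
```

One caveat with this approach: the port is released before the caller rebinds it, so another process can claim it in between. For test runs launched in parallel this window is usually small enough to be acceptable, but it is not a guarantee.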
Zijie Tian
1425510a2e [claudesquad] update from 'fix-bug-2' on 09 Jan 26 16:05 CST
2026-01-09 16:05:36 +08:00
Zijie Tian
ccf04d3917 [claudesquad] update from 'fix-bug-2' on 09 Jan 26 15:16 CST
2026-01-09 15:16:55 +08:00
Zijie Tian
59f8970ed3 [claudesquad] update from 'fix-bug-2' on 09 Jan 26 15:12 CST
2026-01-09 15:12:42 +08:00
Zijie Tian
47e3e465f0 [claudesquad] update from 'fix-ga-perf-2' on 09 Jan 26 14:08 CST
2026-01-09 14:08:12 +08:00
Zijie Tian
ea4e904de0 [claudesquad] update from 'int-minference-1' on 08 Jan 26 23:22 CST
2026-01-08 23:22:38 +08:00
Zijie Tian
a8c9f0d837 [claudesquad] update from 'lw-offload-2' on 08 Jan 26 20:53 CST
2026-01-08 20:53:08 +08:00
Zijie Tian
85bcca3d17 [claudesquad] update from 'int-offload-1' on 08 Jan 26 19:44 CST
2026-01-08 19:44:29 +08:00