Zijie Tian
a6cc703d73
[tests] Added test_niah_standalone.py.
2026-01-12 00:16:37 +08:00
Zijie Tian
5895de0c97
[docs] Added transformers error description.
2026-01-11 18:48:50 +08:00
Zijie Tian
2771312565
[docs] Add sparse prefill integration plan from int-minference analysis
Consolidated analysis from int-minference-1/2/3 branches into a unified
integration plan for MInference, XAttention, and FlexPrefill strategies.
Key design decisions:
- Backward compatible: Keep existing SparsePolicy interface
- Unified BlockMask intermediate representation for new strategies
- XAttention/FlexPrefill use block_sparse_attn_func kernel
- MInference can optionally use block_sparse_attn (Phase 4)
Five-phase implementation plan:
1. BlockMask + block_sparse_attn wrapper
2. XAttention implementation
3. FlexPrefill implementation
4. Optional MInference refactoring
5. Integration and testing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 23:33:09 +08:00
Zijie Tian
e23be2e844
Merge branch 'zijie/add-llama-1': Add multi-model support
- Add model registry system for dynamic model loading
- Implement LlamaForCausalLM with Llama3 RoPE scaling
- Register Qwen3ForCausalLM and Qwen2ForCausalLM
- Update ModelRunner to use get_model_class() for dynamic model selection
Tested: needle 32k test PASSED
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 21:20:53 +08:00
Zijie Tian
24f5ae5fc3
[claudesquad] update from 'add-llama-1' on 10 Jan 26 21:14 CST
2026-01-10 21:14:32 +08:00
Zijie Tian
067e36f4a2
[claudesquad] update from 'fix-bug-2' on 09 Jan 26 16:10 CST
2026-01-09 16:10:28 +08:00
Zijie Tian
79c4df4a27
[claudesquad] update from 'int-minference-1' on 08 Jan 26 23:42 CST
2026-01-08 23:42:30 +08:00
Zijie Tian
ea4e904de0
[claudesquad] update from 'int-minference-1' on 08 Jan 26 23:22 CST
2026-01-08 23:22:38 +08:00
Zijie Tian
105201b902
[claudesquad] update from 'lw-offload-2' on 08 Jan 26 21:19 CST
2026-01-08 21:19:38 +08:00
Zijie Tian
a8c9f0d837
[claudesquad] update from 'lw-offload-2' on 08 Jan 26 20:53 CST
2026-01-08 20:53:08 +08:00
Zijie Tian
bf4c63c7ec
[docs] Added Sparse Attn.
2025-12-29 19:56:54 +08:00