✨ feat: add configurable stride and chunk_size for XAttention BSA

- Add sparse_chunk_size config option (default: 16384)
- Pass stride, chunk_size, use_triton through factory function
- Add --sparse-stride CLI option to test_ruler.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 10:37:04 +08:00
parent f28b500120
commit 7c41032a2e
4 changed files with 10 additions and 0 deletions
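
The factory change described in the commit message can be illustrated with a small standalone sketch. It only mirrors the `kwargs.get(...)` defaults visible in the hunk below; `XAttentionBSASketch` and `create_xattn_bsa_sketch` are hypothetical stand-ins introduced for illustration, not the repository's actual policy class or factory.

```python
from dataclasses import dataclass


@dataclass
class XAttentionBSASketch:
    """Hypothetical stand-in for the real policy class; it only stores options."""
    block_size: int
    samples_per_chunk: int
    threshold: float
    stride: int
    chunk_size: int
    use_triton: bool


def create_xattn_bsa_sketch(**kwargs) -> XAttentionBSASketch:
    """Hypothetical factory mirroring the defaults shown in the diff."""
    return XAttentionBSASketch(
        block_size=kwargs.get("block_size", 128),
        samples_per_chunk=kwargs.get("samples_per_chunk", 128),
        threshold=kwargs.get("threshold", 0.9),
        # The three options this commit forwards through the factory:
        stride=kwargs.get("stride", 8),
        chunk_size=kwargs.get("chunk_size", 16384),
        use_triton=kwargs.get("use_triton", True),
    )


if __name__ == "__main__":
    # No overrides: every option falls back to its default.
    print(create_xattn_bsa_sketch())
    # Overriding only the newly forwarded options.
    print(create_xattn_bsa_sketch(stride=16, chunk_size=4096, use_triton=False))
```

One consequence of this pattern: `dict.get` applies the default only when the key is absent, so a caller that passes `stride=None` explicitly gets `None` rather than 8.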
--- a/nanovllm/kvcache/sparse/__init__.py
+++ b/nanovllm/kvcache/sparse/__init__.py
@@ -61,6 +61,9 @@ def create_sparse_policy(policy_type: SparsePolicyType, **kwargs) -> SparsePolic
            block_size=kwargs.get("block_size", 128),
            samples_per_chunk=kwargs.get("samples_per_chunk", 128),
            threshold=kwargs.get("threshold", 0.9),
+            stride=kwargs.get("stride", 8),
+            chunk_size=kwargs.get("chunk_size", 16384),
+            use_triton=kwargs.get("use_triton", True),
        )

    else: