nano-vllm/nanovllm/engine/model_runner.py
Zijie Tian c1ddb44e5d Merge branch 'zijie/layer-prefill-1' into tzj/vs_offload
Adds MInference sparse attention support:
- New MInference sparse policy implementation
- A-shape, vertical-slash, and block-sparse patterns
- Updated bench.py with sparse attention options
- test_minference_gpu.py validation test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 03:40:53 +08:00

38 KiB