- Add --policy parameter for sparse attention policy selection (full/xattn)
- Add --block-size parameter (default 4096) for KV cache block size
- Add --gpu-util parameter for GPU memory utilization control
- Improve output filename format: <policy>_<gpuonly|offload>_blk<size>_<timestamp>
- Map user-friendly policy names to internal enum (xattn -> XATTN_BSA)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
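
The options listed in this commit could be wired up roughly as follows. This is a minimal sketch and not the script's actual code. Only the flag names, the 4096 default, the filename pattern, and the xattn -> XATTN_BSA mapping come from the commit message. The `SparsePolicy` enum, the `--offload` flag, the `--gpu-util` default of 0.9, the timestamp format, and the helper names `build_parser` and `output_name` are assumptions made for illustration.

```python
import argparse
from datetime import datetime
from enum import Enum


class SparsePolicy(Enum):
    """Hypothetical stand-in for the project's internal policy enum."""
    FULL = "full"
    XATTN_BSA = "xattn_bsa"


# User-friendly CLI names mapped to internal enum members (xattn -> XATTN_BSA).
POLICY_MAP = {
    "full": SparsePolicy.FULL,
    "xattn": SparsePolicy.XATTN_BSA,
}


def build_parser():
    parser = argparse.ArgumentParser(description="Benchmark options (sketch)")
    parser.add_argument("--policy", choices=sorted(POLICY_MAP), default="full",
                        help="sparse attention policy")
    parser.add_argument("--block-size", type=int, default=4096,
                        help="KV cache block size")
    # The 0.9 default is an assumption; the commit does not state one.
    parser.add_argument("--gpu-util", type=float, default=0.9,
                        help="GPU memory utilization")
    # Hypothetical flag: the commit only shows gpuonly|offload in the filename.
    parser.add_argument("--offload", action="store_true",
                        help="enable offload mode")
    return parser


def output_name(policy, offload, block_size, now=None):
    """Build <policy>_<gpuonly|offload>_blk<size>_<timestamp>."""
    now = now or datetime.now()
    mode = "offload" if offload else "gpuonly"
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return f"{policy}_{mode}_blk{block_size}_{stamp}"
```

With this sketch, `build_parser().parse_args(["--policy", "xattn"])` yields `args.policy == "xattn"`, which `POLICY_MAP` resolves to `SparsePolicy.XATTN_BSA`. Restricting `--policy` with `choices` rejects unknown names at parse time, before any enum lookup.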
5.1 KiB
Executable File