Match xattn_estimate internal logic by processing Q in chunks:

- Reduces peak memory for attn_scores tensor
- Enables testing 64K sequences without OOM
- All 5 test files pass (3.6K to 64K)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
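For illustration, the chunking idea described in this commit can be sketched roughly as follows. This is a generic NumPy sketch, not the actual xattn_estimate code: the function names `attn_scores_full` and `row_max_chunked`, the `chunk_size` parameter, and the per-row max reduction are all assumptions chosen for the example. The point is only that looping over slices of Q keeps a `(chunk_size, k_len)` score slice alive instead of the full `(q_len, k_len)` tensor.

```python
import numpy as np


def attn_scores_full(q, k):
    """Reference path: materialize the whole (q_len, k_len) score matrix."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    return (q @ k.T) * scale


def row_max_chunked(q, k, chunk_size):
    """Per-query max score, computed one Q chunk at a time.

    Hypothetical helper: the row-wise max stands in for whatever
    per-query reduction the real estimator applies to attn_scores.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    out = np.empty(q.shape[0], dtype=q.dtype)
    for start in range(0, q.shape[0], chunk_size):
        end = min(start + chunk_size, q.shape[0])
        # Only this (end - start, k_len) slice of scores exists at once.
        scores = (q[start:end] @ k.T) * scale
        out[start:end] = scores.max(axis=1)
    return out
```

Because each query row is reduced independently, the chunked result should agree with the full computation up to floating-point tolerance, while peak memory for the scores scales with `chunk_size` rather than with the full query length.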