Add test verifying that the XAttention density calculation in GPU-only
mode matches an independent xattn_estimate call.
Changes:
- Add tests/test_gpuonly_density_alignment.py: loads the Q/K tensors saved
  by xattn_bsa.py, calls xattn_estimate independently, and compares the
  results
- Enhance debug save in xattn_bsa.py: it now saves the Q and K tensors and
  the xattn_estimate parameters for external verification
- Set _DEBUG_SAVE_MASK = False by default
Usage:
1. Set _DEBUG_SAVE_MASK = True in xattn_bsa.py
2. Run GPU-only inference with XAttention (e.g., test_ruler.py)
3. Run tests/test_gpuonly_density_alignment.py to verify alignment
Verified on 4k/8k/16k/32k/64k contexts - all pass with exact match.
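
Illustrative sketch of what an "exact match" alignment check might look
like. This is not the code in tests/test_gpuonly_density_alignment.py. It
assumes the saved artifact is a 2-D boolean block mask and that density
means the fraction of selected blocks over the whole mask (no causal
adjustment). The names block_density and masks_match are hypothetical, and
plain Python lists stand in for tensors so the sketch stays self-contained.

```python
from typing import Sequence

# Assumption: a block mask is a 2-D grid of booleans, True = block kept.
Mask = Sequence[Sequence[bool]]


def block_density(mask: Mask) -> float:
    """Return the fraction of blocks marked True in a 2-D block mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask has no blocks")
    selected = sum(1 for row in mask for keep in row if keep)
    return selected / total


def masks_match(saved: Mask, recomputed: Mask) -> bool:
    """Return True only if both masks are identical element by element."""
    if len(saved) != len(recomputed):
        return False
    return all(list(a) == list(b) for a, b in zip(saved, recomputed))


if __name__ == "__main__":
    # Hypothetical data: one mask saved during inference, one recomputed.
    saved_mask = [[True, False], [True, True]]
    recomputed_mask = [[True, False], [True, True]]
    print("density (saved):     ", block_density(saved_mask))
    print("density (recomputed):", block_density(recomputed_mask))
    print("exact match:         ", masks_match(saved_mask, recomputed_mask))
```

Comparing the masks element by element is stricter than comparing the two
density values alone, since two different masks can share the same density.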
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>