Document OOM in XAttention BSA + CPU offload (GLM-4-9B, 24GB GPUs)

Issue: an 8GB allocation for the k_expanded buffer fails because the
buffer is sized with num_heads instead of num_kv_heads, over-allocating
by the GQA group factor. Root cause analysis and a proposed fix are
included.
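A minimal sketch of the sizing mismatch. The head counts, sequence
length, and dtype below are illustrative assumptions chosen to reproduce
an 8 GiB figure, not values read from the actual GLM-4-9B config or the
real buffer-allocation code:

```python
def k_expanded_bytes(num_heads: int, seq_len: int, head_dim: int,
                     dtype_bytes: int = 2, batch: int = 1) -> int:
    """Bytes needed for a [batch, num_heads, seq_len, head_dim] buffer."""
    return batch * num_heads * seq_len * head_dim * dtype_bytes

num_q_heads = 32     # assumed query-head count
num_kv_heads = 2     # assumed KV-head count under GQA
seq_len = 1_048_576  # assumed context length
head_dim = 128       # assumed head dimension, fp16 (2 bytes)

# Bug: the K buffer holds num_kv_heads heads, but is sized with num_heads.
buggy = k_expanded_bytes(num_q_heads, seq_len, head_dim)
fixed = k_expanded_bytes(num_kv_heads, seq_len, head_dim)
print(f"buggy: {buggy / 2**30:.1f} GiB, fixed: {fixed / 2**30:.1f} GiB")
# buggy: 8.0 GiB, fixed: 0.5 GiB
```

With these assumed values the mismatch inflates the buffer by the GQA
group factor num_q_heads / num_kv_heads = 16x, turning a 0.5 GiB
allocation into the 8 GiB request that OOMs on a 24GB GPU.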
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>