📝 docs: add test_ruler.py usage guide and rule

- Add comprehensive test_ruler.py usage guide with verified commands
- Add .claude/rules/test-ruler.md to enforce documentation-first approach
- Update CLAUDE.md documentation index

Tested commands on RTX 3090 (GPU 4):
- 32K/64K offload + XAttn BSA
- Multi-dataset, JSON output, quiet mode
- GLM-4 model support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 02:46:44 +08:00
parent 1c36d53570
commit c8a5ef04c0
3 changed files with 430 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -45,6 +45,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
 | [`docs/xattn_offload_stream_sync_fix.md`](docs/xattn_offload_stream_sync_fix.md) | 🐛 FIX: XAttention Offload stream synchronization bug; K data inconsistent between Pass1 and Pass2; compute_stream wrapping |
 | [`docs/xattn_density_types.md`](docs/xattn_density_types.md) | 📊 Compute vs Comm density: BSA block (128) vs CPU block (4096) granularity; aggregation effect leads to comm=100% |
 | [`docs/xattn_density_alignment_verification.md`](docs/xattn_density_alignment_verification.md) | ✅ VERIFIED: GPU-only vs Offload density alignment verification (32K difference 0.37%, 64K difference 0.09%) |
+| [`docs/test_ruler_usage_guide.md`](docs/test_ruler_usage_guide.md) | 📖 GUIDE: test_ruler.py usage guide; RULER benchmark test commands; verified command examples |

 ## Rules Index

@@ -55,6 +56,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
 | [`.claude/rules/sparse-policy.md`](.claude/rules/sparse-policy.md) | SparsePolicy implementation requirements |
 | [`.claude/rules/planning-with-files.md`](.claude/rules/planning-with-files.md) | Planning file management for complex tasks |
 | [`.claude/rules/gpu-monitor.md`](.claude/rules/gpu-monitor.md) | **GPU memory monitoring**: must use the gpu-monitor agent; manual nvidia-smi loops are prohibited |
+| [`.claude/rules/test-ruler.md`](.claude/rules/test-ruler.md) | **test_ruler.py rules**: do not use --help; consult the documentation instead; includes a quick reference and command templates |

 ## GPU Mutex for Multi-Instance Debugging