📝 docs: add test_ruler.py usage guide and rule
- Add comprehensive test_ruler.py usage guide with verified commands - Add .claude/rules/test-ruler.md to enforce documentation-first approach - Update CLAUDE.md documentation index Tested commands on RTX 3090 (GPU 4): - 32K/64K offload + XAttn BSA - Multi-dataset, JSON output, quiet mode - GLM-4 model support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -45,6 +45,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
|
||||
| [`docs/xattn_offload_stream_sync_fix.md`](docs/xattn_offload_stream_sync_fix.md) | 🐛 FIX: XAttention Offload stream 同步 bug,Pass1/Pass2 K 数据不一致,compute_stream 包装 |
|
||||
| [`docs/xattn_density_types.md`](docs/xattn_density_types.md) | 📊 Compute vs Comm density: BSA block (128) vs CPU block (4096) 粒度,聚合效应导致 comm=100% |
|
||||
| [`docs/xattn_density_alignment_verification.md`](docs/xattn_density_alignment_verification.md) | ✅ VERIFIED: GPU-only vs Offload density 对齐验证 (32K 差异 0.37%, 64K 差异 0.09%) |
|
||||
| [`docs/test_ruler_usage_guide.md`](docs/test_ruler_usage_guide.md) | 📖 GUIDE: test_ruler.py 使用指南,RULER benchmark 测试命令,已验证的命令示例 |
|
||||
|
||||
## Rules Index
|
||||
|
||||
@@ -55,6 +56,7 @@ Nano-vLLM is a lightweight vLLM implementation (~1,200 lines) for fast offline L
|
||||
| [`.claude/rules/sparse-policy.md`](.claude/rules/sparse-policy.md) | SparsePolicy implementation requirements |
|
||||
| [`.claude/rules/planning-with-files.md`](.claude/rules/planning-with-files.md) | Planning file management for complex tasks |
|
||||
| [`.claude/rules/gpu-monitor.md`](.claude/rules/gpu-monitor.md) | **GPU memory monitoring**: 必须使用 gpu-monitor agent,禁止手动 nvidia-smi 循环 |
|
||||
| [`.claude/rules/test-ruler.md`](.claude/rules/test-ruler.md) | **test_ruler.py 规则**: 禁止 --help,必须查阅文档,含快速参考和命令模板 |
|
||||
|
||||
## GPU Mutex for Multi-Instance Debugging
|
||||
|
||||
|
||||
Reference in New Issue
Block a user