nano-vllm/tests/test_ruler.py at cf168fd9b9d093fe07020945b179b7a321329abf

Zijie Tian cf168fd9b9 ✅ test: add comprehensive RULER benchmark test suite

- Add test_ruler.py supporting all 13 RULER tasks (NIAH, QA, CWE, FWE, VT)
- Implement RULER official evaluation metrics (string_match_all/part)
- Fix max_model_len to 32896 to prevent decode OOM on long inputs
- Add ruler_benchmark_report.md with full test results (92.1% accuracy)
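
The `string_match_all` and `string_match_part` metrics named above can be sketched as follows. This is a minimal illustration that assumes case-insensitive substring matching, as in the RULER reference evaluation; the actual code in `test_ruler.py` is not shown here and may differ. The sample predictions and references are hypothetical.

```python
from typing import List


def string_match_all(preds: List[str], refs: List[List[str]]) -> float:
    """Mean fraction of reference strings found in each prediction (0-100)."""
    per_sample = [
        sum(1.0 if r.lower() in pred.lower() else 0.0 for r in ref) / len(ref)
        for pred, ref in zip(preds, refs)
    ]
    return round(sum(per_sample) / len(preds) * 100, 2)


def string_match_part(preds: List[str], refs: List[List[str]]) -> float:
    """Share of predictions containing at least one reference string (0-100)."""
    per_sample = [
        max(1.0 if r.lower() in pred.lower() else 0.0 for r in ref)
        for pred, ref in zip(preds, refs)
    ]
    return round(sum(per_sample) / len(preds) * 100, 2)


# Hypothetical sample: one of the two expected values is recovered.
preds = ["The special magic number is 123."]
refs = [["123", "456"]]
score_all = string_match_all(preds, refs)    # half of the references matched
score_part = string_match_part(preds, refs)  # at least one reference matched
```

Under this sketch, the "all" variant gives partial credit per reference, which suits multi-value retrieval tasks, while the "part" variant only requires a single acceptable answer to appear, which suits QA-style tasks.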

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-14 00:51:30 +08:00

13 KiB
