nano-vllm/tests/test_ruler.py
Zijie Tian cf168fd9b9 test: add comprehensive RULER benchmark test suite
- Add test_ruler.py supporting all 13 RULER tasks (NIAH, QA, CWE, FWE, VT)
- Implement RULER official evaluation metrics (string_match_all/part)
- Fix max_model_len to 32896 to prevent decode OOM on long inputs
- Add ruler_benchmark_report.md with full test results (92.1% accuracy)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 00:51:30 +08:00

13 KiB
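
The `string_match_all` and `string_match_part` metrics named in the commit message can be sketched roughly as follows. This is a minimal illustration, assuming the usual RULER convention of case-insensitive substring matching against a list of reference strings per sample; the actual code in `test_ruler.py` is not shown here and may differ. The function bodies, parameter names, and the sample data below are assumptions for illustration only.

```python
from typing import List


def string_match_all(preds: List[str], refs: List[List[str]]) -> float:
    """Average fraction of reference strings found in each prediction (0-100).

    Assumes every sample has at least one reference string.
    """
    if not preds:
        return 0.0
    total = 0.0
    for pred, ref in zip(preds, refs):
        # Count how many references appear as case-insensitive substrings.
        hits = sum(1.0 for r in ref if r.lower() in pred.lower())
        total += hits / len(ref)
    return round(total / len(preds) * 100, 2)


def string_match_part(preds: List[str], refs: List[List[str]]) -> float:
    """Share of predictions containing at least one reference string (0-100)."""
    if not preds:
        return 0.0
    total = 0.0
    for pred, ref in zip(preds, refs):
        # A sample scores 1.0 as soon as any single reference matches.
        matched = any(r.lower() in pred.lower() for r in ref)
        total += 1.0 if matched else 0.0
    return round(total / len(preds) * 100, 2)


if __name__ == "__main__":
    # Hypothetical predictions and references, not taken from the benchmark.
    sample_preds = ["The numbers are 12 and 34.", "Paris"]
    sample_refs = [["12", "34", "56"], ["paris"]]
    print(string_match_all(sample_preds, sample_refs))
    print(string_match_part(sample_preds, sample_refs))
```

Under this reading, `string_match_all` suits tasks where every reference must be retrieved (for example multi-value NIAH, CWE, FWE, VT), while `string_match_part` suits tasks where any one acceptable answer is enough (for example QA).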