ICLR "benchmark construction" Papers
3 papers found
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang, Chufan Shi, Yaxin Liu et al.
ICLR 2025 poster | arXiv:2406.09961
65 citations
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
ICLR 2025 poster | arXiv:2410.12784
150 citations
SysBench: Can LLMs Follow System Message?
Yanzhao Qin, Tao Zhang, Tao Zhang et al.
ICLR 2025 poster
5 citations