2025 Poster "automated evaluation" Papers
6 papers found
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
CVPR 2025posterarXiv:2501.03225
21
citations
Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations
Peng Lai, Jianjie Zheng, Sijie Cheng et al.
NEURIPS 2025posterarXiv:2508.03550
2
citations
EditCLIP: Representation Learning for Image Editing
Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey et al.
ICCV 2025posterarXiv:2503.20318
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.
ICCV 2025posterarXiv:2510.16641
4
citations
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
NEURIPS 2025posterarXiv:2505.12575
10
citations
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Qingchen Yu, Zifan Zheng, Shichao Song et al.
ICLR 2025posterarXiv:2405.11874
15
citations