2024 "benchmark evaluation" Papers

15 papers found

A Comparative Study of Image Restoration Networks for General Backbone Network Design

Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.

ECCV 2024posterarXiv:2310.11881
53
citations

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024paperarXiv:2401.03991
51
citations

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024paperarXiv:2309.01431
458
citations

Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

Denis Blessing, Xiaogang Jia, Johannes Esslinger et al.

ICML 2024poster

CurBench: Curriculum Learning Benchmark

Yuwei Zhou, Zirui Pan, Xin Wang et al.

ICML 2024poster

Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Jin Gao, Lei Gan, Yuankai Li et al.

ECCV 2024posterarXiv:2408.01091
4
citations

EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

Shengjie Wang, Shaohuai Liu, Weirui Ye et al.

ICML 2024spotlight

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang et al.

ICML 2024poster

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

Xueyu Hu, Ziyu Zhao, Shuang Wei et al.

ICML 2024poster

MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation

Qian Huang, Jian Vora, Percy Liang et al.

ICML 2024poster

Position: Towards Implicit Prompt For Text-To-Image Models

Yue Yang, Yuqi Lin, Hong Liu et al.

ICML 2024poster

Premise Order Matters in Reasoning with Large Language Models

Xinyun Chen, Ryan Chi, Xuezhi Wang et al.

ICML 2024poster

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024paperarXiv:2305.15685
76
citations

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Xiaoxuan Wang, ziniu hu, Pan Lu et al.

ICML 2024poster

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

Jian Xie, Kai Zhang, Jiangjie Chen et al.

ICML 2024spotlight