2024 "benchmark evaluation" Papers
15 papers found
A Comparative Study of Image Restoration Networks for General Backbone Network Design
Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.
ECCV 2024 poster arXiv:2310.11881
53 citations
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
AAAI 2024 paper arXiv:2401.03991
51 citations
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen, Hongyu Lin, Xianpei Han et al.
AAAI 2024 paper arXiv:2309.01431
458 citations
Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling
Denis Blessing, Xiaogang Jia, Johannes Esslinger et al.
ICML 2024 poster
CurBench: Curriculum Learning Benchmark
Yuwei Zhou, Zirui Pan, Xin Wang et al.
ICML 2024 poster
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
Jin Gao, Lei Gan, Yuankai Li et al.
ECCV 2024 poster arXiv:2408.01091
4 citations
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
Shengjie Wang, Shaohuai Liu, Weirui Ye et al.
ICML 2024 spotlight
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang et al.
ICML 2024 poster
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Xueyu Hu, Ziyu Zhao, Shuang Wei et al.
ICML 2024 poster
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
Qian Huang, Jian Vora, Percy Liang et al.
ICML 2024 poster
Position: Towards Implicit Prompt For Text-To-Image Models
Yue Yang, Yuqi Lin, Hong Liu et al.
ICML 2024 poster
Premise Order Matters in Reasoning with Large Language Models
Xinyun Chen, Ryan Chi, Xuezhi Wang et al.
ICML 2024 poster
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
AAAI 2024 paper arXiv:2305.15685
76 citations
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang, Ziniu Hu, Pan Lu et al.
ICML 2024 poster
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie, Kai Zhang, Jiangjie Chen et al.
ICML 2024 spotlight