ICML Poster "benchmark evaluation" Papers
8 papers found
Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling
Denis Blessing, Xiaogang Jia, Johannes Esslinger et al.
ICML 2024 poster
CurBench: Curriculum Learning Benchmark
Yuwei Zhou, Zirui Pan, Xin Wang et al.
ICML 2024 poster
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang et al.
ICML 2024 poster
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Xueyu Hu, Ziyu Zhao, Shuang Wei et al.
ICML 2024 poster
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
Qian Huang, Jian Vora, Percy Liang et al.
ICML 2024 poster
Position: Towards Implicit Prompt For Text-To-Image Models
Yue Yang, Yuqi Lin, Hong Liu et al.
ICML 2024 poster
Premise Order Matters in Reasoning with Large Language Models
Xinyun Chen, Ryan Chi, Xuezhi Wang et al.
ICML 2024 poster
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang, Ziniu Hu, Pan Lu et al.
ICML 2024 poster