Spotlight "benchmark evaluation" Papers
4 papers found
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Edan Toledo, Karen Hambardzumyan, Martin Josifoski et al.
NeurIPS 2025 | spotlight | arXiv:2507.02554 | 15 citations
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
Pierre Marza, Leo Fillioux, Sofiène Boutaj et al.
NeurIPS 2025 | spotlight | arXiv:2507.07860 | 3 citations
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
Shengjie Wang, Shaohuai Liu, Weirui Ye et al.
ICML 2024 | spotlight
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie, Kai Zhang, Jiangjie Chen et al.
ICML 2024 | spotlight