2025 Spotlight "benchmark evaluation" Papers
2 papers found
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Edan Toledo, Karen Hambardzumyan, Martin Josifoski et al.
NeurIPS 2025, spotlight, arXiv:2507.02554
15 citations
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
Pierre Marza, Leo Fillioux, Sofiène Boutaj et al.
NeurIPS 2025, spotlight, arXiv:2507.07860
3 citations