2025 Poster "benchmark design" Papers
5 papers found
Commit0: Library Generation from Scratch
Wenting Zhao, Nan Jiang, Celine Lee et al.
ICLR 2025, poster, arXiv:2412.01769
18 citations
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou, Ming Zhang, Chenhao Huang et al.
NeurIPS 2025, poster, arXiv:2506.02672
4 citations
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai et al.
ICCV 2025, poster, arXiv:2506.16058
2 citations
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Bingchen Zhao, Despoina Magka, Minqi Jiang et al.
NeurIPS 2025, poster, arXiv:2506.22419
2 citations
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li, Yunhao Fang, Yukang Chen et al.
NeurIPS 2025, poster, arXiv:2502.20694
31 citations