2025 "benchmark design" Papers
4 papers found
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Clemencia Siro, Guy Gur-Ari, Gaurav Mishra et al.
ICLR 2025 (oral) · arXiv:2206.04615
2192 citations
Commit0: Library Generation from Scratch
Wenting Zhao, Nan Jiang, Celine Lee et al.
ICLR 2025 (poster) · arXiv:2412.01769
18 citations
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai et al.
ICCV 2025 (poster) · arXiv:2506.16058
2 citations
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li, Yunhao Fang, Yukang Chen et al.
NeurIPS 2025 (poster) · arXiv:2502.20694
31 citations