"benchmark development" Papers
4 papers found
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
ICLR 2025, oral, arXiv:2410.03051
102 citations
Do as We Do, Not as You Think: the Conformity of Large Language Models
Zhiyuan Weng, Guikun Chen, Wenguan Wang
ICLR 2025, poster, arXiv:2501.13381
18 citations
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Ruizhong Qiu, Weiliang Zeng, James Ezick et al.
ICLR 2025, poster, arXiv:2406.06647
43 citations
Offline Multi-Objective Optimization
Ke Xue, Rong-Xi Tan, Xiaobin Huang et al.
ICML 2024, poster