ICLR Poster "benchmark dataset" Papers
8 papers found
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.
ICLR 2025, poster, arXiv:2403.10380
18 citations
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.
ICLR 2025, poster, arXiv:2403.06833
45 citations
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky et al.
ICLR 2025, poster, arXiv:2410.16701
3 citations
Do Large Language Models Truly Understand Geometric Structures?
Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.
ICLR 2025, poster, arXiv:2501.13773
9 citations
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
ICLR 2025, poster
23 citations
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Junjielong Xu, Qinan Zhang, Zhiqing Zhong et al.
ICLR 2025, poster
21 citations
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
Mingfei Han, Linjie Yang, Xiaojun Chang et al.
ICLR 2025, poster, arXiv:2312.10300
46 citations
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li, Wendi Sang, Chang Zeng et al.
ICLR 2025, poster, arXiv:2410.01481
8 citations