NeurIPS Poster "benchmark evaluation" Papers
13 papers found
A Technical Report on “Erasing the Invisible”: The 2024 NeurIPS Competition on Stress Testing Image Watermarks
Mucong Ding, Bang An, Tahseen Rabbani et al.
NeurIPS 2025 poster
C-SEO Bench: Does Conversational SEO Work?
Haritz Puerto, Martin Gubri, Tommaso Green et al.
NeurIPS 2025 poster, arXiv:2506.11097
2 citations
DGCBench: A Deep Graph Clustering Benchmark
Benyu Wu, Yue Liu, Qiaoyu Tan et al.
NeurIPS 2025 poster
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
NeurIPS 2025 poster, arXiv:2505.12335
15 citations
LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
Rui Li, Zixuan Hu, Wenxi Qu et al.
NeurIPS 2025 poster, arXiv:2505.22634
2 citations
Massive Sound Embedding Benchmark (MSEB)
Georg Heigold, Ehsan Variani, Tom Bagby et al.
NeurIPS 2025 poster
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
Yinghao Zhu, Ziyi He, Haoran Hu et al.
NeurIPS 2025 poster, arXiv:2505.12371
13 citations
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Tianhao Peng, Haochen Wang, Yuanxing Zhang et al.
NeurIPS 2025 poster, arXiv:2511.07250
2 citations
OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
Bingnan Li, Chen-Yu Wang, Haiyang Xu et al.
NeurIPS 2025 poster, arXiv:2509.19282
1 citation
PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?
Atharva Gundawar, Som Sagar, Ransalu Senanayake
NeurIPS 2025 poster, arXiv:2506.23725
3 citations
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine
Jiacheng Xie, Yang Yu, Ziyang Zhang et al.
NeurIPS 2025 poster, arXiv:2505.24063
2 citations
This Time is Different: An Observability Perspective on Time Series Foundation Models
Ben Cohen, Emaad Khwaja, Youssef Doubli et al.
NeurIPS 2025 poster, arXiv:2505.14766
11 citations
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
Eun Chang, Zhuangqun Huang, Yiwei Liao et al.
NeurIPS 2025 poster, arXiv:2511.22154