2025 Poster Papers on "reasoning chain evaluation"
3 papers found
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
Junyan Ye, Dongzhi Jiang, Jun He et al.
NeurIPS 2025 poster · arXiv:2510.09361
2 citations
CofCA: A Step-Wise Counterfactual Multi-hop QA benchmark
Jian Wu, Linyi Yang, Zhen Wang et al.
ICLR 2025 poster · arXiv:2402.11924
14 citations
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu et al.
ICCV 2025 poster · arXiv:2506.10857
9 citations