"reasoning chain evaluation" Papers
2 papers found
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
Junyan Ye, Dongzhi Jiang, Jun He et al.
NeurIPS 2025, poster, arXiv:2510.09361
2 citations
CofCA: A Step-wise Counterfactual Multi-hop QA Benchmark
Jian Wu, Linyi Yang, Zhen Wang et al.
ICLR 2025, poster, arXiv:2402.11924
14 citations