🧬Language Models

Chain-of-Thought Reasoning

Step-by-step reasoning in language models

100 papers11,398 total citations
Compare with other topics
Mar '24 Feb '261168 papers
Also includes: chain-of-thought reasoning, chain of thought, cot, reasoning chains, step-by-step reasoning

Top Papers

#1

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

ICLR 2024
1,171
citations
#2

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024
721
citations
#3

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Chenhao Tan, Robert Ness, Amit Sharma et al.

ICLR 2025arXiv:2305.00050
causal reasoninglarge language modelscausal discoverycounterfactual reasoning+3
390
citations
#4

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024
365
citations
#5

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Peng Jin, ZiangWu ZiangWu et al.

ICCV 2025
338
citations
#6

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric Michaud et al.

ICLR 2025
252
citations
#7

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

yi yang, Xiaoxuan He, Hongkun Pan et al.

ICCV 2025
247
citations
#8

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh vahid et al.

NeurIPS 2025
242
citations
#9

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.

ICLR 2025arXiv:2409.12183
chain-of-thought promptingsymbolic reasoningmathematical reasoninglarge language models+3
239
citations
#10

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NeurIPS 2025arXiv:2503.21776
rule-based reinforcement learningmultimodal large language modelsvideo reasoningtemporal modeling+3
236
citations
#11

Listen, Think, and Understand

Yuan Gong, Hongyin Luo, Alexander Liu et al.

ICLR 2024
221
citations
#12

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.

ICCV 2025
206
citations
#13

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Qingqing Zhao, Yao Lu, Moo Jin Kim et al.

CVPR 2025
203
citations
#14

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Aojun Zhou, Ke Wang, Zimu Lu et al.

ICLR 2024
196
citations
#15

Think before you speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat et al.

ICLR 2024
187
citations
#16

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025arXiv:2404.02078
large language modelsreasoning taskspreference learningalignment dataset+4
179
citations
#17

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Chancharik Mitra, Brandon Huang, Trevor Darrell et al.

CVPR 2024
167
citations
#18

Can Large Language Models Infer Causation from Correlation?

Zhijing Jin, Jiarui Liu, Zhiheng LYU et al.

ICLR 2024
166
citations
#19

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NeurIPS 2025
155
citations
#20

Hypothesis Search: Inductive Reasoning with Language Models

Ruocheng Wang, Eric Zelikman, Gabriel Poesia et al.

ICLR 2024
151
citations
#21

Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Xingxuan Li, Ruochen Zhao, Yew Ken Chia et al.

ICLR 2024
145
citations
#22

Physics of Language Models: Part 3.2, Knowledge Manipulation

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025
142
citations
#23

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.

ICLR 2024
140
citations
#24

Retrieval Head Mechanistically Explains Long-Context Factuality

Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.

ICLR 2025arXiv:2404.15574
retrieval headslong-context language modelsattention mechanismtransformer-based models+4
140
citations
#25

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NeurIPS 2025
134
citations
#26

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

Zayne Sprague, Xi Ye, Kaj Bostrom et al.

ICLR 2024
131
citations
#27

Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen, Yujia Li et al.

ICLR 2024
131
citations
#28

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.

ICLR 2025arXiv:2410.01943
text-to-sqllarge language modelsmulti-agent modelingchain-of-thought reasoning+4
116
citations
#29

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

AAAI 2025
115
citations
#30

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

Chengzu Li, Wenshan Wu, Huanyu Zhang et al.

ICML 2025
115
citations
#31

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024arXiv:2312.03661
autonomous drivingvision-language modelsinterpretable reasoningchain-based reasoning+4
112
citations
#32

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024arXiv:2311.13314
knowledge graph integrationlarge language model hallucinationfactual knowledge retrievalautonomous knowledge verification+2
108
citations
#33

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025
106
citations
#34

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NeurIPS 2025arXiv:2505.24864
reinforcement learningreasoning capabilitieskl divergence controlreference policy resetting+4
99
citations
#35

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

ICLR 2025
98
citations
#36

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Wenkai Yang, Shuming Ma, Yankai Lin et al.

NeurIPS 2025arXiv:2502.18080
test-time computechain of thoughtmathematical reasoningreasoning performance+3
96
citations
#37

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi JIANG, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025arXiv:2505.00703
chain-of-thought reasoningtext-to-image generationreinforcement learningsemantic-level planning+3
91
citations
#38

At Which Training Stage Does Code Data Help LLMs Reasoning?

ma yingwei, Yue Liu, Yue Yu et al.

ICLR 2024
90
citations
#39

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.

ICML 2025
88
citations
#40

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang Weiyun, yiming ren, Haowen Luo et al.

ECCV 2024arXiv:2402.19474
relation comprehensionobject localizationmultimodal large language modelsrelation conversation task+3
86
citations
#41

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.

ICLR 2025arXiv:2410.00371
vision-language modelsrobotic manipulationfailure detectionfailure reasoning+4
81
citations
#42

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli et al.

ICLR 2024
81
citations
#43

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.

ICLR 2024
80
citations
#44

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.

ICLR 2024
78
citations
#45

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024
78
citations
#46

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024arXiv:2401.12863
chain-of-thought reasoningmultimodal reasoningknowledge graph integrationlarge language models+3
78
citations
#47

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025arXiv:2411.14257
sparse autoencodershallucination mechanismsentity recognitionknowledge awareness+3
77
citations
#48

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NeurIPS 2025arXiv:2504.12216
diffusion large language modelsreinforcement learningnon-autoregressive generationreasoning capabilities+4
75
citations
#49

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NeurIPS 2025
74
citations
#50

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025arXiv:2506.01347
reinforcement learningmathematical reasoninglanguage modelspolicy gradients+4
74
citations
#51

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.

ICLR 2025
70
citations
#52

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024
65
citations
#53

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong et al.

AAAI 2025
63
citations
#54

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.

ICLR 2025
63
citations
#55

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

Yaniv Nikankin, Anja Reusch, Aaron Mueller et al.

ICLR 2025
63
citations
#56

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Rongyao Fang, Chengqi Duan, Kun Wang et al.

NeurIPS 2025
60
citations
#57

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024arXiv:2308.13198
knowledge neuronsmultilingual language modelsfactual knowledge storageintegrated gradients method+4
59
citations
#58

Thinkless: LLM Learns When to Think

Gongfan Fang, Xinyin Ma, Xinchao Wang

NeurIPS 2025
58
citations
#59

Magnushammer: A Transformer-Based Approach to Premise Selection

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.

ICLR 2024
57
citations
#60

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024arXiv:2312.07399
clinical reasoninglarge language modelsdisease diagnosisprompt-based learning+3
57
citations
#61

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NeurIPS 2025arXiv:2503.19470
reasoning with searchreinforcement learningmulti-hop question answeringsearch-guided reasoning+3
56
citations
#62

TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning

Siheng Xiong, Yuan Yang, Ali Payani et al.

AAAI 2024arXiv:2312.15816
temporal knowledge graphsevent time predictionlogical reasoningtemporal event knowledge graph+3
54
citations
#63

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Swarnadeep Saha, Xian Li, Marjan Ghazvininejad et al.

ICML 2025
53
citations
#64

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.

NeurIPS 2025arXiv:2507.16815
vision-language-action reasoningreinforced visual planningembodied reasoning plansmultimodal instruction interpretation+4
53
citations
#65

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.

NeurIPS 2025arXiv:2505.13032
audio-language modelsmultimodal audio reasoningchain-of-thought rationaleaudio question answering+4
52
citations
#66

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025arXiv:2506.04308
spatial referringvision-language modelsdepth encoder integrationsupervised fine-tuning+4
51
citations
#67

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024arXiv:2401.03991
spatial reasoninglarge language modelsbenchmark evaluationtemplate-to-relation mapping+4
51
citations
#68

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

CVPR 2024
50
citations
#69

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

ICLR 2025
48
citations
#70

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.

ICLR 2025arXiv:2310.15117
causal inferencecausal ordercausal graphspairwise prompting+4
48
citations
#71

GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang et al.

NeurIPS 2025
48
citations
#72

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024
48
citations
#73

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024
47
citations
#74

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024arXiv:2403.12488
multimodal large language modelszero-shot object detectionprompting paradigmdetection prompting toolkit+4
47
citations
#75

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Zhuoqun Li, Xuanang Chen, Haiyang Yu et al.

ICLR 2025arXiv:2410.08815
retrieval-augmented generationknowledge-intensive reasoninginformation structurizationlarge language models+3
46
citations
#76

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Weizhe Yuan, Jane Yu, Song Jiang et al.

NeurIPS 2025
46
citations
#77

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani, Idan Shenfeld, Andi Peng et al.

ICLR 2025
45
citations
#78

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Andy (DiJia) Su, Hanlin Zhu, Yingchen Xu et al.

ICML 2025
45
citations
#79

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Junfei Wu, Jian Guan, Kaituo Feng et al.

NeurIPS 2025
45
citations
#80

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.

ICML 2025
44
citations
#81

Atom of Thoughts for Markov LLM Test-Time Scaling

Fengwei Teng, Quan Shi, Zhaoyang Yu et al.

NeurIPS 2025
44
citations
#82

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, li zhimin, Yuhang Zang et al.

NeurIPS 2025
42
citations
#83

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025
39
citations
#84

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Zhongshen Zeng, Pengguang Chen, Shu Liu et al.

ICLR 2025
39
citations
#85

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NeurIPS 2025
39
citations
#86

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

ICLR 2025
38
citations
#87

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

NeurIPS 2025
37
citations
#88

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025
36
citations
#89

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NeurIPS 2025arXiv:2503.09501
meta-thinkingmulti-agent reinforcement learninglarge language modelsreasoning processes+4
36
citations
#90

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NeurIPS 2025arXiv:2505.14631
large reasoning modelshybrid reasoning modelsadaptive thinking selectionreinforcement learning optimization+4
35
citations
#91

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen et al.

ECCV 2024
35
citations
#92

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.

ICLR 2025
35
citations
#93

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li

ICLR 2025arXiv:2407.05557
llm safety guardrailsprobabilistic graphical modelsmarkov logic networksprobabilistic circuits+4
34
citations
#94

Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Liqi He, Zuchao Li, Xiantao Cai et al.

AAAI 2024arXiv:2312.08762
chain-of-thought reasoningmulti-modal reasoninglatent space learningdiffusion processes+4
34
citations
#95

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025arXiv:2402.04236
vision-language modelschain-of-manipulations reasoningvisual reasoningmulti-turn multi-image architecture+4
33
citations
#96

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025
33
citations
#97

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NeurIPS 2025
32
citations
#98

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NeurIPS 2025arXiv:2505.14489
confidence calibrationchain-of-thought reasoninglarge language modelsslow thinking behaviors+2
32
citations
#99

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025
31
citations
#100

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025
31
citations