🧬Language Models

Chain-of-Thought Reasoning

Step-by-step reasoning in language models

100 papers, 11,398 total citations


Mar '24 to Feb '26: 1168 papers

Top Conferences

ICLR: 40, NeurIPS: 27, AAAI: 14, CVPR: 7, ICML: 5, ECCV: 4

Top Papers

#1

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Chenhao Tan, Robert Ness, Amit Sharma et al.

ICLR 2025, arXiv:2305.00050

causal reasoning, large language models, causal discovery, counterfactual reasoning, +3

390 citations

#4

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Peng Jin, Ziang Wu et al.

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric Michaud et al.

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Yi Yang, Xiaoxuan He, Hongkun Pan et al.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh vahid et al.

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.

ICLR 2025, arXiv:2409.12183

chain-of-thought prompting, symbolic reasoning, mathematical reasoning, large language models, +3

239 citations

#10

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NeurIPS 2025, arXiv:2503.21776

rule-based reinforcement learning, multimodal large language models, video reasoning, temporal modeling, +3

236 citations

#11

Listen, Think, and Understand

Yuan Gong, Hongyin Luo, Alexander Liu et al.

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Qingqing Zhao, Yao Lu, Moo Jin Kim et al.

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Aojun Zhou, Ke Wang, Zimu Lu et al.

Think before you speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat et al.

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025, arXiv:2404.02078

large language models, reasoning tasks, preference learning, alignment dataset, +4

179 citations

#17

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Chancharik Mitra, Brandon Huang, Trevor Darrell et al.

Can Large Language Models Infer Causation from Correlation?

Zhijing Jin, Jiarui Liu, Zhiheng Lyu et al.

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

Hypothesis Search: Inductive Reasoning with Language Models

Ruocheng Wang, Eric Zelikman, Gabriel Poesia et al.

Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Xingxuan Li, Ruochen Zhao, Yew Ken Chia et al.

Physics of Language Models: Part 3.2, Knowledge Manipulation

Zeyuan Allen-Zhu, Yuanzhi Li

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.

Retrieval Head Mechanistically Explains Long-Context Factuality

Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.

ICLR 2025, arXiv:2404.15574

retrieval heads, long-context language models, attention mechanism, transformer-based models, +4

140 citations

#25

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

Zayne Sprague, Xi Ye, Kaj Bostrom et al.

Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen, Yujia Li et al.

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.

ICLR 2025, arXiv:2410.01943

text-to-sql, large language models, multi-agent modeling, chain-of-thought reasoning, +4

116 citations

#29

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

Chengzu Li, Wenshan Wu, Huanyu Zhang et al.

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024, arXiv:2312.03661

autonomous driving, vision-language models, interpretable reasoning, chain-based reasoning, +4

112 citations

#32

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024, arXiv:2311.13314

knowledge graph integration, large language model hallucination, factual knowledge retrieval, autonomous knowledge verification, +2

108 citations

#33

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NeurIPS 2025, arXiv:2505.24864

reinforcement learning, reasoning capabilities, kl divergence control, reference policy resetting, +4

99 citations

#35

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Wenkai Yang, Shuming Ma, Yankai Lin et al.

NeurIPS 2025, arXiv:2502.18080

test-time compute, chain of thought, mathematical reasoning, reasoning performance, +3

96 citations

#37

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025, arXiv:2505.00703

chain-of-thought reasoning, text-to-image generation, reinforcement learning, semantic-level planning, +3

91 citations

#38

At Which Training Stage Does Code Data Help LLMs Reasoning?

Yingwei Ma, Yue Liu, Yue Yu et al.

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang, Yiming Ren, Haowen Luo et al.

ECCV 2024, arXiv:2402.19474

relation comprehension, object localization, multimodal large language models, relation conversation task, +3

86 citations

#41

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.

ICLR 2025, arXiv:2410.00371

vision-language models, robotic manipulation, failure detection, failure reasoning, +4

81 citations

#42

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli et al.

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024, arXiv:2401.12863

chain-of-thought reasoning, multimodal reasoning, knowledge graph integration, large language models, +3

78 citations

#47

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025, arXiv:2411.14257

sparse autoencoders, hallucination mechanisms, entity recognition, knowledge awareness, +3

77 citations

#48

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NeurIPS 2025, arXiv:2504.12216

diffusion large language models, reinforcement learning, non-autoregressive generation, reasoning capabilities, +4

75 citations

#49

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025, arXiv:2506.01347

reinforcement learning, mathematical reasoning, language models, policy gradients, +4

74 citations

#51

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong et al.

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

Yaniv Nikankin, Anja Reusch, Aaron Mueller et al.

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Rongyao Fang, Chengqi Duan, Kun Wang et al.

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024, arXiv:2308.13198

knowledge neurons, multilingual language models, factual knowledge storage, integrated gradients method, +4

59 citations

#58

Thinkless: LLM Learns When to Think

Gongfan Fang, Xinyin Ma, Xinchao Wang

Magnushammer: A Transformer-Based Approach to Premise Selection

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024, arXiv:2312.07399

clinical reasoning, large language models, disease diagnosis, prompt-based learning, +3

57 citations

#61

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NeurIPS 2025, arXiv:2503.19470

reasoning with search, reinforcement learning, multi-hop question answering, search-guided reasoning, +3

56 citations

#62

TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning

Siheng Xiong, Yuan Yang, Ali Payani et al.

AAAI 2024, arXiv:2312.15816

temporal knowledge graphs, event time prediction, logical reasoning, temporal event knowledge graph, +3

54 citations

#63

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Swarnadeep Saha, Xian Li, Marjan Ghazvininejad et al.

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.

NeurIPS 2025, arXiv:2507.16815

vision-language-action reasoning, reinforced visual planning, embodied reasoning plans, multimodal instruction interpretation, +4

53 citations

#65

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.

NeurIPS 2025, arXiv:2505.13032

audio-language models, multimodal audio reasoning, chain-of-thought rationale, audio question answering, +4

52 citations

#66

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025, arXiv:2506.04308

spatial referring, vision-language models, depth encoder integration, supervised fine-tuning, +4

51 citations

#67

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024, arXiv:2401.03991

spatial reasoning, large language models, benchmark evaluation, template-to-relation mapping, +4

51 citations

#68

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.

ICLR 2025, arXiv:2310.15117

causal inference, causal order, causal graphs, pairwise prompting, +4

48 citations

#71

GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang et al.

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024, arXiv:2403.12488

multimodal large language models, zero-shot object detection, prompting paradigm, detection prompting toolkit, +4

47 citations

#75

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Zhuoqun Li, Xuanang Chen, Haiyang Yu et al.

ICLR 2025, arXiv:2410.08815

retrieval-augmented generation, knowledge-intensive reasoning, information structurization, large language models, +3

46 citations

#76

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Weizhe Yuan, Jane Yu, Song Jiang et al.

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani, Idan Shenfeld, Andi Peng et al.

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Andy (DiJia) Su, Hanlin Zhu, Yingchen Xu et al.

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Junfei Wu, Jian Guan, Kaituo Feng et al.

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.

Atom of Thoughts for Markov LLM Test-Time Scaling

Fengwei Teng, Quan Shi, Zhaoyang Yu et al.

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, Zhimin Li, Yuhang Zang et al.

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Zhongshen Zeng, Pengguang Chen, Shu Liu et al.

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NeurIPS 2025, arXiv:2503.09501

meta-thinking, multi-agent reinforcement learning, large language models, reasoning processes, +4

36 citations

#90

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NeurIPS 2025, arXiv:2505.14631

large reasoning models, hybrid reasoning models, adaptive thinking selection, reinforcement learning optimization, +4

35 citations

#91

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen et al.

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li

ICLR 2025, arXiv:2407.05557

llm safety guardrails, probabilistic graphical models, markov logic networks, probabilistic circuits, +4

34 citations

#94

Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Liqi He, Zuchao Li, Xiantao Cai et al.

AAAI 2024, arXiv:2312.08762

chain-of-thought reasoning, multi-modal reasoning, latent space learning, diffusion processes, +4

34 citations

#95

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025, arXiv:2402.04236

vision-language models, chain-of-manipulations reasoning, visual reasoning, multi-turn multi-image architecture, +4

33 citations

#96

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NeurIPS 2025, arXiv:2505.14489

confidence calibration, chain-of-thought reasoning, large language models, slow thinking behaviors, +2

32 citations

#99

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025

31 citations
