2025 "mathematical reasoning" Papers

51 papers found • Page 1 of 2

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NeurIPS 2025 (poster) · arXiv:2505.20686 · 12 citations

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 (poster) · arXiv:2404.02078 · 179 citations

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Tongqing et al.

ICLR 2025 (poster) · arXiv:2501.14002 · 11 citations

Analyzing the Power of Chain of Thought through Memorization Capabilities

Lijia Yu, Xiao-Shan Gao, Lijun Zhang

NeurIPS 2025 (poster) · arXiv:2511.01190

Angles Don’t Lie: Unlocking Training-Efficient RL Through the Model’s Own Signals

Qinsi Wang, Jinghan Ke, Hancheng Ye et al.

NeurIPS 2025 (spotlight)

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025 (poster) · arXiv:2410.18252 · 39 citations

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Jiayu Wang, Yifei Ming, Zixuan Ke et al.

NeurIPS 2025 (poster) · arXiv:2506.04723 · 1 citation

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025 (poster) · arXiv:2405.14804 · 25 citations

CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning

Yuanheng Fang, Guoqing Chao, Wenqiang Lei et al.

AAAI 2025 (paper) · arXiv:2501.12226 · 2 citations

Chain of Execution Supervision Promotes General Reasoning in Large Language Models

Nuo Chen, Zehua Li, Keqin Bao et al.

NeurIPS 2025 (poster) · arXiv:2510.23629

ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning

Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.

NeurIPS 2025 (poster)

CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections

Keuntae Kim, Eunhye Jeong, Sehyeon Lee et al.

NeurIPS 2025 (poster)

Conformal Language Model Reasoning with Coherent Factuality

Maxon Rubin-Toles, Maya Gambhir, Keshav Ramji et al.

ICLR 2025 (poster) · arXiv:2505.17126 · 5 citations

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

Jingjing Jiang, Chao Ma, Xurui Song et al.

ICCV 2025 (highlight) · arXiv:2507.07424 · 7 citations

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NeurIPS 2025 (spotlight) · arXiv:2504.12216 · 75 citations

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Gang Li, Ming Lin, Tomer Galanti et al.

NeurIPS 2025 (poster) · arXiv:2505.12366 · 10 citations

Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou et al.

NeurIPS 2025 (poster) · 11 citations

Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling

Xinglin Wang, Yiwei Li, Shaoxiong Feng et al.

NeurIPS 2025 (poster) · arXiv:2506.15707 · 5 citations

GRIP: A Graph-Based Reasoning Instruction Producer

Jiankang Wang, Jianjun Xu, Xiaorui Wang et al.

NeurIPS 2025 (poster) · arXiv:2412.08864 · 2 citations

Know What You Don't Know: Uncertainty Calibration of Process Reward Models

Young-Jin Park, Kristjan Greenewald, Kaveh Alimohammadi et al.

NeurIPS 2025 (poster) · arXiv:2506.09338 · 4 citations

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.

ICLR 2025 (poster) · arXiv:2410.01335 · 13 citations

LeanAgent: Lifelong Learning for Formal Theorem Proving

Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.

ICLR 2025 (poster) · arXiv:2410.06209 · 12 citations

Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning

Yiju Guo, Wenkai Yang, Zexu Sun et al.

NeurIPS 2025 (poster) · arXiv:2506.07851 · 3 citations

Lookahead Routing for Large Language Models

Canbin Huang, Tianyuan Shi, Yuhua Zhu et al.

NeurIPS 2025 (poster) · arXiv:2510.19506

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Zimu Lu, Aojun Zhou, Ke Wang et al.

ICLR 2025 (poster) · arXiv:2410.08196 · 27 citations

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NeurIPS 2025 (poster) · arXiv:2505.13031 · 18 citations

Mixture of Inputs: Text Generation Beyond Discrete Token Sampling

Yufan Zhuang, Liyuan Liu, Chandan Singh et al.

NeurIPS 2025 (poster)

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Zhenting Qi, Mingyuan Ma, Jiahang Xu et al.

ICLR 2025 (poster) · arXiv:2408.06195 · 127 citations

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NeurIPS 2025 (poster) · arXiv:2506.18880 · 28 citations

On Extending Direct Preference Optimization to Accommodate Ties

Jinghong Chen, Guangyu Yang, Weizhe Lin et al.

NeurIPS 2025 (poster) · arXiv:2409.17431 · 5 citations

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Zhicheng Yang, Yiwei Wang, Yinya Huang et al.

ICLR 2025 (poster) · arXiv:2407.09887 · 27 citations

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Jiarui Yao, Yifan Hao, Hanning Zhang et al.

NeurIPS 2025 (poster) · arXiv:2505.02391 · 11 citations

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

ICLR 2025 (poster) · arXiv:2407.20311 · 98 citations

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025 (poster) · arXiv:2411.16345 · 34 citations

Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu, Xuefeng Li, Pengfei Liu

ICLR 2025 (poster) · arXiv:2407.05013 · 19 citations

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

Zenan Li, Zhaoyu Li, Wen Tang et al.

ICLR 2025 (poster) · arXiv:2502.13834 · 14 citations

RaSA: Rank-Sharing Low-Rank Adaptation

Zhiwei He, Zhaopeng Tu, Xing Wang et al.

ICLR 2025 (poster) · arXiv:2503.12576 · 4 citations

RAST: Reasoning Activation in LLMs via Small-model Transfer

Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.

NeurIPS 2025 (poster) · arXiv:2506.15710 · 1 citation

Reasoning Planning for Language Models

Ngoc Bao Nguyen, Trung Hieu Nguyen, Ruifeng She et al.

NeurIPS 2025 (spotlight) · arXiv:2511.00521

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NeurIPS 2025 (poster) · arXiv:2503.09501 · 36 citations

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

Qingyang Zhang, Haitao Wu, Changqing Zhang et al.

NeurIPS 2025 (spotlight) · arXiv:2504.05812 · 70 citations

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NeurIPS 2025 (poster) · arXiv:2504.19162 · 21 citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Ling Yang, Zhaochen Yu, Tianjun Zhang et al.

ICLR 2025 (poster) · arXiv:2410.09008 · 12 citations

Teaching Language Models to Reason with Tools

Chengpeng Li, Zhengyang Tang, Ziniu Li et al.

NeurIPS 2025 (poster) · arXiv:2510.20342 · 2 citations

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025 (poster) · arXiv:2506.01347 · 74 citations

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.

ICLR 2025 (poster) · arXiv:2409.12183 · 239 citations

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Wenkai Yang, Shuming Ma, Yankai Lin et al.

NeurIPS 2025 (poster) · arXiv:2502.18080 · 96 citations

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Xiaoyuan Liu, Tian Liang, Zhiwei He et al.

NeurIPS 2025 (poster) · arXiv:2505.13445 · 16 citations

UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models

Xin Xu, Jiaxin Zhang, Tianhao Chen et al.

ICLR 2025 (poster) · arXiv:2501.13766 · 13 citations

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

Jiacheng Ruan, Wenzhen Yuan, Xian Gao et al.

ICCV 2025 (poster) · arXiv:2503.07478 · 15 citations