2025 "mathematical reasoning" Papers
51 papers found • Page 1 of 2
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Zui Chen, Tianqiao Liu, Tongqing et al.
Analyzing the Power of Chain of Thought through Memorization Capabilities
Lijia Yu, Xiao-Shan Gao, Lijun Zhang
Angles Don’t Lie: Unlocking Training-Efficient RL Through the Model’s Own Signals
Qinsi Wang, Jinghan Ke, Hancheng Ye et al.
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Jiayu Wang, Yifei Ming, Zixuan Ke et al.
Can LLMs Solve Longer Math Word Problems Better?
Xin Xu, Tong Xiao, Zitong Chao et al.
CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning
Yuanheng Fang, Guoqing Chao, Wenqiang Lei et al.
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
Nuo Chen, Zehua Li, Keqin Bao et al.
ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning
Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.
CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections
Keuntae Kim, Eunhye Jeong, Sehyeon Lee et al.
Conformal Language Model Reasoning with Coherent Factuality
Maxon Rubin-Toles, Maya Gambhir, Keshav Ramji et al.
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou et al.
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Xinglin Wang, Yiwei Li, Shaoxiong Feng et al.
GRIP: A Graph-Based Reasoning Instruction Producer
Jiankang Wang, Jianjun Xu, Xiaorui Wang et al.
Know What You Don't Know: Uncertainty Calibration of Process Reward Models
Young-Jin Park, Kristjan Greenewald, Kaveh Alimohammadi et al.
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.
LeanAgent: Lifelong Learning for Formal Theorem Proving
Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Yiju Guo, Wenkai Yang, Zexu Sun et al.
Lookahead Routing for Large Language Models
Canbin Huang, Tianyuan Shi, Yuhua Zhu et al.
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Zimu Lu, Aojun Zhou, Ke Wang et al.
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
Mixture of Inputs: Text Generation Beyond Discrete Token Sampling
Yufan Zhuang, Liyuan Liu, Chandan Singh et al.
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Zhenting Qi, Mingyuan Ma, Jiahang Xu et al.
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
Yiyou Sun, Shawn Hu, Georgia Zhou et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Zhicheng Yang, Yiwei Wang, Yinya Huang et al.
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao, Yifan Hao, Hanning Zhang et al.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Tian Ye, Zicheng Xu, Yuanzhi Li et al.
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
Progress or Regress? Self-Improvement Reversal in Post-training
Ting Wu, Xuefeng Li, Pengfei Liu
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Zenan Li, Zhaoyu Li, Wen Tang et al.
RaSA: Rank-Sharing Low-Rank Adaptation
Zhiwei He, Zhaopeng Tu, Xing Wang et al.
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.
Reasoning Planning for Language Models
Ngoc Bao Nguyen, Trung Hieu Nguyen, Ruifeng She et al.
ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning
Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang, Haitao Wu, Changqing Zhang et al.
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Ling Yang, Zhaochen Yu, Tianjun Zhang et al.
Teaching Language Models to Reason with Tools
Chengpeng Li, Zhengyang Tang, Ziniu Li et al.
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Xin Xu, Jiaxin Zhang, Tianhao Chen et al.
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan, Wenzhen Yuan, Xian Gao et al.