2025 "reasoning tasks" Papers
21 papers found
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
Analyzing the Power of Chain of Thought through Memorization Capabilities
Lijia Yu, Xiao-Shan Gao, Lijun Zhang
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu, Jiaxuan Gao, Xujie Shen et al.
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan Liu, Wen-Shuo Chao, Naiqiang Tan et al.
Balancing Act: Diversity and Consistency in Large Language Model Ensembles
Ahmed Abdulaal, Chen Jin, Nina Montaña-Brown et al.
Benchmarking Agentic Workflow Generation
Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Jiayu Wang, Yifei Ming, Zixuan Ke et al.
C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning
Antonios Valkanas, Soumyasundar Pal, Pavel Rumiantsev et al.
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren, Congcong Chen, Haoran Xu et al.
Enhancing Language Model Agents using Diversity of Thoughts
Vijay Chandra Lingam, Behrooz Tehrani, Sujay Sanghavi et al.
Fast attention mechanisms: a tale of parallelism
Jingwen Liu, Hantao Yu, Clayton Sanford et al.
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.
InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion
Yuanyi Wang, Zhaoyi Yan, Yiming Zhang et al.
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.
Multipole Attention for Efficient Long Context Reasoning
Coleman Hooper, Sebastian Zhao, Luca Manolache et al.
On the self-verification limitations of large language models on reasoning and planning tasks
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
PID-controlled Langevin Dynamics for Faster Sampling on Generative Models
Hongyi Chen, Jianhai Shu, Jingtao Ding et al.
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.