2025 "reinforcement learning" Papers
97 papers found • Page 1 of 2
$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo, Ang Li, Yifei Wang et al.
A Causal Lens for Learning Long-term Fair Policies
Jacob Lear, Lu Zhang
AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes
Tianyi Xu, Fan Zhang, Boxin Shi et al.
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.
A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen, Chandrajit Bajaj
A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications
Zhenyu Tao, Wei Xu, Xiaohu You
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Daolang Huang, Xinyi Wen, Ayush Bharti et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
Haoran Xu, Shuozhe Li, Harshit Sikchi et al.
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Chongyi Zheng, Jens Tuyls, Joanne Peng et al.
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Feichen Gan, Lu Youcun, Yingying Zhang et al.
Continual Knowledge Adaptation for Reinforcement Learning
Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.
CURE: Co-Evolving Coders and Unit Testers via Reinforcement Learning
Yinjie Wang, Ling Yang, Ye Tian et al.
Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation
Hanchen Su, Xuyuan Li, Yan Zhou et al.
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph
Mingxuan Ye, Jie Wang, Fangzhou et al.
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment
Jinwoo Choi, Seung-Woo Seo
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Runyu Lu, Peng Zhang, Ruochuan Shi et al.
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Zhebei Shen, Qifan Yu, Juncheng Li et al.
From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs
Xin Li, Xiaotao Zheng, Zhihong Xia
GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration
Li Mi, Manon Béchaz, Zeming Chen et al.
Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies
Vipul Sharma, Wesley Suttle, S Sivaranjani
GoalLadder: Incremental Goal Discovery with Vision-Language Models
Alexey Zakharov, Shimon Whiteson
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Chunyu Wei, Wenji Hu, Xingjia Hao et al.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing et al.
HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Zhiwen Chen, Hanming Deng, Zhuoren Li et al.
Heterogeneous Graph Transformers for Simultaneous Mobile Multi-Robot Task Allocation and Scheduling under Temporal Constraints
Batuhan Altundas, Shengkang Chen, Shivika Singh et al.
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nick Hansen, Jyothir S V, Vlad Sobal et al.
Improving Monte Carlo Tree Search for Symbolic Regression
Zhengyao Huang, Daniel Huang, Tiannan Xiao et al.
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Cong Lu, Shengran Hu, Jeff Clune
Iterative Foundation Model Fine-Tuning on Multiple Rewards
Pouya M. Ghari, simone sciabola, Ye Wang
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
Kaihang Pan, Yang Wu, Wendong Bu et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee, Minsu Kim, Lynn Cherif et al.
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models
Qianyue Hao, Yiwen Song, Qingmin Liao et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
Gunshi Gupta, Karmesh Yadav, Zsolt Kira et al.
Meta-learning how to Share Credit among Macro-Actions
Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
Wayne Wu, Honglin He, Jack He et al.
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu et al.
Modelling the control of offline processing with reinforcement learning
Eleanor Spens, Neil Burgess, Tim Behrens
MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization
Chenglong Wang, Yang Gan, Hang Zhou et al.
Multi-Agent Collaboration via Evolving Orchestration
Yufan Dang, Chen Qian, Xueheng Luo et al.
NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation
Longtian Qiu, Shan Ning, Jiaxuan Sun et al.
No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.