"reinforcement learning" Papers
299 papers found • Page 1 of 6
Conference
$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo, Ang Li, Yifei Wang et al.
A Causal Lens for Learning Long-term Fair Policies
Jacob Lear, Lu Zhang
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
Action abstractions for amortized sampling
Oussama Boussif, Léna Ezzine, Joseph Viviano et al.
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
Yarden As, Bhavya, Lenart Treven et al.
AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes
Tianyi Xu, Fan Zhang, Boxin Shi et al.
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.
A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen, Chandrajit Bajaj
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications
Zhenyu Tao, Wei Xu, Xiaohu You
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Daolang Huang, Xinyi Wen, Ayush Bharti et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
Haoran Xu, Shuozhe Li, Harshit Sikchi et al.
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu, Jiaxuan Gao, Xujie Shen et al.
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
Chau Pham, Quan Dao, Mahesh Bhosale et al.
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Yandong Guan, Xilin Wang, XiMing Xing et al.
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Chongyi Zheng, Jens Tuyls, Joanne Peng et al.
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang, Lingling Zhang, Jie Ma et al.
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Lowell Weissman, Michael Krumdick, A. Abbott
Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability
Shayan Karimi, Xiaoqi Tan
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Feichen Gan, Lu Youcun, Yingying Zhang et al.
Continual Knowledge Adaptation for Reinforcement Learning
Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.
CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning
Pengyi Li, Shixiong Kai, Jianye Hao et al.
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang, Chongjie Si, Jun Luo et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
CURE: Co-Evolving Coders and Unit Testers via Reinforcement Learning
Yinjie Wang, Ling Yang, Ye Tian et al.
Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation
Hanchen Su, Xuyuan Li, Yan Zhou et al.
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
Qi Wang, Zhipeng Zhang, Baao Xie et al.
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu, Claire Chen, Shangtong Zhang
Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph
Mingxuan Ye, Jie Wang, Fangzhou et al.
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment
Jinwoo Choi, Seung-Woo Seo
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Jiashuo Sun, Xianrui Zhong, Sizhe Zhou et al.
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.
EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling
Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun et al.
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Claire Chen, Shuze Liu, Shangtong Zhang
Efficient Reinforcement Learning in Probabilistic Reward Machines
Xiaofeng Lin, Xuezhou Zhang
Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder
Yuhan Zhang, Guoqing Ma, Guangfu Hao et al.
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng et al.
Enhancing Diversity In Parallel Agents: A Maximum State Entropy Exploration Story
Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti et al.
Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data
Shilong Deng, Zetao Zheng, Hongcai He et al.
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Runyu Lu, Peng Zhang, Ruochuan Shi et al.