2024 "policy optimization" Papers
28 papers found
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate
Yuancheng Xu, Chenghao Deng, Yanchao Sun et al.
Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Feng Gao, Liangzhi Shi, Shenao Zhang et al.
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Hao Hu, Yiqin Yang, Jianing Ye et al.
Constrained Reinforcement Learning Under Model Mismatch
Zhongchang Sun, Sihong He, Fei Miao et al.
Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
Gergely Neu, Nneka Okolo
Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration
Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
Wenze Chen, Shiyu Huang, Yuan Chiang et al.
EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Yan Zheng, Hongyao Tang et al.
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
Yihan Du, Anna Winnicki, Gal Dalal et al.
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
Information-Directed Pessimism for Offline Reinforcement Learning
Alec Koppel, Sujay Bhatt, Jiacheng Guo et al.
Iterative Regularized Policy Optimization with Imperfect Demonstrations
Xudong Gong, Dawei Feng, Kele Xu et al.
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
Songyang Gao, Qiming Ge, Wei Shen et al.
Model-based Reinforcement Learning for Confounded POMDPs
Mao Hong, Zhengling Qi, Yanxun Xu
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf Cassel, Haipeng Luo, Aviv Rosenberg et al.
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
Yuanzhao Zhai, Yiying Li, Zijian Gao et al.
Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes
David Klaska, Antonin Kucera, Vojtěch Kůr et al.
Position: Automatic Environment Shaping is the Next Frontier in RL
Younghyo Park, Gabriel Margolis, Pulkit Agrawal
Probabilistic Constrained Reinforcement Learning with Formal Interpretability
Yanran Wang, Qiuchen Qian, David Boyle
Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Liam Schramm, Abdeslam Boularias
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman, Alon Cohen, Tomer Koren et al.
Reflective Policy Optimization
Yaozhong Gan, Renye Yan, Zhe Wu et al.
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Christopher Lu, Gunshi Gupta et al.
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Ju-Hyun Kim, Seungki Min
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.