2024 Poster "policy optimization" Papers
24 papers found
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Feng Gao, Liangzhi Shi, Shenao Zhang et al.
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Hao Hu, yiqin yang, Jianing Ye et al.
Constrained Reinforcement Learning Under Model Mismatch
Zhongchang Sun, Sihong He, Fei Miao et al.
Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
Gergely Neu, Nneka Okolo
Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration
Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.
EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Yan Zheng, Hongyao Tang et al.
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
Yihan Du, Anna Winnicki, Gal Dalal et al.
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
Information-Directed Pessimism for Offline Reinforcement Learning
Alec Koppel, Sujay Bhatt, Jiacheng Guo et al.
Iterative Regularized Policy Optimization with Imperfect Demonstrations
Xudong Gong, Feng Dawei, Kele Xu et al.
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
songyang gao, Qiming Ge, Wei Shen et al.
Model-based Reinforcement Learning for Confounded POMDPs
Mao Hong, Zhengling Qi, Yanxun Xu
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf Cassel, Haipeng Luo, Aviv Rosenberg et al.
Position: Automatic Environment Shaping is the Next Frontier in RL
Younghyo Park, Gabriel Margolis, Pulkit Agrawal
Probabilistic Constrained Reinforcement Learning with Formal Interpretability
YANRAN WANG, QIUCHEN QIAN, David Boyle
Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Liam Schramm, Abdeslam Boularias
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman, Alon Cohen, Tomer Koren et al.
Reflective Policy Optimization
Yaozhong Gan, yan renye, zhe wu et al.
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Christopher Lu, Gunshi Gupta et al.
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Ju-Hyun Kim, Seungki Min
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.