Poster papers matching "policy gradient methods"
19 papers found
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
Wenye Li, Jiacai Liu, Ke Wei
A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
Mingyang Liu, Gabriele Farina, Asuman Ozdaglar
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
Yuta Natsubori, Masataka Ushiku, Yuta Saito
Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
Sascha Marton, Tim Grams, Florian Vogt et al.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
Policy Gradient with Kernel Quadrature
Tetsuro Morimura, Satoshi Hayakawa
REINFORCE Converges to Optimal Policies with Any Learning Rate
Samuel Robertson, Thang Chu, Bo Dai et al.
Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces
Ziyi Chen, Heng Huang
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Do Transformer World Models Give Better Policy Gradients?
Michel Ma, Tianwei Ni, Clement Gehring et al.
GFlowNet Training by Policy Gradients
Puhua Niu, Shili Wu, Mingzhou Fan et al.
How to Explore with Belief: State Entropy Maximization in POMDPs
Riccardo Zamboni, Duilio Cirino, Marcello Restelli et al.
Major-Minor Mean Field Multi-Agent Reinforcement Learning
Kai Cui, Christian Fabian, Anam Tahir et al.
Mollification Effects of Policy Gradient Methods
Tao Wang, Sylvia Herbert, Sicun Gao
Optimistic Multi-Agent Policy Gradient
Wenshuai Zhao, Yi Zhao, Zhiyuan Li et al.
Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Ju-Hyun Kim, Seungki Min
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla, Ananye Agarwal, Deepak Pathak
Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
Xiangxin Zhou, Liang Wang, Yichi Zhou
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel, Wesley A. Suttle, Alec Koppel et al.