"policy gradient methods" Papers
22 papers found
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
Wenye Li, Jiacai Liu, Ke Wei
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
Yuta Natsubori, Masataka Ushiku, Yuta Saito
Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
Sascha Marton, Tim Grams, Florian Vogt et al.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
Policy Gradient with Kernel Quadrature
Tetsuro Morimura, Satoshi Hayakawa
Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces
Ziyi Chen, Heng Huang
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning
Chengzhengxu Li, Xiaoming Liu, Yichen Wang et al.
Do Transformer World Models Give Better Policy Gradients?
Michel Ma, Tianwei Ni, Clement Gehring et al.
GFlowNet Training by Policy Gradients
Puhua Niu, Shili Wu, Mingzhou Fan et al.
How to Explore with Belief: State Entropy Maximization in POMDPs
Riccardo Zamboni, Duilio Cirino, Marcello Restelli et al.
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli et al.
Major-Minor Mean Field Multi-Agent Reinforcement Learning
Kai Cui, Christian Fabian, Anam Tahir et al.
Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning
Kakei Yamamoto, Kazusato Oko, Zhuoran Yang et al.
Mollification Effects of Policy Gradient Methods
Tao Wang, Sylvia Herbert, Sicun Gao
Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation
Yudan Wang, Yue Wang, Yi Zhou et al.
Optimistic Multi-Agent Policy Gradient
Wenshuai Zhao, Yi Zhao, Zhiyuan Li et al.
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence beyond the Minty Property
Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina et al.
Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Ju-Hyun Kim, Seungki Min
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla, Ananye Agarwal, Deepak Pathak
Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
Xiangxin Zhou, Liang Wang, Yichi Zhou
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel, Wesley A. Suttle, Alec Koppel et al.