"on-policy reinforcement learning" Papers
5 papers found
Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence
Weiye Zhao, Feihan Li, Yifan Sun et al.
ICML 2024 poster
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Jakob Hollenstein, Georg Martius, Justus Piater
AAAI 2024 paper, arXiv:2312.11091
8 citations
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anikait Singh, Archit Sharma et al.
ICML 2024 poster
Reflective Policy Optimization
Yaozhong Gan, Yan Renye, Zhe Wu et al.
ICML 2024 poster
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla, Ananye Agarwal, Deepak Pathak
ICML 2024 poster