"proximal policy optimization" Papers
10 papers found
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Zijia Zhao, Longteng Guo, Jie Cheng et al.
ICLR 2025posterarXiv:2410.10456
8
citations
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu, Fan Wu, Haotian Ye et al.
NeurIPS 2025oralarXiv:2505.19281
2
citations
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Jakob Hollenstein, Georg Martius, Justus Piater
AAAI 2024paperarXiv:2312.11091
8
citations
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim, Minyeong Kim, Junik Bae et al.
ECCV 2024posterarXiv:2403.16167
10
citations
Graph-Based Prediction and Planning Policy Network (GP3Net) for Scalable Self-Driving in Dynamic Environments Using Deep Reinforcement Learning
Jayabrata Chowdhury, Venkataramanan Shivaraman, Suresh Sundaram et al.
AAAI 2024paperarXiv:2312.05784
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024poster
Learning Diverse Risk Preferences in Population-Based Self-Play
Yuhua Jiang, Qihan Liu, Xiaoteng Ma et al.
AAAI 2024paperarXiv:2305.11476
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo, Taolin Zhang, Haoqian Wu et al.
ECCV 2024posterarXiv:2407.13221
1
citations
Reflective Policy Optimization
Yaozhong Gan, yan renye, zhe wu et al.
ICML 2024poster
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.
ICML 2024poster