Poster papers matching "policy optimization"

41 papers found

$q$-exponential family for policy optimization

Lingwei Zhu, Haseeb Shah, Han Wang et al.

ICLR 2025 poster · arXiv:2408.07245

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

NeurIPS 2025 poster · arXiv:2404.15617
1 citation

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NeurIPS 2025 poster · arXiv:2503.22342
47 citations

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

Qirui Mi, Qipeng Yang, Zijun Fan et al.

NeurIPS 2025 poster · arXiv:2506.12110
4 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NeurIPS 2025 poster

How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza et al.

NeurIPS 2025 poster
4 citations

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NeurIPS 2025 poster · arXiv:2505.13031
18 citations

Non-convex entropic mean-field optimization via Best Response flow

Razvan-Andrei Lascu, Mateusz Majka

NeurIPS 2025 poster · arXiv:2505.22760
1 citation

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025 poster · arXiv:2311.01104
4 citations

On the Sample Complexity of Differentially Private Policy Optimization

Yi He, Xingyu Zhou

NeurIPS 2025 poster · arXiv:2510.21060

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NeurIPS 2025 poster

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025 poster · arXiv:2506.01300
28 citations

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025 poster · arXiv:2409.13156
44 citations

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Heyang Zhao, Chenlu Ye, Quanquan Gu et al.

NeurIPS 2025 poster · arXiv:2411.04625
14 citations

Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback

Zexu Sun, Yiju Guo, Yankai Lin et al.

ICLR 2025 poster
3 citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NeurIPS 2025 poster · arXiv:2501.04686
29 citations

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang et al.

ICCV 2025 poster · arXiv:2503.01785
347 citations

Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.

ICML 2024 poster

Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations

Feng Gao, Liangzhi Shi, Shenao Zhang et al.

ICML 2024 poster

Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Hao Hu, Yiqin Yang, Jianing Ye et al.

ICML 2024 poster

Constrained Reinforcement Learning Under Model Mismatch

Zhongchang Sun, Sihong He, Fei Miao et al.

ICML 2024 poster

Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization

Gergely Neu, Nneka Okolo

ICML 2024 poster

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.

ICML 2024 poster

EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search

Pengyi Li, Yan Zheng, Hongyao Tang et al.

ICML 2024 poster

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Yihan Du, Anna Winnicki, Gal Dalal et al.

ICML 2024 poster

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

JoonHo Lee, Jae Oh Woo, Juree Seok et al.

ICML 2024 poster

Information-Directed Pessimism for Offline Reinforcement Learning

Alec Koppel, Sujay Bhatt, Jiacheng Guo et al.

ICML 2024 poster

Iterative Regularized Policy Optimization with Imperfect Demonstrations

Xudong Gong, Feng Dawei, Kele Xu et al.

ICML 2024 poster

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Songyang Gao, Qiming Ge, Wei Shen et al.

ICML 2024 poster

Model-based Reinforcement Learning for Confounded POMDPs

Mao Hong, Zhengling Qi, Yanxun Xu

ICML 2024 poster

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

Asaf Cassel, Haipeng Luo, Aviv Rosenberg et al.

ICML 2024 poster

Position: Automatic Environment Shaping is the Next Frontier in RL

Younghyo Park, Gabriel Margolis, Pulkit Agrawal

ICML 2024 poster

Probabilistic Constrained Reinforcement Learning with Formal Interpretability

Yanran Wang, Qiuchen Qian, David Boyle

ICML 2024 poster

Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization

Liam Schramm, Abdeslam Boularias

ICML 2024 poster

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan

ICML 2024 poster

Rate-Optimal Policy Optimization for Linear Markov Decision Processes

Uri Sherman, Alon Cohen, Tomer Koren et al.

ICML 2024 poster

Reflective Policy Optimization

Yaozhong Gan, Yan Renye, Zhe Wu et al.

ICML 2024 poster

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

Andrew Jesson, Christopher Lu, Gunshi Gupta et al.

ICML 2024 poster

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.

ICML 2024 poster

Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient

Ju-Hyun Kim, Seungki Min

ICML 2024 poster

Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation

Juntao Dai, Yaodong Yang, Qian Zheng et al.

ICML 2024 poster