Policy Optimization
Policy gradient and optimization methods
Top Papers
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.
Theoretical guarantees on the best-of-n alignment policy
Ahmad Beirami, Alekh Agarwal, Jonathan Berant et al.
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time
Haoran Ye, Jiarui Wang, Helan Liang et al.
Offline Actor-Critic for Average Reward MDPs
William Powell, Jeongyeol Kwon, Qiaomin Xie et al.
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan, Jiancheng Liu, Licong Lin et al.
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Zhihang Lin, Mingbao Lin, Yuan Xie et al.
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Audrey Huang, Wenhao Zhan, Tengyang Xie et al.
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
CPPO: Continual Learning for Reinforcement Learning with Human Feedback
Han Zhang, Yu Lei, Lin Gui et al.
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Sifan Wang, Ananyae Bhartari, Bowen Li et al.
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
Dominik Grimm, Jonathan Pirnay
ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning
Chen-Xiao Gao, Chenyang Wu, Mingjun Cao et al.
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Yinmin Zhang, Jie Liu, Chuming Li et al.
AffordDP: Generalizable Diffusion Policy with Transferable Affordance
Shijie Wu, Yihang Zhu, Yunao Huang et al.
Efficient Online Reinforcement Learning for Diffusion Policy
Haitong Ma, Tianyi Chen, Kai Wang et al.
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, Jialian Li et al.
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Qiang Liu, Mengyu Chu, Nils Thuerey
Leaving the Nest: Going beyond Local Loss Functions for Predict-Then-Optimize
Sanket Shah, Bryan Wilder, Andrew Perrault et al.
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulić, Sofia Michel, Jean-Marc Andreoli
Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs
Kejun Tang, Jiayu Zhai, Xiaoliang Wan et al.
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Xiao, Weiyang Liu et al.
Constrained Bayesian Optimization under Partial Observations: Balanced Improvements and Provable Convergence
Shengbo Wang, Ke Li
Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding
Zhe Chen, Daniel Harabor, Jiaoyang Li et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
Zijie Pan, Jiachen Lu, Xiatian Zhu et al.
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charles Blake, Constantin Eichenberg, Josef Dean et al.
No Preference Left Behind: Group Distributional Preference Optimization
Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
Oussama Zekri, Nicolas Boulle
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
Dujian Ding, Ankur Mallick, Shaokun Zhang et al.
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Chenrui Tie, Yue Chen, Ruihai Wu et al.
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
Haoyang Liu, Jie Wang, Zijie Geng et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
Multi-Objective Bayesian Optimization with Active Preference Learning
Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki et al.
Deep Distributed Optimization for Large-Scale Quadratic Programming
Augustinos Saravanos, Hunter Kuperman, Alex Oshin et al.
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
One Forward is Enough for Neural Network Training via Likelihood Ratio Method
Jinyang Jiang, Zeliang Zhang, Chenliang Xu et al.
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang, Liqiang Jing, Vibhav Gogate
In Search of Adam’s Secret Sauce
Antonio Orvieto, Robert Gower
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
Sara Klein, Simon Weissmann, Leif Döring
Pareto Deep Long-Tailed Recognition: A Conflict-Averse Solution
Zhipeng Zhou, Liu Liu, Peilin Zhao et al.
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.
Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis
Jie Hao, Xiaochuan Gong, Mingrui Liu
SDGMNet: Statistic-Based Dynamic Gradient Modulation for Local Descriptor Learning
Yuxin Deng, Jiayi Ma
Learning to Pivot as a Smart Expert
Tianhao Liu, Shanwen Pu, Dongdong Ge et al.
QLABGrad: A Hyperparameter-Free and Convergence-Guaranteed Scheme for Deep Learning
Fang-Xiang Wu, Minghan Fu
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Cumulative Regret Analysis of the Piyavskii–Shubert Algorithm and Its Variants for Global Optimization
Kaan Gökcesu, Hakan Gökcesu
Understanding and Improving Optimization in Predictive Coding Networks
Nicholas Alonso, Jeffrey Krichmar, Emre Neftci
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo, Yukang Yang, Hui Yuan et al.
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
Pareto Front-Diverse Batch Multi-Objective Bayesian Optimization
Alaleh Ahmadianshalchi, Syrine Belakaria, Janardhan Rao Doppa
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
Xi Lin, Yilu Liu, Xiaoyuan Zhang et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Zhao Song, Mingquan Ye, Junze Yin et al.
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
Wenze Chen, Shiyu Huang, Yuan Chiang et al.
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Kiyoung Seong, Seonghyun Park, Seonghwan Kim et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
Improved Active Learning via Dependent Leverage Score Sampling
Atsushi Shimizu, Xiaoou Cheng, Christopher Musco et al.
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko
Boost Your Human Image Generation Model via Direct Preference Optimization
Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
Offline-to-Online Hyperparameter Transfer for Stochastic Bandits
Dravyansh Sharma, Arun Suggala
Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
Hongzhi Zang, Yulun Zhang, He Jiang et al.
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu, Cheems Wang, Yixiu Mao et al.
Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
Zhaoxin Wang, Handing Wang, Cong Tian et al.
Sharpness-Aware Minimization: General Analysis and Improved Rates
Dimitris Oikonomou, Nicolas Loizou
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Jakob Hollenstein, Georg Martius, Justus Piater
Backdoor Adjustment via Group Adaptation for Debiased Coupon Recommendations
Junpeng Fang, Gongduo Zhang, Qing Cui et al.
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou, Ayush Sekhari, Yuda Song et al.
Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization
Zongkai Liu, Qian Lin, Chao Yu et al.
Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
Daniel Palenicek, Florian Vogt, Joe Watson et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son, William Bankes, Sayak Ray Chowdhury et al.
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Paria Rashidinejad, Yuandong Tian
Regret Analysis of Repeated Delegated Choice
Suho Shin, Keivan Rezaei, Mohammad Hajiaghayi et al.
Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Tennison Liu, Nicolas Huynh, Mihaela van der Schaar
Identifying Policy Gradient Subspaces
Jan Schneider, Pierre Schumacher, Simon Guist et al.
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Yang Xu, Washim Mondal, Vaneet Aggarwal
Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning
Chengzhengxu Li, Xiaoming Liu, Yichen Wang et al.
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
Zhao Shan, Chenyou Fan, Shuang Qiu et al.
Learning Cross-hand Policies of High-DOF Reaching and Grasping
Qijin She, Shishun Zhang, Yunfan Ye et al.
Two-timescale Extragradient for Finding Local Minimax Points
Jiseok Chae, Kyuwon Kim, Donghwan Kim
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Lise Le Boudec, Emmanuel de Bézenac, Louis Serrano et al.