"reward maximization" Papers
3 papers found
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Guanghe Li, Yixiang Shan, Zhengbang Zhu et al.
ICML 2024 poster
Feedback Efficient Online Fine-Tuning of Diffusion Models
Masatoshi Uehara, Yulai Zhao, Kevin Black et al.
ICML 2024 poster
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li, Samy Jelassi, Hugh Zhang et al.
ICML 2024 poster