Poster "human feedback alignment" Papers
4 papers found
Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
Aaron Li, Robin Netzorg, Zhihan Cheng et al.
ICML 2024 (poster)
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael Jordan, Jiantao Jiao
ICML 2024 (poster)
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
Sanghyun Kim, Seohyeon Jung, Balhae Kim et al.
ECCV 2024 (poster) · arXiv:2407.21032 · 9 citations
ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback
Ganqu Cui, Lifan Yuan, Ning Ding et al.
ICML 2024 (poster)