Papers matching "reinforcement learning human feedback" (2025)
5 papers found
Ask a Strong LLM Judge when Your Reward Model is Uncertain
Zhenghao Xu, Qin Lu, Qingru Zhang et al.
NeurIPS 2025 poster · arXiv:2510.20369
Efficient and Near-Optimal Algorithm for Contextual Dueling Bandits with Offline Regression Oracles
Aadirupa Saha, Robert Schapire
NeurIPS 2025 poster
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng, Hengquan Guo, Jiawei Zhang et al.
NeurIPS 2025 poster · arXiv:2410.19933
5 citations
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang, Dian Yu, Baolin Peng et al.
ICLR 2025 poster · arXiv:2407.00617
31 citations
Towards Federated RLHF with Aggregated Client Preference for LLMs
Feijie Wu, Xiaoze Liu, Haoyu Wang et al.
ICLR 2025 poster · arXiv:2407.03038
9 citations