"reward modeling" Papers

12 papers found

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 · poster · arXiv:2404.02078
179 citations

Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

Yisong Xiao, Aishan Liu, Siyuan Liang et al.

NeurIPS 2025 · poster · arXiv:2510.01243
2 citations

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Daiwei Chen, Yi Chen, Aniket Rege et al.

ICLR 2025 · poster
9 citations

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025 · poster · arXiv:2407.06057
37 citations

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.

ICLR 2025 · poster · arXiv:2405.16545
2 citations

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024 · paper · arXiv:2308.06394
256 citations

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.

ICML 2024 · spotlight

Efficient Exploration for LLMs

Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.

ICML 2024 · poster

HarmonyDream: Task Harmonization Inside World Models

Haoyu Ma, Jialong Wu, Ningya Feng et al.

ICML 2024 · poster

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

JoonHo Lee, Jae Oh Woo, Juree Seok et al.

ICML 2024 · poster

Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Zhixiong Zhuang, Irina Nicolae, Mario Fritz

ICML 2024 · poster

Token-level Direct Preference Optimization

Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.

ICML 2024 · poster