2024 "reward modeling" Papers
7 papers found
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AAAI 2024 paper, arXiv:2308.06394
256 citations
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.
ICML 2024 spotlight
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
ICML 2024 poster
HarmonyDream: Task Harmonization Inside World Models
Haoyu Ma, Jialong Wu, Ningya Feng et al.
ICML 2024 poster
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
ICML 2024 poster
Stealthy Imitation: Reward-guided Environment-free Policy Stealing
Zhixiong Zhuang, Irina Nicolae, Mario Fritz
ICML 2024 poster
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.
ICML 2024 poster