2025 Poster "model alignment" Papers
10 papers found
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
ICLR 2025 poster, arXiv:2504.19276
10 citations
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
ICLR 2025 poster, arXiv:2410.01257
103 citations
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
ICLR 2025 poster, arXiv:2406.14393
9 citations
Large Language Models Assume People are More Rational than We Really are
Ryan Liu, Jiayi Geng, Joshua Peterson et al.
ICLR 2025 poster, arXiv:2406.17055
37 citations
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong, Xiyao Wang, Dong Guo et al.
CVPR 2025 poster, arXiv:2410.02712
95 citations
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
ICLR 2025 poster, arXiv:2406.08464
261 citations
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen, Zichen Wen, Yichao Du et al.
NeurIPS 2025 poster, arXiv:2407.04842
57 citations
SAS: Segment Any 3D Scene with Integrated 2D Priors
Zhuoyuan Li, Jiahao Lu, Jiacheng Deng et al.
ICCV 2025 poster, arXiv:2503.08512
2 citations
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
ICCV 2025 poster, arXiv:2410.18013
21 citations
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
ICLR 2025 poster, arXiv:2410.18640
14 citations