"model alignment" Papers

24 papers found

An Auditing Test to Detect Behavioral Shift in Language Models

Leo Richter, Xuanli He, Pasquale Minervini et al.

ICLR 2025 (oral) · arXiv:2410.19406
2 citations

Anyprefer: An Agentic Framework for Preference Data Synthesis

Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.

ICLR 2025 · arXiv:2504.19276
11 citations

Composition and Alignment of Diffusion Models using Constrained Learning

Shervin Khalafi, Ignacio Hounie, Dongsheng Ding et al.

NeurIPS 2025 · arXiv:2508.19104
2 citations

HelpSteer2-Preference: Complementing Ratings with Preferences

Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.

ICLR 2025 · arXiv:2410.01257
112 citations

Improving Large Vision and Language Models by Learning from a Panel of Peers

Jefferson Hernandez, Jing Shi, Simon Jenni et al.

ICCV 2025 · arXiv:2509.01610
1 citation

Information Theoretic Text-to-Image Alignment

Chao Wang, Giulio Franzese, Alessandro Finamore et al.

ICLR 2025 · arXiv:2405.20759
4 citations

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 · arXiv:2406.14393
11 citations

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua Peterson et al.

ICLR 2025 · arXiv:2406.17055
43 citations

LLaVA-Critic: Learning to Evaluate Multimodal Models

Tianyi Xiong, Xiyao Wang, Dong Guo et al.

CVPR 2025 · arXiv:2410.02712
103 citations

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Yushi Bai, Jiajie Zhang, Xin Lv et al.

ICLR 2025 · arXiv:2408.07055
103 citations

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.

ICLR 2025 · arXiv:2406.08464
276 citations

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NeurIPS 2025 · arXiv:2407.04842
60 citations

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025 · arXiv:2410.04203
19 citations

SAS: Segment Any 3D Scene with Integrated 2D Priors

Zhuoyuan Li, Jiahao Lu, Jiacheng Deng et al.

ICCV 2025 · arXiv:2503.08512
2 citations

Scalable Ranked Preference Optimization for Text-to-Image Generation

Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.

ICCV 2025 · arXiv:2410.18013
23 citations

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025 · arXiv:2410.06961
32 citations

The Best Instruction-Tuning Data are Those That Fit

Dylan Zhang, Qirun Dai, Hao Peng

NeurIPS 2025 (spotlight) · arXiv:2502.04194
23 citations

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025 · arXiv:2410.18640
15 citations

Active Preference Learning for Large Language Models

William Muldrew, Peter Hayes, Mingtian Zhang et al.

ICML 2024 · arXiv:2402.08114
46 citations

Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection

Qijie Mo, Yipeng Gao, Shenghao Fu et al.

ECCV 2024 · arXiv:2407.11499
14 citations

Learning and Forgetting Unsafe Examples in Large Language Models

Jiachen Zhao, Zhun Deng, David Madras et al.

ICML 2024 (oral) · arXiv:2312.12736
25 citations

Model Alignment as Prospect Theoretic Optimization

Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff et al.

ICML 2024 (spotlight) · arXiv:2402.01306
871 citations

Recovering the Pre-Fine-Tuning Weights of Generative Models

Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

ICML 2024 · arXiv:2402.10208
13 citations

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Collin Burns, Pavel Izmailov, Jan Kirchner et al.

ICML 2024 · arXiv:2312.09390
406 citations