"llm alignment" Papers

11 papers found

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Xin Mao, Huimin Xu, Feng-Lin Li et al.

ICLR 2025 · arXiv:2410.04834
3 citations

Avoiding exp(R) scaling in RLHF through Preference-based Exploration

Mingyu Chen, Yiding Chen, Wen Sun et al.

NeurIPS 2025
3 citations

Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

Aladin Djuhera, Swanand Kadhe, Syed Zawad et al.

NeurIPS 2025 (spotlight) · arXiv:2506.06522

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICLR 2025 · arXiv:2405.19874
21 citations

Learn Your Reference Model for Real Good Alignment

Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.

ICLR 2025 · arXiv:2404.09656
49 citations

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

ICLR 2025 · arXiv:2409.11242
29 citations

Meta-Learning Objectives for Preference Optimization

Carlo Alfano, Silvia Sapora, Jakob Foerster et al.

NeurIPS 2025 · arXiv:2411.06568
3 citations

TODO: Enhancing LLM Alignment with Ternary Preferences

Yuxiang Guo, Lu Yin, Bo Jiang et al.

ICLR 2025 · arXiv:2411.02442
5 citations

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Erik Jones, Arjun Patrawala, Jacob Steinhardt

ICLR 2025 · arXiv:2503.04113
3 citations

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine et al.

ICML 2024 · arXiv:2402.06782
212 citations

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICML 2024 · arXiv:2402.04833
88 citations