"llm alignment" Papers

11 papers found

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Xin Mao, Huimin Xu, Feng-Lin Li et al.

ICLR 2025 · arXiv:2410.04834
3 citations

Avoiding exp(R) scaling in RLHF through Preference-based Exploration

Mingyu Chen, Yiding Chen, Wen Sun et al.

NeurIPS 2025
3 citations

Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

Aladin Djuhera, Swanand Kadhe, Syed Zawad et al.

NeurIPS 2025 (spotlight) · arXiv:2506.06522

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICLR 2025 · arXiv:2405.19874
21 citations

Learn Your Reference Model for Real Good Alignment

Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.

ICLR 2025 · arXiv:2404.09656
49 citations

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

ICLR 2025 · arXiv:2409.11242
29 citations

Meta-Learning Objectives for Preference Optimization

Carlo Alfano, Silvia Sapora, Jakob Foerster et al.

NeurIPS 2025 · arXiv:2411.06568
3 citations

TODO: Enhancing LLM Alignment with Ternary Preferences

Yuxiang Guo, Lu Yin, Bo Jiang et al.

ICLR 2025 · arXiv:2411.02442
5 citations

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Erik Jones, Arjun Patrawala, Jacob Steinhardt

ICLR 2025 · arXiv:2503.04113
3 citations

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine et al.

ICML 2024 · arXiv:2402.06782
212 citations

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICML 2024 · arXiv:2402.04833
88 citations