"llm alignment" Papers
11 papers found
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Huimin Xu, Feng-Lin Li et al.
ICLR 2025 · arXiv:2410.04834
3 citations
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Mingyu Chen, Yiding Chen, Wen Sun et al.
NeurIPS 2025
3 citations
Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance
Aladin Djuhera, Swanand Kadhe, Syed Zawad et al.
NeurIPS 2025 (spotlight) · arXiv:2506.06522
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
ICLR 2025 · arXiv:2405.19874
21 citations
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
ICLR 2025 · arXiv:2404.09656
49 citations
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.
ICLR 2025 · arXiv:2409.11242
29 citations
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
NeurIPS 2025 · arXiv:2411.06568
3 citations
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang et al.
ICLR 2025 · arXiv:2411.02442
5 citations
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
ICLR 2025 · arXiv:2503.04113
3 citations
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine et al.
ICML 2024 · arXiv:2402.06782
212 citations
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
ICML 2024 · arXiv:2402.04833
88 citations