Poster "llm alignment" Papers
8 papers found
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Mingyu Chen, Yiding Chen, Wen Sun et al.
NeurIPS 2025 · poster
3 citations
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
ICLR 2025 · poster · arXiv:2405.19874
21 citations
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
ICLR 2025 · poster · arXiv:2404.09656
46 citations
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.
ICLR 2025 · poster · arXiv:2409.11242
29 citations
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
NeurIPS 2025 · poster · arXiv:2411.06568
2 citations
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
ICLR 2025 · poster · arXiv:2503.04113
2 citations
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine et al.
ICML 2024 · poster
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
ICML 2024 · poster