Poster "alignment methods" Papers
3 papers found
IF-Guide: Influence Function-Guided Detoxification of LLMs
Zachary Coalson, Juhan Bae, Nicholas Carlini et al.
NeurIPS 2025, poster, arXiv:2506.01790
1 citation
L3Ms — Lagrange Large Language Models
Guneet Singh Dhillon, Xingjian Shi, Yee Whye Teh et al.
ICLR 2025, poster, arXiv:2410.21533
1 citation
Token-Level Self-Play with Importance-Aware Guidance for Large Language Models
Tue Le, Hoang Tran, Quyen Tran et al.
NeurIPS 2025, poster