"preference tuning" Papers
3 papers found
Learning Dynamics of LLM Finetuning
Yi Ren, Danica Sutherland
ICLR 2025, arXiv:2407.10490
66 citations
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding, Ruqi Zhang
NeurIPS 2025, arXiv:2505.22651
7 citations
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
ICML 2024, arXiv:2401.01967
165 citations