ICLR 2025 "preference optimization methods" Papers
2 papers found
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Ruichen Shao, Bei Li, Gangao Liu et al.
ICLR 2025, oral, arXiv:2502.14340
7 citations
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
ICLR 2025, poster, arXiv:2409.15268
27 citations