NeurIPS Poster "preference learning" Papers
7 papers found
Bayesian Optimization with Preference Exploration using a Monotonic Neural Network Ensemble
Hanyang Wang, Juergen Branke, Matthias Poloczek
NeurIPS 2025 poster, arXiv:2501.18792
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Bo Wang, Qinyuan Cheng, Runyu Peng et al.
NeurIPS 2025 poster, arXiv:2507.00018
14 citations
Preference Learning with Lie Detectors can Induce Honesty or Evasion
Chris Cundy, Adam Gleave
NeurIPS 2025 poster, arXiv:2505.13787
4 citations
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.
NeurIPS 2025 poster, arXiv:2601.19055
Scalable Valuation of Human Feedback through Provably Robust Model Alignment
Masahiro Fujisawa, Masaki Adachi, Michael A Osborne
NeurIPS 2025 poster, arXiv:2505.17859
1 citation
Self-Refining Language Model Anonymizers via Adversarial Distillation
Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin
NeurIPS 2025 poster, arXiv:2506.01420
3 citations
Sparta Alignment: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang, Wenxuan Ding, Shangbin Feng et al.
NeurIPS 2025 poster, arXiv:2506.04721
3 citations