"ai alignment" Papers
8 papers found
Conference
Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences
Joshua Ashkinaze, Hua Shen, Saipranav Avula et al.
NEURIPS 2025 oral arXiv:2511.02109
1 citation
Efficient and Near-Optimal Algorithm for Contextual Dueling Bandits with Offline Regression Oracles
Aadirupa Saha, Robert Schapire
NEURIPS 2025
Impartial Selection with Predictions
NEURIPS 2025 arXiv:2510.19002
Learning “Partner-Aware” Collaborators in Multi-Party Collaboration
Abhijnan Nath, Nikhil Krishnaswamy
NEURIPS 2025 arXiv:2510.22462
Many LLMs Are More Utilitarian Than One
Anita Keshmirian, Razan Baltaji, Babak Hemmatian et al.
NEURIPS 2025 oral arXiv:2507.00814
2 citations
Preference Learning for AI Alignment: a Causal Perspective
Katarzyna Kobalczyk, Mihaela van der Schaar
ICML 2025 arXiv:2506.05967
2 citations
AI Alignment with Changing and Influenceable Reward Functions
Micah Carroll, Davis Foote, Anand Siththaranjan et al.
ICML 2024 arXiv:2405.17713
43 citations
Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, Jobst Heitzig et al.
ICML 2024