NEURIPS 2025 "ai alignment" Papers
5 papers found
Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences
Joshua Ashkinaze, Hua Shen, Saipranav Avula et al.
NEURIPS 2025 | oral | arXiv:2511.02109
Efficient and Near-Optimal Algorithm for Contextual Dueling Bandits with Offline Regression Oracles
Aadirupa Saha, Robert Schapire
NEURIPS 2025 | poster
Impartial Selection with Predictions
NEURIPS 2025 | arXiv:2510.19002
Learning “Partner-Aware” Collaborators in Multi-Party Collaboration
Abhijnan Nath, Nikhil Krishnaswamy
NEURIPS 2025 | poster | arXiv:2510.22462
Many LLMs Are More Utilitarian Than One
Anita Keshmirian, Razan Baltaji, Babak Hemmatian et al.
NEURIPS 2025 | oral | arXiv:2507.00814
2 citations