2025 Oral "ai alignment" Papers
2 papers found
Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences
Joshua Ashkinaze, Hua Shen, Saipranav Avula et al.
NeurIPS 2025 | oral | arXiv:2511.02109
Many LLMs Are More Utilitarian Than One
Anita Keshmirian, Razan Baltaji, Babak Hemmatian et al.
NeurIPS 2025 | oral | arXiv:2507.00814
2 citations