"model trustworthiness" Papers
2 papers found
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
ICLR 2025 · poster · arXiv:2404.18870
10 citations
The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models
Lijun Sheng, Jian Liang, Ran He et al.
NeurIPS 2025 · poster · arXiv:2506.24000
1 citation