Papers by Hima Lakkaraju
3 papers found
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi, Hanlin Zhang, Eric P Xing et al.
ICLR 2025 poster, arXiv:2402.17840
47 citations
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
ICLR 2025 poster, arXiv:2404.18870
10 citations
Quantifying Generalization Complexity for Large Language Models
Zhenting Qi, Hongyin Luo, Xuliang Huang et al.
ICLR 2025 poster