"instruction-tuned models" Papers
3 papers found
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neurons
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
ICLR 2025 poster
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
AAAI 2024 paper, arXiv:2312.15407
49 citations
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Dyah Adila, Shuai Zhang, Boran Han et al.
ICML 2024 poster