2024 "toxicity mitigation" Papers
Learning and Forgetting Unsafe Examples in Large Language Models
Jiachen Zhao, Zhun Deng, David Madras et al.
ICML 2024 (oral), arXiv:2312.12736
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
Xavi Suau, Pieter Delobelle, Katherine Metcalf et al.
ICML 2024 (poster), arXiv:2407.12824