2025 Poster "toxicity mitigation" Papers
2 papers found
Controlling Language and Diffusion Models by Transporting Activations
Pau Rodriguez, Arno Blaas, Michal Klein et al.
ICLR 2025, poster, arXiv:2410.23054
18 citations
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang et al.
NeurIPS 2025, poster, arXiv:2510.01243
2 citations