"toxicity mitigation" Papers

4 papers found

Controlling Language and Diffusion Models by Transporting Activations

Pau Rodriguez, Arno Blaas, Michal Klein et al.

ICLR 2025, poster, arXiv:2410.23054

Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

Yisong Xiao, Aishan Liu, Siyuan Liang et al.

NeurIPS 2025, poster, arXiv:2510.01243

Learning and Forgetting Unsafe Examples in Large Language Models

Jiachen Zhao, Zhun Deng, David Madras et al.

ICML 2024, oral, arXiv:2312.12736

Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models

Xavi Suau, Pieter Delobelle, Katherine Metcalf et al.

ICML 2024, poster, arXiv:2407.12824