"activation steering" Papers
6 papers found
Controlling Language and Diffusion Models by Transporting Activations
Pau Rodriguez, Arno Blaas, Michal Klein et al.
ICLR 2025, arXiv:2410.23054, 22 citations
LayerNavigator: Finding Promising Intervention Layers for Efficient Activation Steering in Large Language Models
Hao Sun, Huailiang Peng, Qiong Dai et al.
NEURIPS 2025 (oral)
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
AAAI 2025, arXiv:2408.11491, 23 citations
Steering Protein Language Models
Long-Kai Huang, Rongyi Zhu, Bing He et al.
ICML 2025, arXiv:2509.07983, 3 citations
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David Evans
COLM 2025, 10 citations
Steering When Necessary: Flexible Steering Large Language Models with Backtracking
Zifeng Cheng, Jinwei Gan, Zhiwei Jiang et al.
NEURIPS 2025, arXiv:2508.17621, 1 citation