NEURIPS 2025 "activation steering" Papers
2 papers found
LayerNavigator: Finding Promising Intervention Layers for Efficient Activation Steering in Large Language Models
Hao Sun, Huailiang Peng, Qiong Dai et al.
NEURIPS 2025 oral
Steering When Necessary: Flexible Steering Large Language Models with Backtracking
Zifeng Cheng, Jinwei Gan, Zhiwei Jiang et al.
NEURIPS 2025 poster, arXiv:2508.17621
1 citation