"language model control" Papers
3 papers found
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan et al.
ICML 2024 poster
A Language Model’s Guide Through Latent Space
Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann et al.
ICML 2024 poster
Successor Features for Efficient Multi-Subject Controlled Text Generation
Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung et al.
ICML 2024 poster