2025 "language model representations" Papers
2 papers found
Not All Language Model Features Are One-Dimensionally Linear
Josh Engels, Eric Michaud, Isaac Liao et al.
ICLR 2025, poster, arXiv:2405.14860
89 citations
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah Smith, Sarah Wiegreffe et al.
ICLR 2025, poster, arXiv:2504.12459
11 citations