NEURIPS 2025 "language models" Papers
6 papers found
Better Estimation of the Kullback-Leibler Divergence Between Language Models
Afra Amini, Tim Vieira, Ryan Cotterell
NEURIPS 2025, poster, arXiv:2504.10637
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
NEURIPS 2025, poster, arXiv:2506.15679
6 citations
Emergence of Linear Truth Encodings in Language Models
Shauli Ravfogel, Gilad Yehudai, Tal Linzen et al.
NEURIPS 2025, poster, arXiv:2510.15804
3 citations
Generalizing Verifiable Instruction Following
Valentina Pyatkin, Saumya Malik, Victoria Graf et al.
NEURIPS 2025, poster, arXiv:2507.02833
35 citations
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.
NEURIPS 2025, poster, arXiv:2506.01347
74 citations
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.
NEURIPS 2025, oral, arXiv:2507.13328