"transformer models" Papers
15 papers found
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Qixun Wang, Yifei Wang, Xianghua Ying et al.
Geometry of Decision Making in Language Models
Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener et al.
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan et al.
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang et al.
Delving into Differentially Private Transformer
Youlong Ding, Xueyang Wu, Yining Meng et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman, Andrew Lampinen, Lucas Dixon et al.
Learning Associative Memories with Gradient Descent
Vivien Cabannes, Berfin Simsek, Alberto Bietti
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp, Ruben Ohana, Michael Eickenberg et al.
Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Mikail Khona, Maya Okawa, Jan Hula et al.
Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations
Jan Hagnberger, Marimuthu Kalimuthu, Daniel Musekamp et al.
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya Singh, Ted Moskovitz, Felix Hill et al.