2025 "transformer architecture" Papers

55 papers found • Page 1 of 2

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Daolang Huang, Xinyi Wen, Ayush Bharti et al.

NeurIPS 2025 spotlight · arXiv:2506.07259 · 2 citations

An OpenMind for 3D Medical Vision Self-supervised Learning

Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi et al.

ICCV 2025 poster · arXiv:2412.17041 · 12 citations

Attention-based clustering

Rodrigo Maulen Soto, Pierre Marion, Claire Boyer

NeurIPS 2025 poster · arXiv:2505.13112

Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities

Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.

NeurIPS 2025 poster · arXiv:2505.21785 · 3 citations

Can Transformers Do Enumerative Geometry?

Baran Hashemi, Roderic Corominas, Alessandro Giacchetto

ICLR 2025 poster · arXiv:2408.14915 · 9 citations

ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

Hao Yu, Tangyu Jiang, Shuning Jia et al.

CVPR 2025 poster · arXiv:2506.03737 · 3 citations

Context-Enhanced Memory-Refined Transformer for Online Action Detection

Zhanzhong Pang, Fadime Sener, Angela Yao

CVPR 2025 poster · arXiv:2503.18359 · 5 citations

Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis

Byung Hyun Lee, Wongi Jeong, Woojae Han et al.

ICCV 2025 poster · arXiv:2507.02395 · 1 citation

CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning

Yifei Zhang, Hao Zhu, Junhao Dong et al.

NeurIPS 2025 poster

DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy

Rui Zhao, Yuze Fan, Ziguo Chen et al.

NeurIPS 2025 poster

Differential Transformer

Tianzhu Ye, Li Dong, Yuqing Xia et al.

ICLR 2025 poster · arXiv:2410.05258

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Jia Guo, Shuai Lu, Weihang Zhang et al.

CVPR 2025 poster · arXiv:2405.14325 · 52 citations

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025 poster · arXiv:2410.03456 · 34 citations

Dynamic Semantic-Aware Correlation Modeling for UAV Tracking

Xinyu Zhou, Tongxin Pan, Lingyi Hong et al.

NeurIPS 2025 poster · arXiv:2510.21351

Efficient Concertormer for Image Deblurring and Beyond

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.

ICCV 2025 poster · arXiv:2404.06135

Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

Pierre Vuillecard, Jean-Marc Odobez

CVPR 2025 poster · arXiv:2502.20249 · 7 citations

ESCAPE: Equivariant Shape Completion via Anchor Point Encoding

Burak Bekci, Nassir Navab, Federico Tombari et al.

CVPR 2025 poster · arXiv:2412.00952 · 4 citations

Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models

Yasha Ektefaie, Andrew Shen, Lavik Jain et al.

NeurIPS 2025 poster

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.

NeurIPS 2025 spotlight · arXiv:2503.18908 · 2 citations

Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Dexiong Chen, Markus Krimmel, Karsten Borgwardt

NeurIPS 2025 poster · arXiv:2502.02216 · 4 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NeurIPS 2025 poster · arXiv:2502.06857 · 10 citations

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 poster · arXiv:2504.02747 · 3 citations

Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach

Jason Piquenot, Maxime Berar, Romain Raveaux et al.

ICLR 2025 poster

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He, Rishabh Anand, Hiren Madhu et al.

NeurIPS 2025 poster · arXiv:2505.24722 · 8 citations

Impact of Layer Norm on Memorization and Generalization in Transformers

Rishi Singhal, Jung-Eun Kim

NeurIPS 2025 poster · arXiv:2511.10566 · 1 citation

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

Zhoutong Wu, Yuan Zhang, Yiming Dong et al.

NeurIPS 2025 poster · arXiv:2510.16807

Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics

Xiaotang Wang, Xuanwei Lin, Yun Zhu et al.

NeurIPS 2025 poster

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Shen Zhang, Siyuan Liang, Yaning Tan et al.

NeurIPS 2025 poster · arXiv:2503.04344 · 1 citation

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NeurIPS 2025 poster · arXiv:2412.15188 · 79 citations

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu HU, Na Zhao et al.

ICCV 2025 poster · arXiv:2502.02358 · 12 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

Renpu Liu, Ruida Zhou, Cong Shen et al.

ICLR 2025 poster · arXiv:2410.13981 · 4 citations

On the Role of Hidden States of Modern Hopfield Network in Transformer

NeurIPS 2025 · arXiv:2511.20698

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NeurIPS 2025 poster · arXiv:2508.16027

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Zhijian Zhuo, Ya Wang, Yutao Zeng et al.

ICLR 2025 poster · arXiv:2411.03884 · 5 citations

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 poster · arXiv:2503.17316 · 33 citations

Quantum Doubly Stochastic Transformers

Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.

NeurIPS 2025 spotlight · arXiv:2504.16275 · 2 citations

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NeurIPS 2025 oral · arXiv:2507.01467 · 27 citations

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo, Weian Mao, Daniel Shao et al.

ICLR 2025 poster · arXiv:2502.18538 · 4 citations

SAS: Simulated Attention Score

Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.

NeurIPS 2025 poster · arXiv:2507.07694 · 2 citations

Spiking Neural Networks Need High-Frequency Information

Yuetong Fang, Deming Zhou, Ziqing Wang et al.

NeurIPS 2025 poster · arXiv:2505.18608

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

ICCV 2025 poster · arXiv:2503.15986 · 1 citation

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 poster · arXiv:2407.15811 · 26 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 poster · arXiv:2410.06916 · 39 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 poster · arXiv:2502.06415 · 15 citations

Task Descriptors Help Transformers Learn Linear Models In-Context

Ruomin Huang, Rong Ge

ICLR 2025 poster · 3 citations

Transformer brain encoders explain human high-level visual responses

Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

NeurIPS 2025 spotlight · arXiv:2505.17329 · 4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NeurIPS 2025 poster · arXiv:2510.19797 · 1 citation

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 poster · arXiv:2502.21212 · 18 citations

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 poster · arXiv:2503.10622 · 96 citations

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

NeurIPS 2025 spotlight