"transformer architecture" Papers

116 papers • Page 1 of 3

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Daolang Huang, Xinyi Wen, Ayush Bharti et al.

NeurIPS 2025 spotlight · arXiv:2506.07259
2 citations

Can Transformers Do Enumerative Geometry?

Baran Hashemi, Roderic Corominas, Alessandro Giacchetto

ICLR 2025 poster · arXiv:2408.14915
9 citations

ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

Hao Yu, Tangyu Jiang, Shuning Jia et al.

CVPR 2025 poster · arXiv:2506.03737
3 citations

DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy

Rui Zhao, Yuze Fan, Ziguo Chen et al.

NeurIPS 2025 poster

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025 poster · arXiv:2410.03456
34 citations

Dynamic Semantic-Aware Correlation Modeling for UAV Tracking

Xinyu Zhou, Tongxin Pan, Lingyi Hong et al.

NeurIPS 2025 poster · arXiv:2510.21351

Efficient Concertormer for Image Deblurring and Beyond

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.

ICCV 2025 poster · arXiv:2404.06135

Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

Pierre Vuillecard, Jean-Marc Odobez

CVPR 2025 poster · arXiv:2502.20249
7 citations

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.

NeurIPS 2025 spotlight · arXiv:2503.18908
2 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NeurIPS 2025 poster · arXiv:2502.06857
10 citations

Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach

Jason Piquenot, Maxime Berar, Romain Raveaux et al.

ICLR 2025 poster

Impact of Layer Norm on Memorization and Generalization in Transformers

Rishi Singhal, Jung-Eun Kim

NeurIPS 2025 poster · arXiv:2511.10566
1 citation

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

Zhoutong Wu, Yuan Zhang, Yiming Dong et al.

NeurIPS 2025 poster · arXiv:2510.16807

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025 poster · arXiv:2502.02358
12 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

Renpu Liu, Ruida Zhou, Cong Shen et al.

ICLR 2025 poster · arXiv:2410.13981
4 citations

On the Role of Hidden States of Modern Hopfield Network in Transformer

NeurIPS 2025 · arXiv:2511.20698

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Zhijian Zhuo, Ya Wang, Yutao Zeng et al.

ICLR 2025 poster · arXiv:2411.03884
5 citations

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NeurIPS 2025 oral · arXiv:2507.01467
27 citations

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo, Weian Mao, Daniel Shao et al.

ICLR 2025 poster · arXiv:2502.18538
4 citations

SAS: Simulated Attention Score

Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.

NeurIPS 2025 poster · arXiv:2507.07694
2 citations

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

ICCV 2025 poster · arXiv:2503.15986
1 citation

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 poster · arXiv:2407.15811
26 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 poster · arXiv:2410.06916
39 citations

Transformer brain encoders explain human high-level visual responses

Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

NeurIPS 2025 spotlight · arXiv:2505.17329
4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NeurIPS 2025 poster · arXiv:2510.19797
1 citation

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 poster · arXiv:2503.10622
96 citations

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

NeurIPS 2025 spotlight

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

NeurIPS 2025 poster · arXiv:2506.09278
13 citations

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

Renpu Liu, Jing Yang

NeurIPS 2025 poster · arXiv:2601.10058
1 citation

Why In-Context Learning Models are Good Few-Shot Learners?

Shiguang Wu, Yaqing Wang, Quanming Yao

ICLR 2025 poster

ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data

Carmen Martin-Turrero, Maxence Bouvier, Manuel Breitenstein et al.

ICML 2024 oral

An Incremental Unified Framework for Small Defect Inspection

Jiaqi Tang, Hao Lu, Xiaogang Xu et al.

ECCV 2024 poster · arXiv:2312.08917
21 citations

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.

ICML 2024 poster

Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-identification

Jiaer Xia, Lei Tan, Pingyang Dai et al.

AAAI 2024 paper · arXiv:2303.10976
24 citations

Attention Meets Post-hoc Interpretability: A Mathematical Perspective

Gianluigi Lopardo, Frederic Precioso, Damien Garreau

ICML 2024 poster

Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games

Kexin Huang, Ziqian Chen, Xue Wang et al.

ICML 2024 poster

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen et al.

AAAI 2024 paper · arXiv:2307.01146

Breaking through the learning plateaus of in-context learning in Transformer

Jingwen Fu, Tao Yang, Yuwang Wang et al.

ICML 2024 poster

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Wentao Mo, Yang Liu

AAAI 2024 paper · arXiv:2402.15933
26 citations

CarFormer: Self-Driving with Learned Object-Centric Representations

Shadi Hamdan, Fatma Guney

ECCV 2024 poster · arXiv:2407.15843
11 citations

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

Itamar Zimerman, Moran Baruch, Nir Drucker et al.

ICML 2024 poster

Correlation Matching Transformation Transformers for UHD Image Restoration

Cong Wang, Jinshan Pan, Wei Wang et al.

AAAI 2024 paper · arXiv:2406.00629
57 citations

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Zheng Xiong, Risto Vuorio, Jacob Beck et al.

ICML 2024 poster

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.

ICML 2024 poster

Exploring Transformer Extrapolation

Zhen Qin, Yiran Zhong, Hui Deng

AAAI 2024 paper · arXiv:2307.10156
12 citations

Gated Linear Attention Transformers with Hardware-Efficient Training

Songlin Yang, Bailin Wang, Yikang Shen et al.

ICML 2024 poster

GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

Tianlang Chen, Shengjie Luo, Di He et al.

ICML 2024 poster

Graph External Attention Enhanced Transformer

Jianqing Liang, Min Chen, Jiye Liang

ICML 2024 poster

GridFormer: Point-Grid Transformer for Surface Reconstruction

Shengtao Li, Ge Gao, Yudong Liu et al.

AAAI 2024 paper · arXiv:2401.02292

HDformer: A Higher-Dimensional Transformer for Detecting Diabetes Utilizing Long-Range Vascular Signals

Ella Lan

AAAI 2024 paper · arXiv:2303.11340
3 citations