Poster "transformer architecture" Papers

167 papers found • Page 2 of 4

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 • poster • arXiv:2504.05298
66 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

Renpu Liu, Ruida Zhou, Cong Shen et al.

ICLR 2025 • poster • arXiv:2410.13981
4 citations

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025 • poster • arXiv:2310.12680
44 citations

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NeurIPS 2025 • poster • arXiv:2505.13499
3 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NeurIPS 2025 • poster • arXiv:2508.16027

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025 • poster • arXiv:2406.17741
40 citations

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Zhijian Zhuo, Ya Wang, Yutao Zeng et al.

ICLR 2025 • poster • arXiv:2411.03884
5 citations

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • poster • arXiv:2503.17316
33 citations

RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion

Bardienus Duisterhof, Jan Oberst, Bowen Wen et al.

NeurIPS 2025 • poster • arXiv:2506.05285
4 citations

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo, Weian Mao, Daniel Shao et al.

ICLR 2025 • poster • arXiv:2502.18538
4 citations

SAS: Simulated Attention Score

Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.

NeurIPS 2025 • poster • arXiv:2507.07694
2 citations

S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation

Junlang Huang, Chen Hao, Li Luo et al.

NeurIPS 2025 • poster • arXiv:2505.11843

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025 • poster • arXiv:2410.02703
20 citations

Spiking Neural Networks Need High-Frequency Information

Yuetong Fang, Deming Zhou, Ziqing Wang et al.

NeurIPS 2025 • poster • arXiv:2505.18608

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

ICCV 2025 • poster • arXiv:2503.15986
1 citation

StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training

Ziming Liu, Shaoyu Wang, Shenggan Cheng et al.

NeurIPS 2025 • poster • arXiv:2407.00611
2 citations

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 • poster • arXiv:2407.15811
26 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • poster • arXiv:2410.06916
39 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 • poster • arXiv:2502.06415
15 citations

Task Descriptors Help Transformers Learn Linear Models In-Context

Ruomin Huang, Rong Ge

ICLR 2025 • poster
3 citations

Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

Taejong Joo, Diego Klabjan

NeurIPS 2025 • poster • arXiv:2502.04580

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025 • poster • arXiv:2409.04431
34 citations

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

ICLR 2025 • poster • arXiv:2410.12360

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Chenyang Zhang, Xuran Meng, Yuan Cao

ICLR 2025 • poster • arXiv:2504.08638
4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NeurIPS 2025 • poster • arXiv:2510.19797
1 citation

Transformers Handle Endogeneity in In-Context Linear Regression

Haodong Liang, Krishna Balasubramanian, Lifeng Lai

ICLR 2025 • poster • arXiv:2410.01265
4 citations

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 • poster • arXiv:2502.21212
18 citations

Transformers Struggle to Learn to Search

Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

ICLR 2025 • poster • arXiv:2412.04703
15 citations

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 • poster • arXiv:2503.10622
96 citations

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

NeurIPS 2025 • poster • arXiv:2506.09278
13 citations

Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study

Xingxuan Zhang, Haoran Wang, Jiansheng Li et al.

ICLR 2025 • poster • arXiv:2503.15579
5 citations

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025 • poster • arXiv:2412.02699
15 citations

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

Renpu Liu, Jing Yang

NeurIPS 2025 • poster • arXiv:2601.10058
1 citation

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Qian Ma, Ruoxiang Xu, Yongqiang Cai

NeurIPS 2025 • poster • arXiv:2511.06376

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

ICLR 2025 • poster • arXiv:2410.10986
10 citations

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025 • poster • arXiv:2503.00535
24 citations

Why In-Context Learning Models are Good Few-Shot Learners?

Shiguang Wu, Yaqing Wang, Quanming Yao

ICLR 2025 • poster

ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention

Qiuhao Zeng, Jierui Huang, Peng Lu et al.

ICLR 2025 • poster • arXiv:2501.14577
5 citations

A Comparative Study of Image Restoration Networks for General Backbone Network Design

Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.

ECCV 2024 • poster • arXiv:2310.11881
53 citations

An Incremental Unified Framework for Small Defect Inspection

Jiaqi Tang, Hao Lu, Xiaogang Xu et al.

ECCV 2024 • poster • arXiv:2312.08917
21 citations

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.

ICML 2024 • poster

Attention Meets Post-hoc Interpretability: A Mathematical Perspective

Gianluigi Lopardo, Frederic Precioso, Damien Garreau

ICML 2024 • poster

Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games

Kexin Huang, Ziqian Chen, Xue Wang et al.

ICML 2024 • poster

Breaking through the learning plateaus of in-context learning in Transformer

Jingwen Fu, Tao Yang, Yuwang Wang et al.

ICML 2024 • poster

CarFormer: Self-Driving with Learned Object-Centric Representations

Shadi Hamdan, Fatma Guney

ECCV 2024 • poster • arXiv:2407.15843
11 citations

CityGuessr: City-Level Video Geo-Localization on a Global Scale

Parth Parag Kulkarni, Gaurav Kumar Nayak, Mubarak Shah

ECCV 2024 • poster • arXiv:2411.06344
9 citations

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

Itamar Zimerman, Moran Baruch, Nir Drucker et al.

ICML 2024 • poster

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Zheng Xiong, Risto Vuorio, Jacob Beck et al.

ICML 2024 • poster

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.

ICML 2024 • poster

EDformer: Transformer-Based Event Denoising Across Varied Noise Levels

Bin Jiang, Bo Xiong, Bohan Qu et al.

ECCV 2024 • poster
11 citations