Poster Papers matching "transformer architecture"
138 papers found • Page 1 of 3
Adaptive Transformer Programs: Bridging the Gap Between Performance and Interpretability in Transformers
Quoc-Vinh Lai-Dang, Taemin Kang, Seungah Son
An OpenMind for 3D Medical Vision Self-supervised Learning
Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi et al.
Attention-based clustering
Rodrigo Maulen Soto, Pierre Marion, Claire Boyer
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
Can Transformers Do Enumerative Geometry?
Baran Hashemi, Roderic Corominas, Alessandro Giacchetto
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
Byung Hyun Lee, Wongi Jeong, Woojae Han et al.
Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
Hector Pasten, Felipe Urrutia, Hector Orellana et al.
CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning
Yifei Zhang, Hao Zhu, Junhao Dong et al.
DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy
Rui Zhao, Yuze Fan, Ziguo Chen et al.
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo, Shuai Lu, Weihang Zhang et al.
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Lei Chen, Joan Bruna, Alberto Bietti
Dynamic Diffusion Transformer
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
Dynamic Semantic-Aware Correlation Modeling for UAV Tracking
Xinyu Zhou, Tongxin Pan, Lingyi Hong et al.
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
End-to-End Implicit Neural Representations for Classification
Alexander Gielisse, Jan van Gemert
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-Marc Odobez
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
Burak Bekci, Nassir Navab, Federico Tombari et al.
Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models
Yasha Ektefaie, Andrew Shen, Lavik Jain et al.
Flatten Graphs as Sequences: Transformers are Scalable Graph Generators
Dexiong Chen, Markus Krimmel, Karsten Borgwardt
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
Jiale Xu, Shenghua Gao, Ying Shan
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.
Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach
Jason Piquenot, Maxime Berar, Romain Raveaux et al.
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
Hyperbolic Genome Embeddings
Raiyan Khan, Philippe Chlenski, Itsik Pe'er
Impact of Layer Norm on Memorization and Generalization in Transformers
Rishi Singhal, Jung-Eun Kim
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Yuan Zhang, Yiming Dong et al.
Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics
Xiaotang Wang, Xuanwei Lin, Yun Zhu et al.
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Shen Zhang, Siyuan Liang, Yaning Tan et al.
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
Weijia Shi, Xiaochuang Han, Chunting Zhou et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism
Zhixiong Nan, Xianghong Li, Tao Xiang et al.
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo, Zeyu Hu, Na Zhao et al.
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen et al.
On the Optimization and Generalization of Multi-head Attention
Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan, Xingjian Li, Benjamin Zhang et al.
Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning
Baiyuan Chen, Shinji Ito, Masaaki Imaizumi
Point-SAM: Promptable 3D Segmentation Model for Point Clouds
Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Zhijian Zhuo, Ya Wang, Yutao Zeng et al.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Yu Bo, Weian Mao, Daniel Shao et al.
SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.
Spiking Neural Networks Need High-Frequency Information
Yuetong Fang, Deming Zhou, Ziqing Wang et al.
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training
Ziming Liu, Shaoyu Wang, Shenggan Cheng et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.