"transformer architectures" Papers
14 papers found
Attention on the Sphere
Boris Bonev, Max Rietmann, Andrea Paris et al.
NeurIPS 2025 · poster · arXiv:2505.11157
Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization
Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio
CVPR 2025 · poster · arXiv:2503.03519
2 citations
EUGens: Efficient, Unified and General Dense Layers
Sang Min Kim, Byeongchan Kim, Arijit Sehanobish et al.
NeurIPS 2025 · poster
Learning in Compact Spaces with Approximately Normalized Transformer
Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina et al.
NeurIPS 2025 · poster · arXiv:2505.22014
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubic, Federico Soldà, Aurelio Sulser et al.
ICLR 2025 · poster · arXiv:2405.16674
17 citations
L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers
Sofia Casarin, Sergio Escalera, Oswald Lanz
CVPR 2025 · poster · arXiv:2505.07300
2 citations
Optimal Brain Apoptosis
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
ICLR 2025 · poster · arXiv:2502.17941
3 citations
Scaling and context steer LLMs along the same computational path as the human brain
Joséphine Raugel, Jérémy Rapin, Stéphane d'Ascoli et al.
NeurIPS 2025 · oral · arXiv:2512.01591
All-in-one simulation-based inference
Manuel Gloeckler, Michael Deistler, Christian Weilbach et al.
ICML 2024 · poster
Controllable Prompt Tuning For Balancing Group Distributional Robustness
Hoang Phan, Andrew Wilson, Qi Lei
ICML 2024 · poster
Improving Token-Based World Models with Parallel Observation Prediction
Lior Cohen, Kaixin Wang, Bingyi Kang et al.
ICML 2024 · poster
Loss Shaping Constraints for Long-Term Time Series Forecasting
Ignacio Hounie, Javier Porras-Valenzuela, Alejandro Ribeiro
ICML 2024 · poster
Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
ICML 2024 · poster
Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation
Yibo Yang, Xiaojie Li, Motasem Alfarra et al.
ICML 2024 · poster