Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks


arXiv:2505.11881


NeurIPS 2025


Authors

Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Youngjae Yu

Topics

residual connections, orthogonal updates, vanishing gradients, training stability, generalization accuracy, vision transformers, feature learning

Abstract

Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in standard residual updates, the module's output is directly added to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module's capacity for learning entirely novel features. In this work, we introduce Orthogonal Residual Update: we decompose the module's output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs, TinyImageNet, ImageNet-1k), achieving, for instance, a +3.78 pp top-1 accuracy gain for ViT-B on ImageNet-1k.
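The decomposition described above can be sketched with a standard vector projection. Writing f = f(x_l) for the module output, the parallel part is (⟨f, x_l⟩ / (‖x_l‖² + ε)) x_l, and the update adds only what remains: x_{l+1} = x_l + f − (⟨f, x_l⟩ / (‖x_l‖² + ε)) x_l. This is a minimal NumPy sketch, not the authors' implementation. It makes three assumptions: the "stream direction" is taken along the last axis (one vector per token or position), the stabilizing constant `eps` is a hypothetical parameter, and the function names are illustrative.

```python
import numpy as np


def standard_residual_update(x, f_x):
    """Plain residual connection: add the module output unchanged."""
    return x + f_x


def orthogonal_residual_update(x, f_x, eps=1e-6):
    """Add only the component of f_x orthogonal to x (sketch, assumptions noted).

    x, f_x: arrays of identical shape (..., d). Each vector along the last
            axis is treated independently (assumption: per-token direction).
    eps:    hypothetical small constant guarding against division by zero
            when x is near the origin.
    """
    # Projection coefficient <f_x, x> / (<x, x> + eps), one scalar per vector.
    dot_fx = np.sum(f_x * x, axis=-1, keepdims=True)
    dot_xx = np.sum(x * x, axis=-1, keepdims=True)
    coeff = dot_fx / (dot_xx + eps)
    # Parallel component: the part of f_x lying along the input stream.
    f_parallel = coeff * x
    # Orthogonal component: what is left after removing the parallel part.
    f_orth = f_x - f_parallel
    return x + f_orth


if __name__ == "__main__":
    x = np.array([[1.0, 0.0, 0.0]])
    f = np.array([[2.0, 3.0, 0.0]])
    # The standard update also pushes x further along its own direction.
    print(standard_residual_update(x, f))
    # The orthogonal update keeps only the new direction (eps=0 for exactness).
    print(orthogonal_residual_update(x, f, eps=0.0))
```

In the usage example, the standard update yields `[[3., 3., 0.]]`, so two units of the module's output only rescale the existing direction. The orthogonal update yields `[[1., 3., 0.]]`, which keeps only the new direction. With `eps > 0`, the added component is orthogonal to `x` only approximately, up to a residue that scales with `eps`.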
