Dynamic Layer Tying for Parameter-Efficient Transformers

Citations: 11
Rank: #406 of 2297 papers in ICLR 2024
Authors: 2
Data points: 1

Abstract

In the pursuit of reducing the number of trainable parameters in deep transformer networks, we employ Reinforcement Learning (RL) to dynamically select layers during training and tie them together. Every few iterations, the RL agent is asked whether to train each layer $i$ independently or to copy the weights of a previous layer $j$ [...]
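The tying decision described in the abstract can be illustrated with a small bookkeeping sketch. This is not the authors' implementation: the RL agent is replaced by a uniform random placeholder policy, and the names `sample_tying`, `resolve`, and `count_trainable` are hypothetical. It also assumes that a layer tied to an already-tied layer shares the weights of that layer's ultimate owner, which the abstract does not specify.

```python
import random


def sample_tying(num_layers, rng):
    """Placeholder policy (an assumption, NOT the paper's RL agent).

    For each layer i, pick a source layer j with 0 <= j <= i.
    j == i means "train layer i independently";
    j < i means "copy (tie to) the weights of previous layer j".
    """
    return [rng.randint(0, i) for i in range(num_layers)]


def resolve(tie):
    """Map every layer to the independently trained layer that owns its weights."""
    owners = []
    for i, j in enumerate(tie):
        # Any j < i was resolved earlier in the loop, so one lookup is enough.
        owners.append(i if j == i else owners[j])
    return owners


def count_trainable(tie):
    """Number of distinct weight sets that still need training."""
    return len(set(resolve(tie)))


# Fixed example: layers 1 and 3 end up sharing layer 0's weights,
# layer 4 shares layer 2's weights, layers 2 and 5 are independent.
example = [0, 0, 2, 1, 2, 5]
print(resolve(example))          # [0, 0, 2, 0, 2, 5]
print(count_trainable(example))  # 3

# Re-sampling the assignment, as would happen every few training iterations.
rng = random.Random(0)
print(sample_tying(6, rng))
```

In this toy setting, six layers backed by three distinct weight sets means only half of the per-layer parameters are trainable, which is the kind of reduction weight tying is meant to achieve.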

Citation History

Jan 28, 2026: 11 citations