Rotary Masked Autoencoders are Versatile Learners

1 citation

arXiv:2505.20535

Citations

#1350 in NeurIPS 2025 of 5858 papers

Authors

Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi, Martín de los Rios, Gabriella Contardo, Roberto Trotta

Topics

rotary positional embedding, masked autoencoders, irregular time-series, multivariate time-series, continuous positional information, representation learning, cross-modal learning, position reconstruction

Abstract

Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked Autoencoder (RoMAE), which utilizes the popular Rotary Positional Embedding (RoPE) method for continuous positions. RoMAE is an extension to the Masked Autoencoder (MAE) that enables interpolation and representation learning with multidimensional continuous positional information while avoiding any time-series-specific architectural specializations. We showcase RoMAE's performance on a variety of modalities including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE's usual performance across other modalities. In addition, we investigate RoMAE's ability to reconstruct the embedded continuous positions, demonstrating that including learned embeddings in the input sequence breaks RoPE's relative position property.
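The relative position property mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `rope_rotate`, the frequency base of 10000, and the toy dimensions are assumptions chosen for illustration. It applies a standard RoPE-style rotation at real-valued (irregular) positions and checks that the query-key score depends only on the difference between the two positions.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by angles proportional to pos.

    pos may be any real number, so irregular timestamps work directly.
    (Hypothetical helper; the base and pairing scheme are assumptions.)
    """
    d = x.shape[-1]
    assert d % 2 == 0, "feature dimension must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)            # one frequency per pair
    angles = np.asarray(pos, dtype=float)[..., None] * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x, dtype=float)
    out[..., 0::2] = x1 * cos - x2 * sin                 # 2-D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=8)

# Two pairs of irregular timestamps with the same offset (2.54).
score_a = rope_rotate(q, 0.37) @ rope_rotate(k, 2.91)
score_b = rope_rotate(q, 10.37) @ rope_rotate(k, 12.91)

# The attention score depends only on the position difference.
relative_ok = bool(np.isclose(score_a, score_b))
# Rotations preserve vector norms.
norm_ok = bool(np.isclose(np.linalg.norm(rope_rotate(q, 3.3)), np.linalg.norm(q)))
```

Because the rotation angle is a linear function of a real-valued position, no discretization of timestamps is needed, which is the sense in which RoPE extends to continuous positions in this sketch.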

Citation History

[Chart: citation count over time, Jan 25, 2026 to Jan 31, 2026]