Learning Multimodal Latent Generative Models with Energy-Based Prior

ECCV 2024 · 4 citations (ranked #696 of 2387 ECCV 2024 papers) · 4 authors

Abstract

Multimodal models have gained increasing popularity recently, and many works have been proposed to learn representations for different modalities. Such representations capture information shared across these domains, leading to more coherent joint and cross-modal generation. However, these works mostly adopt a standard Gaussian or Laplacian distribution as the prior, and it can be challenging for such uni-modal, non-informative distributions to capture all the information contained in multiple data types. Meanwhile, energy-based models (EBMs) have shown their effectiveness in multiple tasks due to their expressiveness and flexibility, but their capacity remains largely unexplored for multimodal generative models. In this paper, we propose a novel framework to train multimodal latent generative models jointly with an energy-based model. The proposed method yields a more expressive and informative prior that better captures the information within multiple modalities. Our experiments show that the model is effective, improving generation coherence and latent classification across different multimodal datasets.
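
The abstract describes replacing the usual Gaussian prior of a latent generative model with a learned energy-based prior. Below is a minimal PyTorch sketch of that general idea, not the authors' method or code: an energy network tilts a standard Gaussian base prior as p(z) ∝ exp(-E(z)) N(z; 0, I), prior samples are drawn with short-run Langevin dynamics, and the EBM is updated with a contrastive objective. All names (EnergyPrior, langevin_prior_samples), sizes, hyperparameters, and the placeholder posterior samples are illustrative assumptions.

```python
# Sketch of an energy-based prior tilting a Gaussian base:
#     p_theta(z) ∝ exp(-E_theta(z)) * N(z; 0, I)
import torch
import torch.nn as nn

class EnergyPrior(nn.Module):
    """Small MLP assigning a scalar energy E_theta(z) to each latent z."""
    def __init__(self, latent_dim: int = 16, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).squeeze(-1)  # shape: (batch,)

def langevin_prior_samples(ebm: EnergyPrior, n: int, latent_dim: int,
                           steps: int = 40, step_size: float = 0.1) -> torch.Tensor:
    """Short-run Langevin dynamics targeting exp(-E(z)) * N(z; 0, I)."""
    z = torch.randn(n, latent_dim)  # initialize from the Gaussian base
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        # Negative log-density up to a constant: E(z) + ||z||^2 / 2
        neg_logp = ebm(z).sum() + 0.5 * (z ** 2).sum()
        grad, = torch.autograd.grad(neg_logp, z)
        z = z - 0.5 * step_size * grad \
              + torch.randn_like(z) * step_size ** 0.5
    return z.detach()

if __name__ == "__main__":
    latent_dim = 16
    ebm = EnergyPrior(latent_dim)
    opt = torch.optim.Adam(ebm.parameters(), lr=1e-4)

    # Contrastive-divergence-style step: lower energy on posterior-side
    # latents (placeholders here for a hypothetical encoder's output),
    # raise it on samples drawn from the current prior.
    z_post = torch.randn(64, latent_dim) * 0.5 + 1.0
    z_prior = langevin_prior_samples(ebm, 64, latent_dim)

    loss = ebm(z_post).mean() - ebm(z_prior).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"EBM loss: {loss.item():.4f}")
```

In a full multimodal setup, z_post would come from the modality encoders' aggregated posterior rather than the placeholder tensor above; the tilted prior is what lets the latent space become multi-modal and informative instead of a fixed unit Gaussian.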

Citation History

Jan 26, 2026: 4 citations
Jan 27, 2026: 4 citations