🧬 Generative Models

Variational Autoencoders

VAE models and latent variable learning

100 papers, 2,684 total citations
Feb '24 to Jan '26: 551 papers
Also includes: variational autoencoders, vae, vaes, variational inference, latent variable models

Top Papers

#1

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupre la Tour, Henk Tillman et al.

ICLR 2025, arXiv:2406.04093
sparse autoencoders, language model interpretability, feature extraction, k-sparse autoencoders (+4 more)
298
citations
#2

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.

ICML 2025
190
citations
#3

Revisiting Feature Prediction for Learning Visual Representations from Video

Quentin Garrido, Yann LeCun, Michael Rabbat et al.

ICLR 2025
178
citations
#4

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park et al.

CVPR 2024
176
citations
#5

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Haian Jin, Hanwen Jiang, Hao Tan et al.

ICLR 2025
86
citations
#6

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Jonas Ricker, Denis Lukovnikov, Asja Fischer

CVPR 2024
85
citations
#7

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025, arXiv:2411.14257
sparse autoencoders, hallucination mechanisms, entity recognition, knowledge awareness (+3 more)
77
citations
#8

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou et al.

ICCV 2025, arXiv:2504.10483
latent diffusion models, variational auto-encoder, end-to-end training, representation-alignment loss (+3 more)
73
citations
#9

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

CVPR 2024
69
citations
#10

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Aleksandar Makelov, Georg Lange, Neel Nanda

ICLR 2025, arXiv:2405.08366
sparse autoencoders, interpretability, sparse dictionary learning, indirect object identification (+4 more)
63
citations
#11

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024, arXiv:2312.10726
masked autoencoders, 3d representation learning, point cloud pre-training, transformer encoder (+4 more)
61
citations
#12

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Adam Karvonen, Can Rager, Johnny Lin et al.

ICML 2025
51
citations
#13

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Danny Driess, Jost Springenberg, Brian Ichter et al.

NeurIPS 2025, arXiv:2505.23705
vision-language-action models, continuous control policies, diffusion action expert, flow matching (+4 more)
46
citations
#14

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

CVPR 2025
45
citations
#15

EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris et al.

ICML 2025
45
citations
#16

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu et al.

ICML 2025
34
citations
#17

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025, arXiv:2406.07072
variational quantum machine learning, parametrized quantum circuits, quantum kernel methods, trainability (+3 more)
33
citations
#18

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025
32
citations
#19

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025
32
citations
#20

Rethinking Graph Masked Autoencoders through Alignment and Uniformity

Liang Wang, Xiang Tao, Qiang Liu et al.

AAAI 2024, arXiv:2402.07225
graph masked autoencoders, graph contrastive learning, self-supervised learning, alignment and uniformity (+3 more)
32
citations
#21

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NeurIPS 2025
32
citations
#22

LaWa: Using Latent Space for In-Generation Image Watermarking

Ahmad Rezaei, Mohammad Akbari, Saeed Ranjbar Alvar et al.

ECCV 2024
31
citations
#23

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas et al.

ECCV 2024, arXiv:2407.10387
video-to-audio generation, audio-visual synchronization, generative audio codec, masked generative model (+2 more)
31
citations
#24

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

Sensen Gao, Xiaojun Jia, Xuhong Ren et al.

ECCV 2024, arXiv:2403.12445
vision-language pre-training, multimodal adversarial examples, adversarial transferability, adversarial trajectory (+3 more)
31
citations
#25

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Doohyuk Jang, Sihwan Park, June Yong Yang et al.

ICLR 2025
29
citations
#26

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

ICML 2025
28
citations
#27

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025
28
citations
#28

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025
27
citations
#29

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NeurIPS 2025
26
citations
#30

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025, arXiv:2411.17762
vision-language models, semantic discrete encoding, multimodal understanding, visual generation (+3 more)
25
citations
#31

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.

ICML 2025
24
citations
#32

FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning

Chenhao Li, Elijah Stanger-Jones, Steve Heim et al.

ICLR 2024
23
citations
#33

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024, arXiv:2304.10520
masked image modeling, masked autoencoders, instance discrimination, contrastive tuning (+4 more)
21
citations
#34

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024
20
citations
#35

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu et al.

CVPR 2025, arXiv:2411.06449
video vae, latent video diffusion, temporal-spatial compression, keyframe-based compression (+4 more)
19
citations
#36

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.

NeurIPS 2025
19
citations
#37

T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.

CVPR 2024
18
citations
#38

UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

Jian Zou, Tianyu Huang, Guanglei Yang et al.

ECCV 2024
17
citations
#39

Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

Zhiwei Li, Guodong Long, Tianyi Zhou et al.

AAAI 2025
17
citations
#40

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Haiyan Zhao, Heng Zhao, Bo Shen et al.

ICLR 2025, arXiv:2410.00153
concept probing, linear classifiers, representation space, gaussian concept subspace (+3 more)
16
citations
#41

Adaptive Length Image Tokenization via Recurrent Allocation

Shivam Duggal, Phillip Isola, Antonio Torralba et al.

ICLR 2025
16
citations
#42

R-MAE: Regions Meet Masked Autoencoders

Duy-Kien Nguyen, Yanghao Li, Vaibhav Aggarwal et al.

ICLR 2024
16
citations
#43

Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression

Yuan Tian, Guo Lu, Guangtao Zhai

ECCV 2024
14
citations
#44

Explore In-Context Segmentation via Latent Diffusion Models

Chaoyang Wang, Xiangtai Li, Henghui Ding et al.

AAAI 2025
14
citations
#45

Grounding Language Models for Visual Entity Recognition

Zilin Xiao, Ming Gong, Paola Cascante-Bonilla et al.

ECCV 2024
13
citations
#46

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025
13
citations
#47

Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE Distillation and Diffusion Probabilistic Feedback

Xin Jin, Bohan Li, Baao Xie et al.

ECCV 2024
12
citations
#48

Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters

Kevin Li, Sachin Goyal, João D Semedo et al.

ICLR 2025
12
citations
#49

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

Can Demircan, Tankred Saanum, Akshay Jagadish et al.

ICLR 2025
11
citations
#50

ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders

Jefferson Hernandez, Ruben Villegas, Vicente Ordonez

ECCV 2024
11
citations
#51

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Biao Zhang, Peter Wonka

ICLR 2025, arXiv:2410.01295
hierarchical autoencoder, 3d representation learning, latent space compression, diffusion models (+4 more)
11
citations
#52

BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.

NeurIPS 2025
11
citations
#53

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation

Kihong Kim, Haneol Lee, Jihye Park et al.

ECCV 2024
11
citations
#54

Lewis's Signaling Game as beta-VAE For Natural Word Lengths and Segments

Ryo Ueda, Tadahiro Taniguchi

ICLR 2024
11
citations
#55

Interaction Asymmetry: A General Principle for Learning Composable Abstractions

Jack Brady, Julius von Kügelgen, Sebastien Lachapelle et al.

ICLR 2025
11
citations
#56

Latent Thought Models with Variational Bayes Inference-Time Computation

Deqian Kong, Minglu Zhao, Dehong Xu et al.

ICML 2025
11
citations
#57

Vector-ICL: In-context Learning with Continuous Vector Representations

Yufan Zhuang, Chandan Singh, Liyuan Liu et al.

ICLR 2025
10
citations
#58

Variational Inference for SDEs Driven by Fractional Noise

Rembert Daems, Manfred Opper, Guillaume Crevecoeur et al.

ICLR 2024
10
citations
#59

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos

Keqiang Sun, Dori Litvak, Yunzhi Zhang et al.

ECCV 2024
10
citations
#60

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li, William H Beluch, Margret Keuper et al.

ICLR 2025
10
citations
#61

Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words

Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa et al.

ICLR 2025, arXiv:2501.06254
sparse autoencoders, polysemous words, interpretability of llms, monosemantic features (+3 more)
9
citations
#62

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NeurIPS 2025
9
citations
#63

Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction

Guowei Xu, Jiale Tao, Wen Li et al.

ECCV 2024, arXiv:2407.11494
human motion prediction, semantic latent directions, generative models, latent space control (+3 more)
9
citations
#64

Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies

Nadav Timor, Jonathan Mamou, Daniel Korat et al.

ICML 2025
9
citations
#65

SILO: Solving Inverse Problems with Latent Operators

Ron Raphaeli, Sean Man, Michael Elad

ICCV 2025
9
citations
#66

Neural structure learning with stochastic differential equations

Benjie Wang, Joel Jennings, Wenbo Gong

ICLR 2024
9
citations
#67

From One to More: Contextual Part Latents for 3D Generation

Shaocong Dong, Lihe Ding, Xiao Chen et al.

ICCV 2025, arXiv:2507.08772
3d generation, latent diffusion frameworks, part-aware decomposition, multi-part geometries (+4 more)
8
citations
#68

IVP-VAE: Modeling EHR Time Series with Initial Value Problem Solvers

Jingge Xiao, Leonie Basso, Wolfgang Nejdl et al.

AAAI 2024, arXiv:2305.06741
irregularly sampled time series, electronic health records, initial value problem solvers, continuous-time models (+3 more)
8
citations
#69

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

Dawei Yan, Pengcheng Li, Yang Li et al.

AAAI 2025
8
citations
#70

LEAD: Exploring Logit Space Evolution for Model Selection

Zixuan Hu, Xiaotong Li, Shixiang Tang et al.

CVPR 2024
7
citations
#71

Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

Wei Chen, Zhiyi Huang, Ruichu Cai et al.

AAAI 2024, arXiv:2312.11934
causal discovery, latent variable models, higher order cumulants, non-gaussian data (+2 more)
7
citations
#72

Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders

Lucas Stoffl, Andy Bonnetto, Stéphane D'Ascoli et al.

ECCV 2024
masked autoencoders, hierarchical behavior analysis, action segmentation, motion capture data (+4 more)
7
citations
#73

Text-Guided Video Masked Autoencoder

David Fan, Jue Wang, Shuai Liao et al.

ECCV 2024
7
citations
#74

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.

ECCV 2024, arXiv:2408.02672
implicit neural representations, video compression, semantic video retrieval, latent space alignment (+4 more)
7
citations
#75

What Do Latent Action Models Actually Learn?

Chuheng Zhang, Tim Pearce, Pushi Zhang et al.

NeurIPS 2025, arXiv:2506.15691
latent action models, controllable changes, exogenous noise, principal component analysis (+4 more)
7
citations
#76

Towards the Disappearing Truth: Fine-Grained Joint Causal Influences Learning with Hidden Variable-Driven Causal Hypergraphs

Kun Zhu, Chunhui Zhao

AAAI 2024
7
citations
#77

Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder

Junjie Zhou, Jiao Tang, Yingli Zuo et al.

CVPR 2025
7
citations
#78

Expressivity of Neural Networks with Random Weights and Learned Biases

Ezekiel Williams, Alexandre Payeur, Avery Ryoo et al.

ICLR 2025, arXiv:2407.00957
universal function approximation, random weight networks, learned bias optimization, feedforward neural networks (+4 more)
6
citations
#79

Scalable Bayesian Learning with posteriors

Samuel Duffield, Kaelan Donatella, Johnathan Chiu et al.

ICLR 2025
6
citations
#80

Diffusion Bridge AutoEncoders for Unsupervised Representation Learning

Yeongmin Kim, Kwanghyeon Lee, Minsang Park et al.

ICLR 2025, arXiv:2405.17111
diffusion models, unsupervised representation learning, information bottleneck, latent variable inference (+3 more)
6
citations
#81

VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology

Yi Xiao, Qilong Jia, Kun Chen et al.

ICLR 2025
6
citations
#82

SAE-V: Interpreting Multimodal Models for Enhanced Alignment

Hantao Lou, Changye Li, Jiaming Ji et al.

ICML 2025
6
citations
#83

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Daniil Laptev, Nikita Balagansky, Yaroslav Aksenov et al.

ICML 2025
6
citations
#84

Linear combinations of latents in generative models: subspaces and beyond

Erik Bodin, Alexandru Stere, Dragos Margineantu et al.

ICLR 2025, arXiv:2408.08558
latent variable manipulation, generative model subspaces, diffusion models, flow matching (+4 more)
6
citations
#85

Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation

YoungJoon Yoo, Jongwon Choi

AAAI 2024, arXiv:2312.11532
topic modeling, latent codebooks, vector-quantized variational autoencoder, document generation (+4 more)
6
citations
#86

Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression

Siqi Wu, Yinda Chen, Dong Liu et al.

AAAI 2025
6
citations
#87

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Siyuan Li, Luyuan Zhang, Zedong Wang et al.

CVPR 2025, arXiv:2504.00999
masked image modeling, vector quantization, token merging, look-up free quantization (+4 more)
6
citations
#88

Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik, Tim Lawson, Conor Houghton et al.

ICML 2025
6
citations
#89

Variational Search Distributions

Dan Steinberg, Rafael Oliveira, Cheng Soon Ong et al.

ICLR 2025
5
citations
#90

Dense SAE Latents Are Features, Not Bugs

Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.

NeurIPS 2025
5
citations
#91

Structural Estimation of Partially Observed Linear Non-Gaussian Acyclic Model: A Practical Approach with Identifiability

Songyao Jin, Feng Xie, Guangyi Chen et al.

ICLR 2024
5
citations
#92

Latent Diffusion Models with Masked AutoEncoders

Junho Lee, Jeongwoo Shin, Hyungwook Choi et al.

ICCV 2025
5
citations
#93

Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation

Alessandro Palma, Sergei Rybakov, Leon Hetzel et al.

ICML 2025
5
citations
#94

Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs

Kejia Zhang, Keda Tao, Jiasheng Tang et al.

NeurIPS 2025
5
citations
#95

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan Rodriguez, Haotian Zhang, Abhay Puri et al.

NeurIPS 2025
5
citations
#96

DenoiseVAE: Learning Molecule-Adaptive Noise Distributions for Denoising-based 3D Molecular Pre-training

Yurou Liu, Jiahao Chen, Rui Jiao et al.

ICLR 2025
5
citations
#97

Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution

Zhanyi Sun, Shuran Song

NeurIPS 2025
5
citations
#98

Multimodal Variational Autoencoder: A Barycentric View

Peijie Qiu, Wenhui Zhu, Sayantan Kumar et al.

AAAI 2025
5
citations
#99

Random Forest Autoencoders for Guided Representation Learning

Adrien Aumon, Shuang Ni, Myriam Lizotte et al.

NeurIPS 2025
4
citations
#100

ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning

Chau Pham, Juan C. Caicedo, Bryan Plummer

NeurIPS 2025, arXiv:2503.19331
masked autoencoders, multi-channel imaging, cross-channel learning, vision transformers (+4 more)
4
citations