Zhe Gan
50 Papers
1,103 Total Citations
Papers (50)
Variational Autoencoder for Deep Learning of Images, Labels and Captions
NeurIPS 2016 (arXiv)
813
citations
Triangle Generative Adversarial Networks
NeurIPS 2017 (arXiv)
141
citations
Adversarial Symmetric Variational Autoencoder
NeurIPS 2017 (arXiv)
79
citations
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
ICLR 2025
41
citations
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
CVPR 2025
23
citations
VAE Learning via Stein Variational Gradient Descent
NeurIPS 2017 (arXiv)
6
citations
AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks
CVPR 2018 (arXiv)
0
citations
StoryGAN: A Sequential Conditional GAN for Story Visualization
CVPR 2019
0
citations
Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation
CVPR 2019
0
citations
BachGAN: High-Resolution Image Synthesis From Salient Object Layout
CVPR 2020 (arXiv)
0
citations
Violin: A Large-Scale Dataset for Video-and-Language Inference
CVPR 2020 (arXiv)
0
citations
Wasserstein Contrastive Representation Distillation
CVPR 2021 (arXiv)
0
citations
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
CVPR 2022 (arXiv)
0
citations
An Empirical Study of Training End-to-End Vision-and-Language Transformers
CVPR 2022 (arXiv)
0
citations
Injecting Semantic Concepts Into End-to-End Image Captioning
CVPR 2022 (arXiv)
0
citations
Scaling Up Vision-Language Pre-Training for Image Captioning
CVPR 2022 (arXiv)
0
citations
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
CVPR 2023 (arXiv)
0
citations
ReCo: Region-Controlled Text-to-Image Generation
CVPR 2023 (arXiv)
0
citations
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
CVPR 2023 (arXiv)
0
citations
Generalized Decoding for Pixel, Image, and Language
CVPR 2023 (arXiv)
0
citations
Non-Contrastive Learning Meets Language-Image Pre-Training
CVPR 2023 (arXiv)
0
citations
Relation-Aware Graph Attention Network for Visual Question Answering
ICCV 2019
0
citations
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
ICCV 2021 (arXiv)
0
citations
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
ECCV 2020
0
citations
UNITER: UNiversal Image-TExt Representation Learning
ECCV 2020
0
citations
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
ECCV 2022
0
citations
Deep Temporal Sigmoid Belief Networks for Sequence Modeling
NeurIPS 2015
0
citations
Deep Poisson Factor Modeling
NeurIPS 2015
0
citations
Deconvolutional Paragraph Representation Learning
NeurIPS 2017 (arXiv)
0
citations
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
CVPR 2021 (arXiv)
0
citations
Multimodal Autoregressive Pre-training of Large Vision Encoders
CVPR 2025
0
citations
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
ICCV 2025
0
citations
Learning Weight Uncertainty With Stochastic Gradient MCMC for Shape Classification
CVPR 2016
0
citations
StyleNet: Generating Attractive Visual Captions With Styles
CVPR 2017
0
citations
Semantic Compositional Networks for Visual Captioning
CVPR 2017 (arXiv)
0
citations
Adversarial Text Generation via Feature-Mover's Distance
NeurIPS 2018 (arXiv)
0
citations
Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization
NeurIPS 2018
0
citations
Improving Textual Network Learning with Variational Homophilic Embeddings
NeurIPS 2019
0
citations
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
NeurIPS 2020
0
citations
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
NeurIPS 2021
0
citations
Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective
NeurIPS 2021
0
citations
The Elastic Lottery Ticket Hypothesis
NeurIPS 2021
0
citations
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
NeurIPS 2022
0
citations
K-LITE: Learning Transferable Visual Models with External Knowledge
NeurIPS 2022
0
citations
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
NeurIPS 2022
0
citations
Scalable Deep Poisson Factor Analysis for Topic Modeling
ICML 2015
0
citations
Factored Temporal Sigmoid Belief Networks for Sequence Learning
ICML 2016
0
citations
Stochastic Gradient Monomial Gamma Sampler
ICML 2017
0
citations
Adversarial Feature Matching for Text Generation
ICML 2017
0
citations
JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
ICML 2018
0
citations