Zhe Gan

50 Papers
1,103 Total Citations

Papers (50)

Variational Autoencoder for Deep Learning of Images, Labels and Captions

NeurIPS 2016 (arXiv)
813
citations

Triangle Generative Adversarial Networks

NeurIPS 2017 (arXiv)
141
citations

Adversarial Symmetric Variational Autoencoder

NeurIPS 2017 (arXiv)
79
citations

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

ICLR 2025
41
citations

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

CVPR 2025
23
citations

VAE Learning via Stein Variational Gradient Descent

NeurIPS 2017 (arXiv)
6
citations

AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks

CVPR 2018 (arXiv)
0
citations

StoryGAN: A Sequential Conditional GAN for Story Visualization

CVPR 2019
0
citations

Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation

CVPR 2019
0
citations

BachGAN: High-Resolution Image Synthesis From Salient Object Layout

CVPR 2020 (arXiv)
0
citations

Violin: A Large-Scale Dataset for Video-and-Language Inference

CVPR 2020 (arXiv)
0
citations

Wasserstein Contrastive Representation Distillation

CVPR 2021 (arXiv)
0
citations

SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning

CVPR 2022 (arXiv)
0
citations

An Empirical Study of Training End-to-End Vision-and-Language Transformers

CVPR 2022 (arXiv)
0
citations

Injecting Semantic Concepts Into End-to-End Image Captioning

CVPR 2022 (arXiv)
0
citations

Scaling Up Vision-Language Pre-Training for Image Captioning

CVPR 2022 (arXiv)
0
citations

An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling

CVPR 2023 (arXiv)
0
citations

ReCo: Region-Controlled Text-to-Image Generation

CVPR 2023 (arXiv)
0
citations

LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling

CVPR 2023 (arXiv)
0
citations

Generalized Decoding for Pixel, Image, and Language

CVPR 2023 (arXiv)
0
citations

Non-Contrastive Learning Meets Language-Image Pre-Training

CVPR 2023 (arXiv)
0
citations

Relation-Aware Graph Attention Network for Visual Question Answering

ICCV 2019
0
citations

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

ICCV 2021 (arXiv)
0
citations

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

ECCV 2020
0
citations

UNITER: UNiversal Image-TExt Representation Learning

ECCV 2020
0
citations

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

ECCV 2022
0
citations

Deep Temporal Sigmoid Belief Networks for Sequence Modeling

NeurIPS 2015
0
citations

Deep Poisson Factor Modeling

NeurIPS 2015
0
citations

Deconvolutional Paragraph Representation Learning

NeurIPS 2017 (arXiv)
0
citations

Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

CVPR 2021 (arXiv)
0
citations

Multimodal Autoregressive Pre-training of Large Vision Encoders

CVPR 2025
0
citations

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

ICCV 2025
0
citations

Learning Weight Uncertainty With Stochastic Gradient MCMC for Shape Classification

CVPR 2016
0
citations

StyleNet: Generating Attractive Visual Captions With Styles

CVPR 2017
0
citations

Semantic Compositional Networks for Visual Captioning

CVPR 2017 (arXiv)
0
citations

Adversarial Text Generation via Feature-Mover's Distance

NeurIPS 2018 (arXiv)
0
citations

Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization

NeurIPS 2018
0
citations

Improving Textual Network Learning with Variational Homophilic Embeddings

NeurIPS 2019
0
citations

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

NeurIPS 2020
0
citations

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

NeurIPS 2021
0
citations

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

NeurIPS 2021
0
citations

The Elastic Lottery Ticket Hypothesis

NeurIPS 2021
0
citations

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

NeurIPS 2022
0
citations

K-LITE: Learning Transferable Visual Models with External Knowledge

NeurIPS 2022
0
citations

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

NeurIPS 2022
0
citations

Scalable Deep Poisson Factor Analysis for Topic Modeling

ICML 2015
0
citations

Factored Temporal Sigmoid Belief Networks for Sequence Learning

ICML 2016
0
citations

Stochastic Gradient Monomial Gamma Sampler

ICML 2017
0
citations

Adversarial Feature Matching for Text Generation

ICML 2017
0
citations

JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets

ICML 2018
0
citations