Yu Cheng

59
Papers
955
Total Citations

Papers (59)

MMD GAN: Towards Deeper Understanding of Moment Matching Network

NeurIPS 2017 (arXiv)
763
citations

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICML 2025
72
citations

Doubly Convolutional Neural Networks

NeurIPS 2016 (arXiv)
63
citations

On the Recursive Teaching Dimension of VC Classes

NeurIPS 2016
15
citations

Liger: Linearizing Large Language Models to Gated Recurrent Structures

ICML 2025
11
citations

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

CVPR 2025
10
citations

Scaling Laws for Floating-Point Quantization Training

ICML 2025
5
citations

Scaling Physical Reasoning with the PHYSICS Dataset

NeurIPS 2025
5
citations

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

ICCV 2025
3
citations

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

CVPR 2025
3
citations

StickMotion: Generating 3D Human Motions by Drawing a Stickman

CVPR 2025
3
citations

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

NeurIPS 2025
2
citations

StoryGAN: A Sequential Conditional GAN for Story Visualization

CVPR 2019
0
citations

Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

CVPR 2020 (arXiv)
0
citations

BachGAN: High-Resolution Image Synthesis From Salient Object Layout

CVPR 2020 (arXiv)
0
citations

Violin: A Large-Scale Dataset for Video-and-Language Inference

CVPR 2020 (arXiv)
0
citations

UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training

CVPR 2021 (arXiv)
0
citations

Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding

CVPR 2021 (arXiv)
0
citations

Few-Shot Object Detection via Classification Refinement and Distractor Retreatment

CVPR 2021
0
citations

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

CVPR 2021 (arXiv)
0
citations

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

CVPR 2022 (arXiv)
0
citations

DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment

CVPR 2023
0
citations

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

CVPR 2023
0
citations

An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections

ICCV 2015
0
citations

Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-Identification

ICCV 2017 (arXiv)
0
citations

Occlusion-Aware Networks for 3D Human Pose Estimation in Video

ICCV 2019
0
citations

Relation-Aware Graph Attention Network for Visual Question Answering

ICCV 2019
0
citations

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

ECCV 2020
0
citations

UNITER: UNiversal Image-TExt Representation Learning

ECCV 2020
0
citations

Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction

ECCV 2022
0
citations

DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment

ECCV 2022
0
citations

Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models

ECCV 2022
0
citations

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

ECCV 2022
0
citations

Object Tracking using Spatio-Temporal Networks for Future Prediction Location

ECCV 2020
0
citations

LangBridge: Interpreting Image as a Combination of Language Embeddings

ICCV 2025
0
citations

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

ICCV 2025
0
citations

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

ICCV 2025
0
citations

Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization

AAAI 2024
0
citations

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement

CVPR 2024
0
citations

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

ICML 2024
0
citations

LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

ICML 2024
0
citations

Walk and Learn: Facial Attribute Representation Learning From Egocentric Video and Contextual Data

CVPR 2016
0
citations

S3Pool: Pooling With Stochastic Spatial Sampling

CVPR 2017 (arXiv)
0
citations

Fully-Adaptive Feature Sharing in Multi-Task Networks With Applications in Person Attribute Classification

CVPR 2017 (arXiv)
0
citations

Towards Pose Invariant Face Recognition in the Wild

CVPR 2018
0
citations

Robust Learning of Fixed-Structure Bayesian Networks

NeurIPS 2018
0
citations

Dialog-based Interactive Image Retrieval

NeurIPS 2018
0
citations

Distinguishing Distributions When Samples Are Strategically Transformed

NeurIPS 2019
0
citations

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

NeurIPS 2020
0
citations

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

NeurIPS 2021
0
citations

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

NeurIPS 2021
0
citations

The Elastic Lottery Ticket Hypothesis

NeurIPS 2021
0
citations

Outlier-Robust Sparse Estimation via Non-Convex Optimization

NeurIPS 2022
0
citations

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

NeurIPS 2022
0
citations

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

NeurIPS 2023
0
citations

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

NeurIPS 2023
0
citations

Robust Matrix Sensing in the Semi-Random Model

NeurIPS 2023
0
citations

Deep Structured Energy Based Models for Anomaly Detection

ICML 2016
0
citations

When Samples Are Strategically Selected

ICML 2019
0
citations