Yu Cheng
59
Papers
955
Total Citations
Papers (59)
MMD GAN: Towards Deeper Understanding of Moment Matching Network
NeurIPS 2017
763
citations
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
72
citations
Doubly Convolutional Neural Networks
NeurIPS 2016
63
citations
On the Recursive Teaching Dimension of VC Classes
NeurIPS 2016
15
citations
Liger: Linearizing Large Language Models to Gated Recurrent Structures
ICML 2025
11
citations
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
CVPR 2025
10
citations
Scaling Laws for Floating-Point Quantization Training
ICML 2025
5
citations
Scaling Physical Reasoning with the PHYSICS Dataset
NeurIPS 2025
5
citations
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
ICCV 2025
3
citations
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
CVPR 2025
3
citations
StickMotion: Generating 3D Human Motions by Drawing a Stickman
CVPR 2025
3
citations
Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
NeurIPS 2025
2
citations
StoryGAN: A Sequential Conditional GAN for Story Visualization
CVPR 2019
0
citations
Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning
CVPR 2020
0
citations
BachGAN: High-Resolution Image Synthesis From Salient Object Layout
CVPR 2020
0
citations
Violin: A Large-Scale Dataset for Video-and-Language Inference
CVPR 2020
0
citations
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
CVPR 2021
0
citations
Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding
CVPR 2021
0
citations
Few-Shot Object Detection via Classification Refinement and Distractor Retreatment
CVPR 2021
0
citations
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
CVPR 2021
0
citations
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
CVPR 2022
0
citations
DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
CVPR 2023
0
citations
You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?
CVPR 2023
0
citations
An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections
ICCV 2015
0
citations
Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-Identification
ICCV 2017
0
citations
Occlusion-Aware Networks for 3D Human Pose Estimation in Video
ICCV 2019
0
citations
Relation-Aware Graph Attention Network for Visual Question Answering
ICCV 2019
0
citations
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
ECCV 2020
0
citations
UNITER: UNiversal Image-TExt Representation Learning
ECCV 2020
0
citations
Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction
ECCV 2022
0
citations
DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment
ECCV 2022
0
citations
Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models
ECCV 2022
0
citations
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
ECCV 2022
0
citations
Object Tracking using Spatio-Temporal Networks for Future Prediction Location
ECCV 2020
0
citations
LangBridge: Interpreting Image as a Combination of Language Embeddings
ICCV 2025
0
citations
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
ICCV 2025
0
citations
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
ICCV 2025
0
citations
Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization
AAAI 2024
0
citations
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
CVPR 2024
0
citations
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
ICML 2024
0
citations
LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
ICML 2024
0
citations
Walk and Learn: Facial Attribute Representation Learning From Egocentric Video and Contextual Data
CVPR 2016
0
citations
S3Pool: Pooling With Stochastic Spatial Sampling
CVPR 2017
0
citations
Fully-Adaptive Feature Sharing in Multi-Task Networks With Applications in Person Attribute Classification
CVPR 2017
0
citations
Towards Pose Invariant Face Recognition in the Wild
CVPR 2018
0
citations
Robust Learning of Fixed-Structure Bayesian Networks
NeurIPS 2018
0
citations
Dialog-based Interactive Image Retrieval
NeurIPS 2018
0
citations
Distinguishing Distributions When Samples Are Strategically Transformed
NeurIPS 2019
0
citations
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
NeurIPS 2020
0
citations
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
NeurIPS 2021
0
citations
Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective
NeurIPS 2021
0
citations
The Elastic Lottery Ticket Hypothesis
NeurIPS 2021
0
citations
Outlier-Robust Sparse Estimation via Non-Convex Optimization
NeurIPS 2022
0
citations
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
NeurIPS 2022
0
citations
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
NeurIPS 2023
0
citations
Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing
NeurIPS 2023
0
citations
Robust Matrix Sensing in the Semi-Random Model
NeurIPS 2023
0
citations
Deep Structured Energy Based Models for Anomaly Detection
ICML 2016
0
citations
When Samples Are Strategically Selected
ICML 2019
0
citations