Yu Cheng

Papers: 59

Total Citations: 955

Papers (59)

MMD GAN: Towards Deeper Understanding of Moment Matching Network

NeurIPS 2017 (arXiv)

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Doubly Convolutional Neural Networks

NeurIPS 2016 (arXiv)

On the Recursive Teaching Dimension of VC Classes

Liger: Linearizing Large Language Models to Gated Recurrent Structures

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

Scaling Laws for Floating-Point Quantization Training

Scaling Physical Reasoning with the PHYSICS Dataset

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

StickMotion: Generating 3D Human Motions by Drawing a Stickman

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

StoryGAN: A Sequential Conditional GAN for Story Visualization

Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

BachGAN: High-Resolution Image Synthesis From Salient Object Layout

Violin: A Large-Scale Dataset for Video-and-Language Inference

UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training

Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding

Few-Shot Object Detection via Classification Refinement and Distractor Retreatment

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections

Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-Identification

Occlusion-Aware Networks for 3D Human Pose Estimation in Video

Relation-Aware Graph Attention Network for Visual Question Answering

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

UNITER: UNiversal Image-TExt Representation Learning

Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction

DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment

Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Object Tracking using Spatio-Temporal Networks for Future Prediction Location

LangBridge: Interpreting Image as a Combination of Language Embeddings

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

Walk and Learn: Facial Attribute Representation Learning From Egocentric Video and Contextual Data

S3Pool: Pooling With Stochastic Spatial Sampling

Fully-Adaptive Feature Sharing in Multi-Task Networks With Applications in Person Attribute Classification

Towards Pose Invariant Face Recognition in the Wild

Robust Learning of Fixed-Structure Bayesian Networks

Dialog-based Interactive Image Retrieval

Distinguishing Distributions When Samples Are Strategically Transformed

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

The Elastic Lottery Ticket Hypothesis

Outlier-Robust Sparse Estimation via Non-Convex Optimization

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

Robust Matrix Sensing in the Semi-Random Model

Deep Structured Energy Based Models for Anomaly Detection

When Samples Are Strategically Selected