Ming-Hsuan Yang

199 Papers
2,852 Total Citations

Papers (199)

Universal Style Transfer via Feature Transforms

NeurIPS 2017 · arXiv
1,083
citations

Language Model Beats Diffusion - Tokenizer is key to visual generation

ICLR 2024
525
citations

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

CVPR 2024
341
citations

Learning Affinity via Spatial Propagation Networks

NeurIPS 2017 · arXiv
300
citations

Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

ECCV 2020
246
citations

VidToMe: Video Token Merging for Zero-Shot Video Editing

CVPR 2024
89
citations

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

ECCV 2020
56
citations

Exploiting Diffusion Prior for Generalizable Dense Prediction

CVPR 2024
42
citations

Multi-subject Open-set Personalization in Video Generation

CVPR 2025 · arXiv
40
citations

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

CVPR 2025
24
citations

Efficient Visual State Space Model for Image Deblurring

CVPR 2025
23
citations

Controllable Image Synthesis via SegVAE

ECCV 2020
23
citations

CSL: Class-Agnostic Structure-Constrained Learning for Segmentation including the Unseen

AAAI 2024 · arXiv
15
citations

AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting

ICCV 2025
9
citations

OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection

NeurIPS 2025
8
citations

Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation

CVPR 2025
8
citations

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

CVPR 2025
5
citations

Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

CVPR 2024
4
citations

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

ICCV 2025
3
citations

HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis

NeurIPS 2025
3
citations

Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model

ICCV 2025
2
citations

CompleteMe: Reference-based Human Image Completion

ICCV 2025
1
citation

From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition

ICCV 2025
1
citation

Toward Material-Agnostic System Identification from Videos

ICCV 2025
1
citation

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

ICML 2024
0
citations

VideoPoet: A Large Language Model for Zero-Shot Video Generation

ICML 2024
0
citations

VideoPrism: A Foundational Visual Encoder for Video Understanding

ICML 2024
0
citations

Structural Sparse Tracking

CVPR 2015
0
citations

Adaptive Region Pooling for Object Detection

CVPR 2015
0
citations

PatchCut: Data-Driven Object Segmentation via Local Shape Transfer

CVPR 2015
0
citations

Salient Object Detection via Bootstrap Learning

CVPR 2015
0
citations

JOTS: Joint Online Tracking and Segmentation

CVPR 2015
0
citations

Deep Networks for Saliency Detection via Local Estimation and Global Search

CVPR 2015
0
citations

Multi-Objective Convolutional Learning for Face Labeling

CVPR 2015
0
citations

Multi-Instance Object Segmentation With Occlusion Handling

CVPR 2015
0
citations

Long-Term Correlation Tracking

CVPR 2015
0
citations

Object Contour Detection With a Fully Convolutional Encoder-Decoder Network

CVPR 2016
0
citations

Soft-Segmentation Guided Object Motion Deblurring

CVPR 2016
0
citations

Online Multi-Object Tracking via Structural Constraint Event Aggregation

CVPR 2016
0
citations

Blind Image Deblurring Using Dark Channel Prior

CVPR 2016
0
citations

A Comparative Study for Single Image Blind Deblurring

CVPR 2016
0
citations

Image Deblurring Using Smartphone Inertial Sensors

CVPR 2016
0
citations

Robust Kernel Estimation With Outliers Handling for Image Deblurring

CVPR 2016
0
citations

Weakly Supervised Object Localization With Progressive Domain Adaptation

CVPR 2016
0
citations

Video Segmentation via Object Flow

CVPR 2016
0
citations

Object Tracking via Dual Linear Structured SVM and Explicit Feature Map

CVPR 2016
0
citations

Hedged Deep Tracking

CVPR 2016
0
citations

Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

CVPR 2017 · arXiv
0
citations

Deep Image Harmonization

CVPR 2017 · arXiv
0
citations

Learning Fully Convolutional Networks for Iterative Non-Blind Deconvolution

CVPR 2017 · arXiv
0
citations

Generative Face Completion

CVPR 2017 · arXiv
0
citations

Diversified Texture Synthesis With Feed-Forward Networks

CVPR 2017 · arXiv
0
citations

Multi-Task Correlation Particle Filter for Robust Object Tracking

CVPR 2017
0
citations

Correlation Tracking via Joint Discrimination and Reliability Learning

CVPR 2018 · arXiv
0
citations

Learning Superpixels With Segmentation-Aware Affinity Loss

CVPR 2018
0
citations

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

CVPR 2018
0
citations

SPLATNet: Sparse Lattice Networks for Point Cloud Processing

CVPR 2018 · arXiv
0
citations

Learning Dual Convolutional Neural Networks for Low-Level Vision

CVPR 2018 · arXiv
0
citations

PiCANet: Learning Pixel-Wise Contextual Attention for Saliency Detection

CVPR 2018 · arXiv
0
citations

Gated Fusion Network for Single Image Dehazing

CVPR 2018 · arXiv
0
citations

Learning to Localize Sound Source in Visual Scenes

CVPR 2018 · arXiv
0
citations

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

CVPR 2018 · arXiv
0
citations

Learning a Discriminative Prior for Blind Image Deblurring

CVPR 2018 · arXiv
0
citations

Fast and Accurate Online Video Object Segmentation via Tracking Parts

CVPR 2018 · arXiv
0
citations

Learning to Adapt Structured Output Space for Semantic Segmentation

CVPR 2018 · arXiv
0
citations

Weakly Supervised Coupled Networks for Visual Sentiment Analysis

CVPR 2018
0
citations

Deep Semantic Face Deblurring

CVPR 2018 · arXiv
0
citations

Learning Spatial-Aware Regressions for Visual Tracking

CVPR 2018 · arXiv
0
citations

VITAL: VIsual Tracking via Adversarial Learning

CVPR 2018 · arXiv
0
citations

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

CVPR 2018 · arXiv
0
citations

SCOPS: Self-Supervised Co-Part Segmentation

CVPR 2019
0
citations

Target-Aware Deep Tracking

CVPR 2019
0
citations

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

CVPR 2019
0
citations

Im2Pencil: Controllable Pencil Illustration From Photographs

CVPR 2019
0
citations

Spatially Variant Linear Representation Models for Joint Filtering

CVPR 2019
0
citations

CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency

CVPR 2019
0
citations

Depth-Aware Video Frame Interpolation

CVPR 2019
0
citations

Learning Linear Transformations for Fast Image and Video Style Transfer

CVPR 2019
0
citations

Inserting Videos Into Videos

CVPR 2019
0
citations

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

CVPR 2019
0
citations

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

CVPR 2020 · arXiv
0
citations

Composing Good Shots by Exploiting Mutual Relations

CVPR 2020
0
citations

CycleISP: Real Image Restoration via Improved Data Synthesis

CVPR 2020 · arXiv
0
citations

Multi-Scale Boosted Dehazing Network With Dense Feature Fusion

CVPR 2020 · arXiv
0
citations

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

CVPR 2020 · arXiv
0
citations

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective

CVPR 2020 · arXiv
0
citations

Weakly-Supervised Semantic Segmentation via Sub-Category Exploration

CVPR 2020 · arXiv
0
citations

Learning to See Through Obstructions

CVPR 2020
0
citations

ReMix: Towards Image-to-Image Translation With Limited Data

CVPR 2021 · arXiv
0
citations

Regularizing Generative Adversarial Networks Under Limited Data

CVPR 2021 · arXiv
0
citations

Decoupled Dynamic Filter Networks

CVPR 2021 · arXiv
0
citations

Spatiotemporal Contrastive Video Representation Learning

CVPR 2021 · arXiv
0
citations

Multi-Stage Progressive Image Restoration

CVPR 2021 · arXiv
0
citations

Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision

CVPR 2022 · arXiv
0
citations

Video Frame Interpolation Transformer

CVPR 2022 · arXiv
0
citations

Burst Image Restoration and Enhancement

CVPR 2022 · arXiv
0
citations

Restormer: Efficient Transformer for High-Resolution Image Restoration

CVPR 2022 · arXiv
0
citations

Hierarchical Modular Network for Video Captioning

CVPR 2022 · arXiv
0
citations

InOut: Diverse Image Outpainting via GAN Inversion

CVPR 2022
0
citations

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection

CVPR 2023 · arXiv
0
citations

Burstormer: Burst Image Restoration and Enhancement Transformer

CVPR 2023 · arXiv
0
citations

Self-Supervised Super-Plane for Neural 3D Reconstruction

CVPR 2023
0
citations

MAGVIT: Masked Generative Video Transformer

CVPR 2023 · arXiv
0
citations

Improving Zero-Shot Generalization and Robustness of Multi-Modal Models

CVPR 2023 · arXiv
0
citations

Learning To Dub Movies via Hierarchical Prosody Models

CVPR 2023 · arXiv
0
citations

Self-Supervised AutoFlow

CVPR 2023 · arXiv
0
citations

Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble

CVPR 2023
0
citations

What Makes an Object Memorable?

ICCV 2015
0
citations

Fast and Accurate Head Pose Estimation via Random Projection Forests

ICCV 2015
0
citations

Hierarchical Convolutional Features for Visual Tracking

ICCV 2015
0
citations

Learning to Super-Resolve Blurry Face and Text Images

ICCV 2017
0
citations

Unsupervised Representation Learning by Sorting Sequences

ICCV 2017 · arXiv
0
citations

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow

ICCV 2017 · arXiv
0
citations

Learning Discriminative Data Fitting Functions for Blind Image Deblurring

ICCV 2017
0
citations

Video Deblurring via Semantic Segmentation and Pixel-Wise Non-Linear Kernel

ICCV 2017 · arXiv
0
citations

Blind Image Deblurring With Outlier Handling

ICCV 2017
0
citations

CREST: Convolutional Residual Learning for Visual Tracking

ICCV 2017 · arXiv
0
citations

Scene Parsing With Global Context Embedding

ICCV 2017 · arXiv
0
citations

Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

ICCV 2017 · arXiv
0
citations

Referring Expression Generation and Comprehension via Attributes

ICCV 2017
0
citations

The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

ICCV 2021
0
citations

Learning To Stylize Novel Views

ICCV 2021 · arXiv
0
citations

COMISR: Compression-Informed Video Super-Resolution

ICCV 2021 · arXiv
0
citations

Hybrid Neural Fusion for Full-Frame Video Stabilization

ICCV 2021 · arXiv
0
citations

Discovering 3D Parts From Image Collections

ICCV 2021 · arXiv
0
citations

Benchmarking Ultra-High-Definition Image Super-Resolution

ICCV 2021
0
citations

Video Matting via Consistency-Regularized Graph Neural Networks

ICCV 2021
0
citations

D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations

ICCV 2021
0
citations

Unified Visual Relationship Detection with Vision and Language Models

ICCV 2023 · arXiv
0
citations

SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image

ICCV 2023 · arXiv
0
citations

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

ICCV 2023 · arXiv
0
citations

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

ICCV 2023 · arXiv
0
citations

MiniROAD: Minimal RNN Framework for Online Action Detection

ICCV 2023
0
citations

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

ICCV 2023 · arXiv
0
citations

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

ICCV 2023 · arXiv
0
citations

InfiniCity: Infinite-Scale City Synthesis

ICCV 2023 · arXiv
0
citations

CiteTracker: Correlating Image and Text for Visual Tracking

ICCV 2023 · arXiv
0
citations

High Quality Entity Segmentation

ICCV 2023 · arXiv
0
citations

Counting Crowds in Bad Weather

ICCV 2023 · arXiv
0
citations

Neural Design Network: Graphic Layout Generation with Constraints

ECCV 2020
0
citations

Learnable Cost Volume Using the Cayley Representation

ECCV 2020
0
citations

Video Object Detection via Object-level Temporal Aggregation

ECCV 2020
0
citations

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

ECCV 2020
0
citations

Modeling Artistic Workflows for Image Generation and Editing

ECCV 2020
0
citations

Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification

ECCV 2020
0
citations

Learning Enriched Features for Real Image Restoration and Enhancement

ECCV 2020
0
citations

Learning Visibility for Robust Dense Human Body Estimation

ECCV 2022
0
citations

Autoregressive 3D Shape Generation via Canonical Mapping

ECCV 2022
0
citations

Class-Agnostic Object Detection with Multi-modal Transformer

ECCV 2022
0
citations

Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing

ECCV 2022
0
citations

Scraping Textures from Natural Images for Synthesis and Editing

ECCV 2022
0
citations

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

ECCV 2022
0
citations

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

ECCV 2022
0
citations

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

ECCV 2022
0
citations

Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis

NeurIPS 2015
0
citations

Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks

NeurIPS 2017
0
citations

CLR: Channel-wise Lightweight Reprogramming for Continual Learning

ICCV 2023 · arXiv
0
citations

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

CVPR 2025
0
citations

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

CVPR 2025
0
citations

Move-in-2D: 2D-Conditioned Human Motion Generation

CVPR 2025
0
citations

Unified Dense Prediction of Video Diffusion

CVPR 2025
0
citations

Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing

ICCV 2025
0
citations

FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads

ICCV 2025
0
citations

Efficient Concertormer for Image Deblurring and Beyond

ICCV 2025 · arXiv
0
citations

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

ICCV 2025
0
citations

Controllable 3D Outdoor Scene Generation via Scene Graphs

ICCV 2025
0
citations

Generating Synthetic Data for Unsupervised Federated Learning of Cross-Modal Retrieval

AAAI 2025
0
citations

BEV-MAE: Bird’s Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

AAAI 2024
0
citations

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

CVPR 2024
0
citations

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

CVPR 2024
0
citations

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

CVPR 2024
0
citations

RTracker: Recoverable Tracking via PN Tree Structured Memory

CVPR 2024
0
citations

Text-Driven Image Editing via Learnable Regions

CVPR 2024
0
citations

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

CVPR 2024
0
citations

Weakly Supervised Video Individual Counting

CVPR 2024
0
citations

GLaMM: Pixel Grounding Large Multimodal Model

CVPR 2024
0
citations

Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

CVPR 2024
0
citations

UniGS: Unified Representation for Image Generation and Segmentation

CVPR 2024
0
citations

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

CVPR 2024
0
citations

VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

ICML 2024
0
citations

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation

NeurIPS 2018
0
citations

Deep Attentive Tracking via Reciprocative Learning

NeurIPS 2018
0
citations

Context-aware Synthesis and Placement of Object Instances

NeurIPS 2018
0
citations

Joint-task Self-supervised Learning for Temporal Correspondence

NeurIPS 2019
0
citations

Dancing to Music

NeurIPS 2019
0
citations

Quadratic Video Interpolation

NeurIPS 2019
0
citations

Online Adaptation for Consistent Mesh Reconstruction in the Wild

NeurIPS 2020
0
citations

Learning 3D Dense Correspondence via Canonical Point Autoencoder

NeurIPS 2021
0
citations

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

NeurIPS 2021
0
citations

Intriguing Properties of Vision Transformers

NeurIPS 2021
0
citations

End-to-end Multi-modal Video Temporal Grounding

NeurIPS 2021
0
citations

LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery

NeurIPS 2022
0
citations

AIMS: All-Inclusive Multi-Level Segmentation for Anything

NeurIPS 2023
0
citations

Video Timeline Modeling For News Story Understanding

NeurIPS 2023
0
citations

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

NeurIPS 2023
0
citations

ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

NeurIPS 2023
0
citations

Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection

NeurIPS 2023
0
citations

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

NeurIPS 2023
0
citations

Module-wise Adaptive Distillation for Multimodality Foundation Models

NeurIPS 2023
0
citations