Ming-Hsuan Yang
199
Papers
2,852
Total Citations
Papers (199)
Universal Style Transfer via Feature Transforms
NeurIPS 2017 (arXiv)
1,083
citations
Language Model Beats Diffusion - Tokenizer is key to visual generation
ICLR 2024
525
citations
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
CVPR 2024
341
citations
Learning Affinity via Spatial Propagation Networks
NeurIPS 2017 (arXiv)
300
citations
Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector
ECCV 2020
246
citations
VidToMe: Video Token Merging for Zero-Shot Video Editing
CVPR 2024
89
citations
RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
ECCV 2020
56
citations
Exploiting Diffusion Prior for Generalizable Dense Prediction
CVPR 2024
42
citations
Multi-subject Open-set Personalization in Video Generation
CVPR 2025 (arXiv)
40
citations
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
CVPR 2025
24
citations
Efficient Visual State Space Model for Image Deblurring
CVPR 2025
23
citations
Controllable Image Synthesis via SegVAE
ECCV 2020
23
citations
CSL: Class-Agnostic Structure-Constrained Learning for Segmentation including the Unseen
AAAI 2024 (arXiv)
15
citations
AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting
ICCV 2025
9
citations
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
NeurIPS 2025
8
citations
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
CVPR 2025
8
citations
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
CVPR 2025
5
citations
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
CVPR 2024
4
citations
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
ICCV 2025
3
citations
HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis
NeurIPS 2025
3
citations
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model
ICCV 2025
2
citations
CompleteMe: Reference-based Human Image Completion
ICCV 2025
1
citations
From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition
ICCV 2025
1
citations
Toward Material-Agnostic System Identification from Videos
ICCV 2025
1
citations
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
ICML 2024
0
citations
VideoPoet: A Large Language Model for Zero-Shot Video Generation
ICML 2024
0
citations
VideoPrism: A Foundational Visual Encoder for Video Understanding
ICML 2024
0
citations
Structural Sparse Tracking
CVPR 2015
0
citations
Adaptive Region Pooling for Object Detection
CVPR 2015
0
citations
PatchCut: Data-Driven Object Segmentation via Local Shape Transfer
CVPR 2015
0
citations
Salient Object Detection via Bootstrap Learning
CVPR 2015
0
citations
JOTS: Joint Online Tracking and Segmentation
CVPR 2015
0
citations
Deep Networks for Saliency Detection via Local Estimation and Global Search
CVPR 2015
0
citations
Multi-Objective Convolutional Learning for Face Labeling
CVPR 2015
0
citations
Multi-Instance Object Segmentation With Occlusion Handling
CVPR 2015
0
citations
Long-Term Correlation Tracking
CVPR 2015
0
citations
Object Contour Detection With a Fully Convolutional Encoder-Decoder Network
CVPR 2016
0
citations
Soft-Segmentation Guided Object Motion Deblurring
CVPR 2016
0
citations
Online Multi-Object Tracking via Structural Constraint Event Aggregation
CVPR 2016
0
citations
Blind Image Deblurring Using Dark Channel Prior
CVPR 2016
0
citations
A Comparative Study for Single Image Blind Deblurring
CVPR 2016
0
citations
Image Deblurring Using Smartphone Inertial Sensors
CVPR 2016
0
citations
Robust Kernel Estimation With Outliers Handling for Image Deblurring
CVPR 2016
0
citations
Weakly Supervised Object Localization With Progressive Domain Adaptation
CVPR 2016
0
citations
Video Segmentation via Object Flow
CVPR 2016
0
citations
Object Tracking via Dual Linear Structured SVM and Explicit Feature Map
CVPR 2016
0
citations
Hedged Deep Tracking
CVPR 2016
0
citations
Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution
CVPR 2017 (arXiv)
0
citations
Deep Image Harmonization
CVPR 2017 (arXiv)
0
citations
Learning Fully Convolutional Networks for Iterative Non-Blind Deconvolution
CVPR 2017 (arXiv)
0
citations
Generative Face Completion
CVPR 2017 (arXiv)
0
citations
Diversified Texture Synthesis With Feed-Forward Networks
CVPR 2017 (arXiv)
0
citations
Multi-Task Correlation Particle Filter for Robust Object Tracking
CVPR 2017
0
citations
Correlation Tracking via Joint Discrimination and Reliability Learning
CVPR 2018 (arXiv)
0
citations
Learning Superpixels With Segmentation-Aware Affinity Loss
CVPR 2018
0
citations
Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks
CVPR 2018
0
citations
SPLATNet: Sparse Lattice Networks for Point Cloud Processing
CVPR 2018 (arXiv)
0
citations
Learning Dual Convolutional Neural Networks for Low-Level Vision
CVPR 2018 (arXiv)
0
citations
PiCANet: Learning Pixel-Wise Contextual Attention for Saliency Detection
CVPR 2018 (arXiv)
0
citations
Gated Fusion Network for Single Image Dehazing
CVPR 2018 (arXiv)
0
citations
Learning to Localize Sound Source in Visual Scenes
CVPR 2018 (arXiv)
0
citations
Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
CVPR 2018 (arXiv)
0
citations
Learning a Discriminative Prior for Blind Image Deblurring
CVPR 2018 (arXiv)
0
citations
Fast and Accurate Online Video Object Segmentation via Tracking Parts
CVPR 2018 (arXiv)
0
citations
Learning to Adapt Structured Output Space for Semantic Segmentation
CVPR 2018 (arXiv)
0
citations
Weakly Supervised Coupled Networks for Visual Sentiment Analysis
CVPR 2018
0
citations
Deep Semantic Face Deblurring
CVPR 2018 (arXiv)
0
citations
Learning Spatial-Aware Regressions for Visual Tracking
CVPR 2018 (arXiv)
0
citations
VITAL: VIsual Tracking via Adversarial Learning
CVPR 2018 (arXiv)
0
citations
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
CVPR 2018 (arXiv)
0
citations
SCOPS: Self-Supervised Co-Part Segmentation
CVPR 2019
0
citations
Target-Aware Deep Tracking
CVPR 2019
0
citations
Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
CVPR 2019
0
citations
Im2Pencil: Controllable Pencil Illustration From Photographs
CVPR 2019
0
citations
Spatially Variant Linear Representation Models for Joint Filtering
CVPR 2019
0
citations
CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency
CVPR 2019
0
citations
Depth-Aware Video Frame Interpolation
CVPR 2019
0
citations
Learning Linear Transformations for Fast Image and Video Style Transfer
CVPR 2019
0
citations
Inserting Videos Into Videos
CVPR 2019
0
citations
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments
CVPR 2019
0
citations
Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
CVPR 2020 (arXiv)
0
citations
Composing Good Shots by Exploiting Mutual Relations
CVPR 2020
0
citations
CycleISP: Real Image Restoration via Improved Data Synthesis
CVPR 2020 (arXiv)
0
citations
Multi-Scale Boosted Dehazing Network With Dense Feature Fusion
CVPR 2020 (arXiv)
0
citations
Collaborative Distillation for Ultra-Resolution Universal Style Transfer
CVPR 2020 (arXiv)
0
citations
Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective
CVPR 2020 (arXiv)
0
citations
Weakly-Supervised Semantic Segmentation via Sub-Category Exploration
CVPR 2020 (arXiv)
0
citations
Learning to See Through Obstructions
CVPR 2020
0
citations
ReMix: Towards Image-to-Image Translation With Limited Data
CVPR 2021 (arXiv)
0
citations
Regularizing Generative Adversarial Networks Under Limited Data
CVPR 2021 (arXiv)
0
citations
Decoupled Dynamic Filter Networks
CVPR 2021 (arXiv)
0
citations
Spatiotemporal Contrastive Video Representation Learning
CVPR 2021 (arXiv)
0
citations
Multi-Stage Progressive Image Restoration
CVPR 2021 (arXiv)
0
citations
Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision
CVPR 2022 (arXiv)
0
citations
Video Frame Interpolation Transformer
CVPR 2022 (arXiv)
0
citations
Burst Image Restoration and Enhancement
CVPR 2022 (arXiv)
0
citations
Restormer: Efficient Transformer for High-Resolution Image Restoration
CVPR 2022 (arXiv)
0
citations
Hierarchical Modular Network for Video Captioning
CVPR 2022 (arXiv)
0
citations
InOut: Diverse Image Outpainting via GAN Inversion
CVPR 2022
0
citations
Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection
CVPR 2023 (arXiv)
0
citations
Burstormer: Burst Image Restoration and Enhancement Transformer
CVPR 2023 (arXiv)
0
citations
Self-Supervised Super-Plane for Neural 3D Reconstruction
CVPR 2023
0
citations
MAGVIT: Masked Generative Video Transformer
CVPR 2023 (arXiv)
0
citations
Improving Zero-Shot Generalization and Robustness of Multi-Modal Models
CVPR 2023 (arXiv)
0
citations
Learning To Dub Movies via Hierarchical Prosody Models
CVPR 2023 (arXiv)
0
citations
Self-Supervised AutoFlow
CVPR 2023 (arXiv)
0
citations
Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble
CVPR 2023
0
citations
What Makes an Object Memorable?
ICCV 2015
0
citations
Fast and Accurate Head Pose Estimation via Random Projection Forests
ICCV 2015
0
citations
Hierarchical Convolutional Features for Visual Tracking
ICCV 2015
0
citations
Learning to Super-Resolve Blurry Face and Text Images
ICCV 2017
0
citations
Unsupervised Representation Learning by Sorting Sequences
ICCV 2017 (arXiv)
0
citations
SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
ICCV 2017 (arXiv)
0
citations
Learning Discriminative Data Fitting Functions for Blind Image Deblurring
ICCV 2017
0
citations
Video Deblurring via Semantic Segmentation and Pixel-Wise Non-Linear Kernel
ICCV 2017 (arXiv)
0
citations
Blind Image Deblurring With Outlier Handling
ICCV 2017
0
citations
CREST: Convolutional Residual Learning for Visual Tracking
ICCV 2017 (arXiv)
0
citations
Scene Parsing With Global Context Embedding
ICCV 2017 (arXiv)
0
citations
Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
ICCV 2017 (arXiv)
0
citations
Referring Expression Generation and Comprehension via Attributes
ICCV 2017
0
citations
The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
ICCV 2021
0
citations
Learning To Stylize Novel Views
ICCV 2021 (arXiv)
0
citations
COMISR: Compression-Informed Video Super-Resolution
ICCV 2021 (arXiv)
0
citations
Hybrid Neural Fusion for Full-Frame Video Stabilization
ICCV 2021 (arXiv)
0
citations
Discovering 3D Parts From Image Collections
ICCV 2021 (arXiv)
0
citations
Benchmarking Ultra-High-Definition Image Super-Resolution
ICCV 2021
0
citations
Video Matting via Consistency-Regularized Graph Neural Networks
ICCV 2021
0
citations
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
ICCV 2021
0
citations
Unified Visual Relationship Detection with Vision and Language Models
ICCV 2023 (arXiv)
0
citations
SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image
ICCV 2023 (arXiv)
0
citations
Delving into Motion-Aware Matching for Monocular 3D Object Tracking
ICCV 2023 (arXiv)
0
citations
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
ICCV 2023 (arXiv)
0
citations
MiniROAD: Minimal RNN Framework for Online Action Detection
ICCV 2023
0
citations
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023 (arXiv)
0
citations
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023 (arXiv)
0
citations
InfiniCity: Infinite-Scale City Synthesis
ICCV 2023 (arXiv)
0
citations
CiteTracker: Correlating Image and Text for Visual Tracking
ICCV 2023 (arXiv)
0
citations
High Quality Entity Segmentation
ICCV 2023 (arXiv)
0
citations
Counting Crowds in Bad Weather
ICCV 2023 (arXiv)
0
citations
Neural Design Network: Graphic Layout Generation with Constraints
ECCV 2020
0
citations
Learnable Cost Volume Using the Cayley Representation
ECCV 2020
0
citations
Video Object Detection via Object-level Temporal Aggregation
ECCV 2020
0
citations
Self-supervised Single-view 3D Reconstruction via Semantic Consistency
ECCV 2020
0
citations
Modeling Artistic Workflows for Image Generation and Editing
ECCV 2020
0
citations
Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification
ECCV 2020
0
citations
Learning Enriched Features for Real Image Restoration and Enhancement
ECCV 2020
0
citations
Learning Visibility for Robust Dense Human Body Estimation
ECCV 2022
0
citations
Autoregressive 3D Shape Generation via Canonical Mapping
ECCV 2022
0
citations
Class-Agnostic Object Detection with Multi-modal Transformer
ECCV 2022
0
citations
Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing
ECCV 2022
0
citations
Scraping Textures from Natural Images for Synthesis and Editing
ECCV 2022
0
citations
Learning Discriminative Shrinkage Deep Networks for Image Deconvolution
ECCV 2022
0
citations
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
ECCV 2022
0
citations
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
ECCV 2022
0
citations
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
NeurIPS 2015
0
citations
Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks
NeurIPS 2017
0
citations
CLR: Channel-wise Lightweight Reprogramming for Continual Learning
ICCV 2023 (arXiv)
0
citations
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
CVPR 2025
0
citations
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
CVPR 2025
0
citations
Move-in-2D: 2D-Conditioned Human Motion Generation
CVPR 2025
0
citations
Unified Dense Prediction of Video Diffusion
CVPR 2025
0
citations
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing
ICCV 2025
0
citations
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
ICCV 2025
0
citations
Efficient Concertormer for Image Deblurring and Beyond
ICCV 2025 (arXiv)
0
citations
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
ICCV 2025
0
citations
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025
0
citations
Generating Synthetic Data for Unsupervised Federated Learning of Cross-Modal Retrieval
AAAI 2025
0
citations
BEV-MAE: Bird’s Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios
AAAI 2024
0
citations
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
CVPR 2024
0
citations
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
CVPR 2024
0
citations
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
CVPR 2024
0
citations
RTracker: Recoverable Tracking via PN Tree Structured Memory
CVPR 2024
0
citations
Text-Driven Image Editing via Learnable Regions
CVPR 2024
0
citations
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
0
citations
Weakly Supervised Video Individual Counting
CVPR 2024
0
citations
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
0
citations
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
CVPR 2024
0
citations
UniGS: Unified Representation for Image Generation and Segmentation
CVPR 2024
0
citations
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
CVPR 2024
0
citations
VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
ICML 2024
0
citations
Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation
NeurIPS 2018
0
citations
Deep Attentive Tracking via Reciprocative Learning
NeurIPS 2018
0
citations
Context-aware Synthesis and Placement of Object Instances
NeurIPS 2018
0
citations
Joint-task Self-supervised Learning for Temporal Correspondence
NeurIPS 2019
0
citations
Dancing to Music
NeurIPS 2019
0
citations
Quadratic Video Interpolation
NeurIPS 2019
0
citations
Online Adaptation for Consistent Mesh Reconstruction in the Wild
NeurIPS 2020
0
citations
Learning 3D Dense Correspondence via Canonical Point Autoencoder
NeurIPS 2021
0
citations
Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
NeurIPS 2021
0
citations
Intriguing Properties of Vision Transformers
NeurIPS 2021
0
citations
End-to-end Multi-modal Video Temporal Grounding
NeurIPS 2021
0
citations
LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery
NeurIPS 2022
0
citations
AIMS: All-Inclusive Multi-Level Segmentation for Anything
NeurIPS 2023
0
citations
Video Timeline Modeling For News Story Understanding
NeurIPS 2023
0
citations
A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
NeurIPS 2023
0
citations
ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
NeurIPS 2023
0
citations
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection
NeurIPS 2023
0
citations
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
NeurIPS 2023
0
citations
Module-wise Adaptive Distillation for Multimodality Foundation Models
NeurIPS 2023
0
citations