Ming-Hsuan Yang
45
Papers
1,144
Total Citations
Papers (45)
Language Model Beats Diffusion - Tokenizer is key to visual generation
ICLR 2024
525
citations
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
CVPR 2024
341
citations
VidToMe: Video Token Merging for Zero-Shot Video Editing
CVPR 2024
89
citations
Exploiting Diffusion Prior for Generalizable Dense Prediction
CVPR 2024
42
citations
Multi-subject Open-set Personalization in Video Generation
CVPR 2025
40
citations
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
CVPR 2025
24
citations
Efficient Visual State Space Model for Image Deblurring
CVPR 2025
23
citations
CSL: Class-Agnostic Structure-Constrained Learning for Segmentation including the Unseen
AAAI 2024
15
citations
AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting
ICCV 2025
9
citations
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
NeurIPS 2025
8
citations
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
CVPR 2025
8
citations
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
CVPR 2025
5
citations
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
CVPR 2024
4
citations
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
ICCV 2025
3
citations
HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis
NeurIPS 2025
3
citations
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model
ICCV 2025
2
citations
Toward Material-Agnostic System Identification from Videos
ICCV 2025
1
citations
CompleteMe: Reference-based Human Image Completion
ICCV 2025
1
citations
From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition
ICCV 2025
1
citations
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
0
citations
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
CVPR 2024
0
citations
UniGS: Unified Representation for Image Generation and Segmentation
CVPR 2024
0
citations
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
CVPR 2024
0
citations
VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
ICML 2024
0
citations
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
ICML 2024
0
citations
VideoPoet: A Large Language Model for Zero-Shot Video Generation
ICML 2024
0
citations
VideoPrism: A Foundational Visual Encoder for Video Understanding
ICML 2024
0
citations
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
CVPR 2025
0
citations
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
CVPR 2025
0
citations
Move-in-2D: 2D-Conditioned Human Motion Generation
CVPR 2025
0
citations
Unified Dense Prediction of Video Diffusion
CVPR 2025
0
citations
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing
ICCV 2025
0
citations
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
ICCV 2025
0
citations
Efficient Concertormer for Image Deblurring and Beyond
ICCV 2025
0
citations
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
ICCV 2025
0
citations
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025
0
citations
Generating Synthetic Data for Unsupervised Federated Learning of Cross-Modal Retrieval
AAAI 2025
0
citations
BEV-MAE: Bird’s Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios
AAAI 2024
0
citations
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
CVPR 2024
0
citations
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
CVPR 2024
0
citations
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
CVPR 2024
0
citations
RTracker: Recoverable Tracking via PN Tree Structured Memory
CVPR 2024
0
citations
Text-Driven Image Editing via Learnable Regions
CVPR 2024
0
citations
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
0
citations
Weakly Supervised Video Individual Counting
CVPR 2024
0
citations