Ming Yang

Papers: 22

Total Citations: 381

Papers (22)

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

Mimir: Improving Video Diffusion Models for Precise Text Understanding

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots

NeurIPS 2025 (arXiv)

EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching

Unified Video Generation via Next-Set Prediction in Continuous Domain

Social Debiasing for Fair Multi-modal LLMs

Engage for All: Making Ordinary Image Descriptions Appealing Again!

CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

HomoMatcher: Achieving Dense Feature Matching with Semi-Dense Efficiency by Homography Estimation

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling

Towards Better Vision-Inspired Vision-Language Models

Reversing Flow for Image Restoration

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection