Mingyu Ding

29
Papers
259
Total Citations

Papers (29)

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

ECCV 2020
133
citations

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

CVPR 2024
64
citations

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

ICLR 2024
54
citations

X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios

ICLR 2025
8
citations

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

ICML 2024
0
citations

Face-Focused Cross-Stream Network for Deception Detection in Videos

CVPR 2019
0
citations

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

CVPR 2020
0
citations

HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers

CVPR 2021
0
citations

L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing

CVPR 2021
0
citations

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

CVPR 2023
0
citations

Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention

CVPR 2023
0
citations

EC2: Emergent Communication for Embodied Control

CVPR 2023
0
citations

CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization

ICCV 2019
0
citations

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions

ICCV 2023
0
citations

Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking

ECCV 2020
0
citations

Segmenting Transparent Objects in the Wild

ECCV 2020
0
citations

DaViT: Dual Attention Vision Transformers

ECCV 2022
0
citations

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

CVPR 2025
0
citations

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

CVPR 2025
0
citations

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins

CVPR 2025
0
citations

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

ICCV 2025
0
citations

Domain-Invariant Projection Learning for Zero-Shot Recognition

NeurIPS 2018
0
citations

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

NeurIPS 2021
0
citations

Compressed Video Contrastive Learning

NeurIPS 2021
0
citations

LGDN: Language-Guided Denoising Network for Video-Language Modeling

NeurIPS 2022
0
citations

Towards Free Data Selection with General-Purpose Models

NeurIPS 2023
0
citations

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

NeurIPS 2023
0
citations

Doubly-Robust Self-Training

NeurIPS 2023
0
citations

Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties

NeurIPS 2023
0
citations