CVPR Papers

5,589 papers found • Page 33 of 112

MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output

Yanyuan Chen, Dexuan Xu, Yu Huang et al.

CVPR 2025posterarXiv:2510.10011
10
citations

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Yifang Men, Yuan Yao, Miaomiao Cui et al.

CVPR 2025posterarXiv:2409.16160

Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation

Lexin Fang, Yunyang Xu, Xiang Ma et al.

CVPR 2025posterarXiv:2503.11140

Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch

Yijie Liu, Xinyi Shang, Yiqun Zhang et al.

CVPR 2025posterarXiv:2503.13227

Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis

Jeonghwan Park, Niall McLaughlin, Ihsen Alouani

CVPR 2025posterarXiv:2503.02986

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025posterarXiv:2412.05263
22
citations

Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

Junxi Chen, Junhao Dong, Xiaohua Xie

CVPR 2025highlightarXiv:2504.05838
5
citations

Minimal Interaction Seperated Tuning: A New Paradigm for Visual Adaptation

Ningyuan Tang, Minghao Fu, Jianxin Wu

CVPR 2025poster
1
citations

MINIMA: Modality Invariant Image Matching

Jiangwei Ren, Xingyu Jiang, Zizhuo Li et al.

CVPR 2025posterarXiv:2412.19412

Minimizing Labeled, Maximizing Unlabeled: An Image-Driven Approach for Video Instance Segmentation

Fangyun Wei, Jinjing Zhao, Kun Yan et al.

CVPR 2025poster

Minority-Focused Text-to-Image Generation via Prompt Optimization

Soobin Um, Jong Chul Ye

CVPR 2025posterarXiv:2410.07838

MIRE: Matched Implicit Neural Representations

Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.

CVPR 2025poster
6
citations

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

Ankit Dhiman, Manan Shah, R. Venkatesh Babu

CVPR 2025posterarXiv:2504.15397
1
citations

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

CVPR 2025posterarXiv:2503.17109
14
citations

Mitigating Ambiguities in 3D Classification with Gaussian Splatting

Ruiqi Zhang, Hao Zhu, Jingyi Zhao et al.

CVPR 2025posterarXiv:2503.08352
2
citations

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025posterarXiv:2501.09695
29
citations

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An, Feng Tian, Sicong Leng et al.

CVPR 2025posterarXiv:2406.12718

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.

CVPR 2025posterarXiv:2406.14235
17
citations

MITracker: Multi-View Integration for Visual Object Tracking

Mengjie Xu, Yitao Zhu, Haotian Jiang et al.

CVPR 2025highlightarXiv:2502.20111

MixerMDM: Learnable Composition of Human Motion Diffusion Models

Pablo Ruiz-Ponce, German Barquero, Cristina Palmero et al.

CVPR 2025posterarXiv:2504.01019
4
citations

Mixture of Submodules for Domain Adaptive Person Search

Minsu Kim, Seungryong Kim, Kwanghoon Sohn

CVPR 2025poster

MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang, Shuming Hu, Shiyu Zhao et al.

CVPR 2025highlightarXiv:2501.00192
16
citations

M-LLM Based Video Frame Selection for Efficient Video Understanding

Kai Hu, Feng Gao, Xiaohan Nie et al.

CVPR 2025posterarXiv:2502.19680
46
citations

MLVU: Benchmarking Multi-task Long Video Understanding

Junjie Zhou, Yan Shu, Bo Zhao et al.

CVPR 2025posterarXiv:2406.04264
93
citations

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025posterarXiv:2410.10798
13
citations

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Ho Kei Cheng, Masato Ishii, Akio Hayakawa et al.

CVPR 2025posterarXiv:2412.15322

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.

CVPR 2025posterarXiv:2503.02579

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Yuncheng Guo, Xiaodong Gu

CVPR 2025posterarXiv:2503.08497

MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.

CVPR 2025posterarXiv:2504.02264
6
citations

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Yilun Zhao, Lujing Xie, Haowei Zhang et al.

CVPR 2025posterarXiv:2501.12380
74
citations

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025poster
35
citations

MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data

Zifan Wang, Ziqing Chen, Junyu Chen et al.

CVPR 2025posterarXiv:2501.04595

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Haoyang He, Jiangning Zhang, Yuxuan Cai et al.

CVPR 2025posterarXiv:2411.15941

MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.

CVPR 2025posterarXiv:2407.05712

MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis

Yinghao Wu, Shihui Guo, Yipeng Qin

CVPR 2025poster

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.

CVPR 2025posterarXiv:2501.03714
12
citations

Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing

Xuanbai Chen, Xiang Xu, Zhihua Li et al.

CVPR 2025poster

Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks

Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.

CVPR 2025posterarXiv:2503.22405

Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification

Jiayu Jiang, Changxing Ding, Wentao Tan et al.

CVPR 2025highlightarXiv:2503.09962
5
citations

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025posterarXiv:2404.15611
24
citations

ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling

Zikang Zhou, Hengjian Zhou, Haibo Hu et al.

CVPR 2025posterarXiv:2411.11911
7
citations

MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining

Shanglin Liu, Jianming Lv, Jingdan Kang et al.

CVPR 2025poster

MoEdit: On Learning Quantity Perception for Multi-object Image Editing

Yanfeng Li, Ka-Hou Chan, Yue Sun et al.

CVPR 2025posterarXiv:2503.10112
4
citations

MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

Huaize Liu, WenZhang Sun, Donglin Di et al.

CVPR 2025posterarXiv:2501.01808
11
citations

MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

Yuxiang Fu, Qi Yan, Ke Li et al.

CVPR 2025posterarXiv:2503.09950

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Ruicheng Wang, Sicheng Xu, Cassie Lee Dai et al.

CVPR 2025posterarXiv:2410.19115

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Matt Deitke, Christopher Clark, Sangho Lee et al.

CVPR 2025posterarXiv:2409.17146
96
citations

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation

Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.

CVPR 2025posterarXiv:2503.13446

Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion

Songsong Yu, Yuxin Chen, Zhongang Qi et al.

CVPR 2025posterarXiv:2503.22262
3
citations

Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking

Hongkai Wei, YANG YANG, Shijie Sun et al.

CVPR 2025poster
4
citations