CVPR Highlight Papers
Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan et al.
MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li, Zeyu Wang, Xu Yang et al.
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng et al.
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou, You Li, Fan Ma et al.
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan et al.
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun, Yueqi Duan, Juncheng Yan et al.
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng, Hang Zhang, Guanzheng Chen et al.
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han et al.
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu, Yingwei Li, Nan Liu et al.
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. Mok, Zi Li, Yunhao Bai et al.
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li, Biao Yang, Qiang Liu et al.
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang, Yixin Chen, Baoxiong Jia et al.
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye, Haiyang Xu, Jiabo Ye et al.
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza, Marina Neseem, Sherief Reda
Mudslide: A Universal Nuclear Instance Segmentation Method
Jun Wang
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Junwen He, Yifan Wang, Lijun Wang et al.
Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
Menghao Zhang, Jingyu Wang, Qi Qi et al.
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu, Xiaoqi Zhao, Youwei Pang et al.
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong, Bin Chen, Xiulong Liu et al.
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li, Yali Wang, Yinan He et al.
Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse
Yining Wang, Junjie Sun, Chenyue Wang et al.
NC-TTT: A Noise Contrastive Approach for Test-Time Training
David Osowiechi, Gustavo Vargas Hakim, Mehrdad Noori et al.
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski, Carl Lindström, Georg Hess et al.
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu et al.
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu, Renrui Zhang, Bowei He et al.
Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
Qi Jia, Yaqi Cai, Qi Jia et al.
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He, Garvita Tiwari, Tolga Birdal et al.
Object Pose Estimation via the Aggregation of Diffusion Features
Tianfu Wang, Guosheng Hu, Hongguang Wang
Object Recognition as Next Token Prediction
Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong, Shilin Yan, Renrui Zhang et al.
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà, Elia Peruzzo et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Open-Vocabulary 3D Semantic Segmentation with Foundation Models
Li Jiang, Shaoshuai Shi, Bernt Schiele
Open-Vocabulary Object 6D Pose Estimation
Jaime Corsetti, Davide Boscaini, Changjae Oh et al.
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang, Xiaoyi Dong, Pan Zhang et al.
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed, Anna Kukleva, Bernt Schiele
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po, Guandao Yang, Kfir Aberman et al.
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang, Francis Williams, Žan Gojčič et al.