CVPR Highlight Papers

712 papers found • Page 12 of 15

Matching Anything by Segmenting Anything

Siyuan Li, Lei Ke, Martin Danelljan et al.

CVPR 2024highlightarXiv:2406.04221
49
citations

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024highlightarXiv:2403.11496
70
citations

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.

CVPR 2024highlightarXiv:2406.04673

MemoNav: Working Memory Model for Visual Navigation

Hongxin Li, Zeyu Wang, Xu Yang et al.

CVPR 2024highlightarXiv:2402.19161
10
citations

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.

CVPR 2024highlightarXiv:2311.15475

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Yanhui Wang, Jianmin Bao, Wenming Weng et al.

CVPR 2024highlightarXiv:2311.18829

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Dewei Zhou, You Li, Fan Ma et al.

CVPR 2024highlightarXiv:2402.05408
109
citations

MindBridge: A Cross-Subject Brain Decoding Framework

Shizun Wang, Songhua Liu, Zhenxiong Tan et al.

CVPR 2024highlightarXiv:2404.07850

MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection

Haowen Sun, Yueqi Duan, Juncheng Yan et al.

CVPR 2024highlight

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Sicong Leng, Hang Zhang, Guanzheng Chen et al.

CVPR 2024highlightarXiv:2311.16922
449
citations

MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.

CVPR 2024highlightarXiv:2312.03596

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024highlightarXiv:2311.17435
49
citations

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Jielin Qiu, Jiacheng Zhu, William Han et al.

CVPR 2024highlightarXiv:2306.04216

M&M VTO: Multi-Garment Virtual Try-On and Editing

Luyang Zhu, Yingwei Li, Nan Liu et al.

CVPR 2024highlightarXiv:2406.04542
26
citations

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. MOK, Zi Li, Yunhao Bai et al.

CVPR 2024highlightarXiv:2402.18933

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Zhang Li, Biao Yang, Qiang Liu et al.

CVPR 2024highlightarXiv:2311.06607
384
citations

MonoNPHM: Dynamic Head Reconstruction from Monocular Videos

Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.

CVPR 2024highlightarXiv:2312.06740
34
citations

Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang, Yixin Chen, Baoxiong Jia et al.

CVPR 2024highlightarXiv:2403.18036
78
citations

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye, Haiyang Xu, Jiabo Ye et al.

CVPR 2024highlightarXiv:2311.04257
601
citations

MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning

Ahmed Agiza, Marina Neseem, Sherief Reda

CVPR 2024highlight

Mudslide: A Universal Nuclear Instance Segmentation Method

Jun Wang

CVPR 2024highlight

Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning

Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon

CVPR 2024highlightarXiv:2404.05218

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Junwen He, Yifan Wang, Lijun Wang et al.

CVPR 2024highlightarXiv:2403.02969

Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Menghao Zhang, Jingyu Wang, Qi Qi et al.

CVPR 2024highlight

Multi-view Aggregation Network for Dichotomous Image Segmentation

Qian Yu, Xiaoqi Zhao, Youwei Pang et al.

CVPR 2024highlightarXiv:2404.07445
38
citations

MuseChat: A Conversational Music Recommendation System for Videos

Zhikang Dong, Bin Chen, Xiulong Liu et al.

CVPR 2024highlightarXiv:2310.06282

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024highlightarXiv:2311.17005
864
citations

Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse

Yining Wang, Junjie Sun, Chenyue Wang et al.

CVPR 2024highlightarXiv:2405.05587

NC-TTT: A Noise Constrastive Approach for Test-Time Training

David OSOWIECHI, Gustavo Vargas Hakim, Mehrdad Noori et al.

CVPR 2024highlight

NeISF: Neural Incident Stokes Field for Geometry and Material Estimation

Chenhao Li, Taishi Ono, Takeshi Uemori et al.

CVPR 2024highlightarXiv:2311.13187

NeuRAD: Neural Rendering for Autonomous Driving

Adam Tonderski, Carl Lindström, Georg Hess et al.

CVPR 2024highlightarXiv:2311.15260
126
citations

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Liwen Wu, Sai Bi, Zexiang Xu et al.

CVPR 2024highlightarXiv:2405.14847

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Xiangyang Zhu, Renrui Zhang, Bowei He et al.

CVPR 2024highlightarXiv:2404.04050
27
citations

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

Qi Jia, Yaqi Cai, Qi Jia et al.

CVPR 2024highlightarXiv:2405.06283
13
citations

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

CVPR 2024highlightarXiv:2403.03122
21
citations

Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

CVPR 2024highlightarXiv:2403.18791
16
citations

Object Recognition as Next Token Prediction

Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.

CVPR 2024highlightarXiv:2312.02142

ODIN: A Single Model for 2D and 3D Segmentation

Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.

CVPR 2024highlightarXiv:2401.02416
19
citations

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Jialin Wu, Xia Hu, Yaqing Wang et al.

CVPR 2024highlightarXiv:2312.00968

One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications

Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.

CVPR 2024highlightarXiv:2312.16145
112
citations

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Lingyi Hong, Shilin Yan, Renrui Zhang et al.

CVPR 2024highlightarXiv:2403.09634
118
citations

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij

CVPR 2024highlightarXiv:2404.00546

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Moreno D&#x27, Incà, Elia Peruzzo et al.

CVPR 2024highlight
69
citations

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

Lingdong Kong, Youquan Liu, Lai Xing Ng et al.

CVPR 2024highlightarXiv:2405.05259

Open-Vocabulary 3D Semantic Segmentation with Foundation Models

Li Jiang, Shaoshuai Shi, Bernt Schiele

CVPR 2024highlight

Open-Vocabulary Object 6D Pose Estimation

Jaime Corsetti, Davide Boscaini, Changjae Oh et al.

CVPR 2024highlightarXiv:2312.00690

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024highlightarXiv:2311.17911
365
citations

OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning

Noor Ahmed, Anna Kukleva, Bernt Schiele

CVPR 2024highlightarXiv:2403.18550

Orthogonal Adaptation for Modular Customization of Diffusion Models

Ryan Po, Guandao Yang, Kfir Aberman et al.

CVPR 2024highlightarXiv:2312.02432

Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

Dongsu Zhang, Francis Williams, Žan Gojčič et al.

CVPR 2024highlightarXiv:2406.08292