2025 Highlight Papers

651 papers found • Page 2 of 14

Boost Your Human Image Generation Model via Direct Preference Optimization

Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee

CVPR 2025highlightarXiv:2405.20216
8
citations

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

ICCV 2025highlight
3
citations

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

Zesen Cheng, Hang Zhang, Kehan Li et al.

CVPR 2025highlight

BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment

Tongfan Guan, Jiaxin Guo, Chen Wang et al.

ICCV 2025highlight

BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.

ICCV 2025highlight
5
citations

BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer

Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.

CVPR 2025highlight
3
citations

CADDreamer: CAD Object Generation from Single-view Images

Yuan Li, Cheng Lin, Yuan Liu et al.

CVPR 2025highlightarXiv:2502.20732
10
citations

Can Generative Video Models Help Pose Estimation?

Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.

CVPR 2025highlight

Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding

Zhaoran Zhao, Peng Lu, Anran Zhang et al.

CVPR 2025highlight

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image

Jingshun Huang, Haitao Lin, Tianyu Wang et al.

CVPR 2025highlight

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning

Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.

ICCV 2025highlight

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

Yuan Zhou, Qingshan Xu, Jiequan Cui et al.

CVPR 2025highlight

CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design

Weitao Feng, Hang Zhou, Jing Liao et al.

CVPR 2025highlightarXiv:2504.19478
4
citations

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.

CVPR 2025highlightarXiv:2503.05936
2
citations

CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance

Peiqi Chen, Lei Yu, Yi Wan et al.

ICCV 2025highlight

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

Likai Tian, Jian Zhao, Zechao Hu et al.

CVPR 2025highlight

CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching

Jiaqi Li, Yiran Wang, Jinghong Zheng et al.

CVPR 2025highlight

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.

CVPR 2025highlight

ChartCap: Mitigating Hallucination of Dense Chart Captioning

Junyoung Lim, Jaewoo Ahn, Gunhee Kim

ICCV 2025highlight
2
citations

CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation

Yuxing Long, Jiyao Zhang, Mingjie Pan et al.

CVPR 2025highlight
2
citations

CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

Arindam Dutta, Meng Zheng, Zhongpai Gao et al.

ICCV 2025highlight

Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning

Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.

CVPR 2025highlight
9
citations

ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate

Ming Yan, Xincheng Lin, Yuhua Luo et al.

CVPR 2025highlightarXiv:2503.21268

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

Tianyu Huai, Jie Zhou, Xingjiao Wu et al.

CVPR 2025highlightarXiv:2503.00413
10
citations

CObL: Toward Zero-Shot Ordinal Layering without User Prompting

Aneel Damaraju, Dean Hazineh, Todd Zickler

ICCV 2025highlight

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models

Zichen Miao, WEI CHEN, Qiang Qiu

CVPR 2025highlight

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025highlight

Combinative Matching for Geometric Shape Assembly

Nahyuk Lee, Juhong Min, Junhong Lee et al.

ICCV 2025highlight

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang et al.

CVPR 2025highlight
12
citations

Compositional Caching for Training-free Open-vocabulary Attribute Detection

Marco Garosi, Alessandro Conti, Gaowen Liu et al.

CVPR 2025highlight
3
citations

Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers

Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon et al.

CVPR 2025highlight

Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness

Junhao Dong, Jiao Liu, Xinghua Qu et al.

ICCV 2025highlight

Consensus-Driven Active Model Selection

Justin Kay, Grant Horn, Subhransu Maji et al.

ICCV 2025highlightarXiv:2507.23771
2
citations

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

ICCV 2025highlightarXiv:2508.00427
2
citations

Context-Aware Multimodal Pretraining

Karsten Roth, Zeynep Akata, Dima Damen et al.

CVPR 2025highlight

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Tuna Meral, Enis Simsar, Federico Tombari et al.

ICCV 2025highlight

CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception

Jiaru Zhong, Jiahao Wang, Jiahui Xu et al.

ICCV 2025highlight
8
citations

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

Jingjing Jiang, Chao Ma, Xurui Song et al.

ICCV 2025highlightarXiv:2507.07424
7
citations

CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation

Bonan Li, Zicheng Zhang, Xingyi Yang et al.

CVPR 2025highlight

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Zongheng Tang, Yi Liu, Yifan Sun et al.

ICCV 2025highlight
1
citations

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.

ICCV 2025highlight
211
citations

CounterPC: Counterfactual Feature Realignment for Unsupervised Domain Adaptation on Point Clouds

Feng Yang, Yichao Cao, Xiu Su et al.

ICCV 2025highlight

COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts

Jiansheng Li, Xingxuan Zhang, Hao Zou et al.

CVPR 2025highlightarXiv:2504.10158
1
citations

CountSE: Soft Exemplar Open-set Object Counting

Shuai Liu, Peng Zhang, Shiwei Zhang et al.

ICCV 2025highlight

Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting

Hanxi Liu, Yifang Men, Zhouhui Lian

CVPR 2025highlight
1
citations

CRISP: Object Pose and Shape Estimation with Test-Time Adaptation

Jingnan Shi, Rajat Talak, Harry Zhang et al.

CVPR 2025highlight

Cross-Architecture Distillation Made Simple with Redundancy Suppression

Weijia Zhang, Yuehao Liu, Wu Ran et al.

ICCV 2025highlight
2
citations

Cross-modal Causal Relation Alignment for Video Question Grounding

weixing chen, Yang Liu, Binglin Chen et al.

CVPR 2025highlight
7
citations

CrossOver: 3D Scene Cross-Modal Alignment

Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.

CVPR 2025highlightarXiv:2502.15011
7
citations

Cross-View Completion Models are Zero-shot Correspondence Estimators

Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.

CVPR 2025highlight