ICCV 2025 Papers

2,701 papers found • Page 52 of 55

VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding

Zhu Yihang, Jinhao Zhang, Yuxuan Wang et al.

ICCV 2025poster

ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis

Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.

ICCV 2025poster
2
citations

Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization

Hao Ju, Shaofei Huang, Si Liu et al.

ICCV 2025poster

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025poster

VideoAuteur: Towards Long Narrative Video Generation

Junfei Xiao, Feng Cheng, Lu Qi et al.

ICCV 2025posterarXiv:2501.06173

Video Color Grading via Look-Up Table Generation

Seunghyun Shin, Dongmin Shin, Jisu Shin et al.

ICCV 2025poster
1
citations

Video Individual Counting for Moving Drones

Yaowu Fan, Jia Wan, Tao Han et al.

ICCV 2025highlight
3
citations

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025poster
3
citations

VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization

Xinye Cao, Hongcan Guo, Jiawen Qian et al.

ICCV 2025poster

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fating Hong et al.

ICCV 2025highlight

VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.

ICCV 2025poster

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

Hyojun Go, Byeongjun Park, Hyelin Nam et al.

ICCV 2025poster

VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos

YUE QIU, Yanjun Sun, Takuma Yagi et al.

ICCV 2025poster

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025poster
15
citations

VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Yazhou Xing, Yang Fei, Yingqing He et al.

ICCV 2025poster

Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild

Peijun Bao, Chenqi Kong, SIYUAN YANG et al.

ICCV 2025poster

ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

Ronggang Huang, Haoxin Yang, Yan Cai et al.

ICCV 2025poster

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.

ICCV 2025poster

ViLLa: Video Reasoning Segmentation with Large Language Model

rongkun Zheng, Lu Qi, Xi Chen et al.

ICCV 2025poster
16
citations

ViLU: Learning Vision-Language Uncertainties for Failure Prediction

Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.

ICCV 2025poster
2
citations

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025posterarXiv:2503.09509

VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification

Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.

ICCV 2025poster

V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models

Jisoo Kim, Wooseok Seo, Junwan Kim et al.

ICCV 2025poster

VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions

Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.

ICCV 2025poster
1
citations

Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation

Yukuan Min, Muli Yang, Jinhao Zhang et al.

ICCV 2025poster

Vision-Language Models Can't See the Obvious

YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.

ICCV 2025posterarXiv:2507.04741
7
citations

Vision-Language Neural Graph Featurization for Extracting Retinal Lesions

Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.

ICCV 2025poster

VisionMath: Vision-Form Mathematical Problem-Solving

Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.

ICCV 2025poster

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models

Taesung Kwon, Jong Ye

ICCV 2025poster
7
citations

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models

Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.

ICCV 2025poster

VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference

Meiqi Wang, Han Qiu

ICCV 2025poster

ViSpeak: Visual Instruction Feedback in Streaming Videos

Shenghao Fu, Qize Yang, Yuan-Ming Li et al.

ICCV 2025poster
11
citations

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Zhangquan Chen, Xufang Luo, Dongsheng Li

ICCV 2025poster

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

Haiping Wang, Yuan Liu, Ziwei Liu et al.

ICCV 2025poster
27
citations

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Boyang Deng, Kyle Genova, Songyou Peng et al.

ICCV 2025highlight

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025poster
20
citations

Visual Intention Grounding for Egocentric Assistants

Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.

ICCV 2025poster
1
citations

Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests

Fitim Abdullahu, Helmut Grabner

ICCV 2025poster

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.

ICCV 2025posterarXiv:2412.00622
3
citations

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Zhen Zeng, Leijiang Gu, Xun Yang et al.

ICCV 2025poster
4
citations

Visual Relation Diffusion for Human-Object Interaction Detection

Ping Cao, Yepeng Tang, Chunjie Zhang et al.

ICCV 2025poster
1
citations

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang et al.

ICCV 2025poster

Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves

Alexander Ogren, Berthy Feng, Jihoon Ahn et al.

ICCV 2025poster

Visual Test-time Scaling for GUI Agent Grounding

Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.

ICCV 2025highlight
10
citations

Visual Textualization for Image Prompted Object Detection

Yongjian Wu, Yang Zhou, Jiya Saiyin et al.

ICCV 2025poster

VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow

Ada Görgün, Bernt Schiele, Jonas Fischer

ICCV 2025posterarXiv:2503.22399
1
citations

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Hanwen Cao, Haobo Lu, Xiaosen Wang et al.

ICCV 2025poster

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei, Rama Chellappa

ICCV 2025posterarXiv:2504.00037
2
citations

ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads

Yifan Li, Xin Li, Tianqin Li et al.

ICCV 2025poster

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

Jiaxin Huang, Sheng Miao, Bangbang Yang et al.

ICCV 2025poster
3
citations