ICCV Poster Papers

2,436 papers found • Page 48 of 49

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025posterarXiv:2508.02095
15
citations

VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving

Fanjie Kong, Yitong Li, Weihuang Chen et al.

ICCV 2025poster

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.

ICCV 2025posterarXiv:2503.07478
15
citations

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025posterarXiv:2503.10076
15
citations

VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Yash Garg, Saketh Bachu, Arindam Dutta et al.

ICCV 2025posterarXiv:2508.06757

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.

ICCV 2025posterarXiv:2504.02386
6
citations

VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction

Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.

ICCV 2025posterarXiv:2507.21311
2
citations

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025posterarXiv:2506.22799
3
citations

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

ICCV 2025poster
4
citations

VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data

Jian Shi, Peter Wonka

ICCV 2025posterarXiv:2312.08871

Voyaging into Perpetual Dynamic Scenes from a Single View

Fengrui Tian, Tianjiao Ding, Jinqi Luo et al.

ICCV 2025posterarXiv:2507.04183
1
citations

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.

ICCV 2025posterarXiv:2503.20491
13
citations

VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition

Shuting Dong, Mingzhi Chen, Feng Lu et al.

ICCV 2025poster

VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation

Jiawei Wang, Zhiming Cui, Changjian Li

ICCV 2025posterarXiv:2411.16446
1
citations

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

Yating Wang, Haoyi Zhu, Mingyu Liu et al.

ICCV 2025posterarXiv:2507.01016
16
citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025posterarXiv:2506.10857
9
citations

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Do Dat, Nam Hyeon-Woo, Po-Yuan Mao et al.

ICCV 2025posterarXiv:2505.01104
2
citations

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs

Qiucheng Wu, Handong Zhao, Michael Saxon et al.

ICCV 2025poster
18
citations

VSRM: A Robust Mamba-Based Framework for Video Super-Resolution

Phu Tran Dinh, Hung Dao, Daeyoung Kim

ICCV 2025posterarXiv:2506.22762
1
citations

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025posterarXiv:2407.18559
24
citations

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.

ICCV 2025posterarXiv:2510.14672
2
citations

Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection

Dat NGUYEN, Marcella Astrid, Anis Kacem et al.

ICCV 2025posterarXiv:2501.01184

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model

Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu et al.

ICCV 2025poster

WarpHE4D: Dense 4D Head Map toward Full Head Reconstruction

Jongseob Yun, Yong-Hoon Kwon, Min-Gyu Park et al.

ICCV 2025poster

Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks

Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala et al.

ICCV 2025posterarXiv:2507.04331

Wave-MambaAD: Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection

Qiao Zhang, Mingwen Shao, Xinyuan Chen et al.

ICCV 2025poster

WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection

Haodong Zhu, Wenhao Dong, Linlin Yang et al.

ICCV 2025posterarXiv:2507.18173
5
citations

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image

Jiwoo Park, Tae Choi, Youngjun Jun et al.

ICCV 2025posterarXiv:2506.23518

Weakly-Supervised Learning of Dense Functional Correspondences

Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.

ICCV 2025posterarXiv:2509.03893

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025posterarXiv:2507.12942
3
citations

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025posterarXiv:2503.13652
2
citations

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.

ICCV 2025posterarXiv:2503.21055
2
citations

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.

ICCV 2025posterarXiv:2505.20405
2
citations

What If: Understanding Motion Through Sparse Interactions

Stefan A. Baumann, Nick Stracke, Timy Phan et al.

ICCV 2025poster

What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?

Jinhong Ni, Chang-Bin Zhang, Qiang Zhang et al.

ICCV 2025posterarXiv:2505.22129

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025posterarXiv:2503.06698
2
citations

What's Making That Sound Right Now? Video-centric Audio-Visual Localization

hahyeon choi, Junhoo Lee, Nojun Kwak

ICCV 2025posterarXiv:2507.04667

What we need is explicit controllability: Training 3D gaze estimator using only facial images

Tingwei Li, Jun Bao, Zhenzhong Kuang et al.

ICCV 2025poster

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025posterarXiv:2507.05899
3
citations

When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection

Bo-Lun Huang, Tzu-Hsiang Ni, Feng-Kai Huang et al.

ICCV 2025poster

When and Where do Data Poisons Attack Textual Inversion?

Jeremy Styborski, Mingzhi Lyu, Jiayou Lu et al.

ICCV 2025posterarXiv:2507.10578

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning

Junwei Luo, Yingying Zhang, Xue Yang et al.

ICCV 2025posterarXiv:2503.07588
14
citations

When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack

Hanqing Liu, Shouwei Ruan, Yao Huang et al.

ICCV 2025posterarXiv:2503.06903

When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection

Hongliang Zhou, Yongxiang Liu, Canyu Mo et al.

ICCV 2025poster

When Schrödinger Bridge Meets Real-World Image Dehazing with Unpaired Training

Yunwei Lan, Zhigao Cui, Xin Luo et al.

ICCV 2025posterarXiv:2507.09524
2
citations

Where am I? Cross-View Geo-localization with Natural Language Descriptions

Junyan Ye, Honglin Lin, Leyan Ou et al.

ICCV 2025posterarXiv:2412.17007
16
citations

Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis

Baoyue Hu, Yang Wei, Junhao Xiao et al.

ICCV 2025poster

Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads

Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.

ICCV 2025posterarXiv:2507.23343
2
citations

Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context

Ge Zheng, Jiaye Qian, Jiajin Tang et al.

ICCV 2025posterarXiv:2510.20229
6
citations

Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation

Soumyadipta Banerjee, Jiaul Paik, Debashis Sen

ICCV 2025poster