ICCV Poster Papers
2,436 papers found • Page 47 of 49
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Zhu Yihang, Jinhao Zhang, Yuxuan Wang et al.
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
Hao Ju, Shaofei Huang, Si Liu et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao, Feng Cheng, Lu Qi et al.
Video Color Grading via Look-Up Table Generation
Seunghyun Shin, Dongmin Shin, Jisu Shin et al.
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam et al.
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
Yue Qiu, Yanjun Sun, Takuma Yagi et al.
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai et al.
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, Siyuan Yang et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen et al.
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
Vision-Language Models Can't See the Obvious
Yasser Abdelaziz Dahou Djilali, Ngoc Huynh, Phúc Lê Khắc et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
Taesung Kwon, Jong Ye
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.
Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests
Fitim Abdullahu, Helmut Grabner
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang et al.
Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao, Yepeng Tang, Chunjie Zhang et al.
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander Ogren, Berthy Feng, Jihoon Ahn et al.
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao, Haobo Lu, Xiaosen Wang et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li et al.
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
Jiaxin Huang, Sheng Miao, Bangbang Yang et al.
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
Shiduo Zhang, Zhe Xu, Peiju Liu et al.
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang et al.