Most Cited ICCV "crowdsourcing platforms" Papers

2,701 papers found • Page 5 of 14

#801

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

ICCV 2025 • arXiv:2411.16789
4
citations
#802

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

Kelin Yu, Sheng Zhang, Harshit Soora et al.

ICCV 2025 • arXiv:2508.11049
4
citations
#803

TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models

Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.

ICCV 2025 • arXiv:2503.15283
4
citations
#804

DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization

Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.

ICCV 2025 • arXiv:2504.13206
4
citations
#805

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025 • arXiv:2508.19649
4
citations
#806

Learning Streaming Video Representation via Multitask Training

Yibin Yan, Jilan Xu, Shangzhe Di et al.

ICCV 2025 • arXiv:2504.20041
4
citations
#807

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025 • arXiv:2507.05899
4
citations
#808

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description

Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.

ICCV 2025 • arXiv:2412.01398
4
citations
#809

EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen et al.

ICCV 2025 • arXiv:2506.22246
4
citations
#810

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

Chende Zheng, Ruiqi Suo, Chenhao Lin et al.

ICCV 2025 • arXiv:2508.00701
4
citations
#811

GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors

Kang Du, Zhihao Liang, Yulin Shen et al.

ICCV 2025 • arXiv:2408.08524
4
citations
#812

CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation

Leon Sick, Dominik Engel, Sebastian Hartwig et al.

ICCV 2025 • arXiv:2411.16319
4
citations
#813

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Tiange Xiang, Kai Li, Chengjiang Long et al.

ICCV 2025 • arXiv:2503.15877
4
citations
#814

GAP: Gaussianize Any Point Clouds with Text Guidance

Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.

ICCV 2025 • arXiv:2508.05631
4
citations
#815

Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product

Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.

ICCV 2025 • arXiv:2508.00230
4
citations
#816

OuroMamba: A Data-Free Quantization Framework for Vision Mamba

Akshat Ramachandran, Mingyu Lee, Huan Xu et al.

ICCV 2025 • arXiv:2503.10959
4
citations
#817

Acknowledging Focus Ambiguity in Visual Questions

Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.

ICCV 2025 • arXiv:2501.02201
4
citations
#818

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025 • arXiv:2506.07371
4
citations
#819

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Shiyong Liu, Xiao Tang, Zhihao Li et al.

ICCV 2025 • arXiv:2503.16177
4
citations
#820

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.

ICCV 2025 • highlight • arXiv:2504.01009
4
citations
#821

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Chenwei Lin, Hanjia Lyu, Xian Xu et al.

ICCV 2025 • arXiv:2406.09105
4
citations
#822

Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates

Kecheng Chen, Xinyu Luo, Tiexin Qin et al.

ICCV 2025 • highlight • arXiv:2504.02008
4
citations
#823

VGGSounder: Audio-Visual Evaluations for Foundation Models

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.

ICCV 2025 • arXiv:2508.08237
4
citations
#824

TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

Dadong Jiang, Zhi Hou, Zhihui Ke et al.

ICCV 2025 • arXiv:2411.11941
4
citations
#825

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

Yating Yu, Congqi Cao, Yifan Zhang et al.

ICCV 2025 • highlight • arXiv:2502.20158
4
citations
#826

Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

Li, Yang Xiao, Jie Ji et al.

ICCV 2025 • arXiv:2504.09039
4
citations
#827

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

Sébastien Herbreteau, Michael Unser

ICCV 2025 • arXiv:2407.17399
4
citations
#828

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

Zhehui Wu, Yong Chen, Naoto Yokoya et al.

ICCV 2025 • arXiv:2503.09131
4
citations
#829

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Wenwen Yu, Zhibo Yang, Yuliang Liu et al.

ICCV 2025 • arXiv:2508.08589
4
citations
#830

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Xudong Li, Zihao Huang, Yan Zhang et al.

ICCV 2025 • arXiv:2409.05381
4
citations
#831

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

Shanlin Sun, Yifan Wang, Hanwen Zhang et al.

ICCV 2025 • arXiv:2508.14461
4
citations
#832

UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation

Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.

ICCV 2025 • arXiv:2508.01126
4
citations
#833

SAM4D: Segment Anything in Camera and LiDAR Streams

Jianyun Xu, Song Wang, Ziqian Ni et al.

ICCV 2025 • arXiv:2506.21547
4
citations
#834

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.

ICCV 2025 • arXiv:2510.16641
4
citations
#835

SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies

Liang Han, Xu Zhang, Haichuan Song et al.

ICCV 2025 • arXiv:2508.00366
4
citations
#836

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation

Ruotong Wang, Mingli Zhu, Jiarong Ou et al.

ICCV 2025 • arXiv:2504.16907
4
citations
#837

A Token-level Text Image Foundation Model for Document Understanding

Tongkun Guan, Zining Wang, Pei Fu et al.

ICCV 2025 • arXiv:2503.02304
4
citations
#838

Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo

ICCV 2025 • arXiv:2503.19914
4
citations
#839

Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics

Zhirui Gao, Renjiao Yi, Yuhang Huang et al.

ICCV 2025 • arXiv:2408.10789
4
citations
#840

Region-based Cluster Discrimination for Visual Representation Learning

Yin Xie, Kaicheng Yang, Xiang An et al.

ICCV 2025 • highlight • arXiv:2507.20025
4
citations
#841

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Andreas Engelhardt, Mark Boss, Vikram Voleti et al.

ICCV 2025 • arXiv:2510.08271
4
citations
#842

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025 • arXiv:2412.01250
4
citations
#843

Controllable 3D Outdoor Scene Generation via Scene Graphs

Yuheng Liu, Xinke Li, Yuning Zhang et al.

ICCV 2025 • arXiv:2503.07152
4
citations
#844

Multi-View 3D Point Tracking

Frano Rajič, Haofei Xu, Marko Mihajlovic et al.

ICCV 2025 • arXiv:2508.21060
4
citations
#845

Dynamic Multimodal Prototype Learning in Vision-Language Models

Xingyu Zhu, Shuo Wang, Beier Zhu et al.

ICCV 2025 • arXiv:2507.03657
4
citations
#846

CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

Arindam Dutta, Meng Zheng, Zhongpai Gao et al.

ICCV 2025 • highlight • arXiv:2503.15671
4
citations
#847

Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models

Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.

ICCV 2025 • arXiv:2412.19920
4
citations
#848

SP2T: Sparse Proxy Attention for Dual-stream Point Transformer

Jiaxu Wan, Hong Zhang, Ziqi He et al.

ICCV 2025
4
citations
#849

MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos

Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.

ICCV 2025 • arXiv:2505.11868
4
citations
#850

QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization

Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner et al.

ICCV 2025 • arXiv:2505.05591
4
citations
#851

BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment

Tongfan Guan, Jiaxin Guo, Chen Wang et al.

ICCV 2025 • highlight • arXiv:2508.04611
4
citations
#852

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025 • highlight • arXiv:2503.23925
4
citations
#853

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang, Cong Han, Yafei Li et al.

ICCV 2025 • arXiv:2507.00992
4
citations
#854

BokehDiff: Neural Lens Blur with One-Step Diffusion

Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.

ICCV 2025 • arXiv:2507.18060
4
citations
#855

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation

Wenjia Wang, Liang Pan, Zhiyang Dou et al.

ICCV 2025 • arXiv:2411.19921
4
citations
#856

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.

ICCV 2025 • arXiv:2504.00502
4
citations
#857

Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

Taihang Hu, Linxuan Li, Kai Wang et al.

ICCV 2025 • arXiv:2504.10434
4
citations
#858

LightSwitch: Multi-view Relighting with Material-guided Diffusion

Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani

ICCV 2025 • arXiv:2508.06494
4
citations
#859

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

ICCV 2025 • highlight • arXiv:2411.15867
4
citations
#860

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection

Yupeng Hu, Changxing Ding, Chang Sun et al.

ICCV 2025 • arXiv:2507.06510
4
citations
#861

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.

ICCV 2025 • arXiv:2509.25134
4
citations
#862

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Kangan Qian, Jinyu Miao, Xinyu Jiao et al.

ICCV 2025
4
citations
#863

ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery

Yanzhe Lyu, Kai Cheng, Kang Xin et al.

ICCV 2025 • arXiv:2412.07494
4
citations
#864

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICCV 2025 • arXiv:2411.13949
4
citations
#865

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

Qifan Yu, Zhebei Shen, Zhongqi Yue et al.

ICCV 2025 • highlight • arXiv:2412.06293
4
citations
#866

Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh

Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.

ICCV 2025 • highlight • arXiv:2303.10440
4
citations
#867

Balanced Image Stylization with Style Matching Score

Yuxin Jiang, Liming Jiang, Shuai Yang et al.

ICCV 2025 • arXiv:2503.07601
4
citations
#868

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

ICCV 2025
4
citations
#869

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.

ICCV 2025 • arXiv:2507.23567
4
citations
#870

Occupancy Learning with Spatiotemporal Memory

Ziyang Leng, Jiawei Yang, Wenlong Yi et al.

ICCV 2025 • arXiv:2508.04705
4
citations
#871

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

Siyu Jiao, Haoye Dong, Yuyang Yin et al.

ICCV 2025 • arXiv:2412.19142
4
citations
#872

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

ICCV 2025 • highlight • arXiv:2502.07001
4
citations
#873

TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images

Tu Bui, Shruti Agarwal, John Collomosse

ICCV 2025
4
citations
#874

X-Capture: An Open-Source Portable Device for Multi-Sensory Learning

Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.

ICCV 2025 • arXiv:2504.02318
4
citations
#875

Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

Aleksandar Jevtić, Christoph Reich, Felix Wimbauer et al.

ICCV 2025 • arXiv:2507.06230
3
citations
#876

DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation

Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra

ICCV 2025 • arXiv:2506.21237
3
citations
#877

LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs

Hanyu Zhou, Gim Hee Lee

ICCV 2025 • arXiv:2503.06934
3
citations
#878

Jigsaw++: Imagining Complete Shape Priors for Object Reassembly

Jiaxin Lu, Gang Hua, Qixing Huang

ICCV 2025 • arXiv:2410.11816
3
citations
#879

Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation

Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.

ICCV 2025 • arXiv:2503.11652
3
citations
#880

GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections

Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.

ICCV 2025 • arXiv:2507.20512
3
citations
#881

A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization

Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.

ICCV 2025 • arXiv:2412.09774
3
citations
#882

Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding

Mingxuan Wu, Huang Huang, Justin Kerr et al.

ICCV 2025 • arXiv:2504.17441
3
citations
#883

GT-Loc: Unifying When and Where in Images through a Joint Embedding Space

David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.

ICCV 2025 • arXiv:2507.10473
3
citations
#884

Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection

Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.

ICCV 2025 • highlight • arXiv:2507.10225
3
citations
#885

On the Generalization of Representation Uncertainty in Earth Observation

Spyros Kondylatos, Nikolaos Ioannis Bountos, Dimitrios Michail et al.

ICCV 2025 • arXiv:2503.07082
3
citations
#886

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.

ICCV 2025 • arXiv:2503.21851
3
citations
#887

Joint Diffusion Models in Continual Learning

Paweł Skierś, Kamil Deja

ICCV 2025 • arXiv:2411.08224
3
citations
#888

ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling

Jinhyung Park, Javier Romero, Shunsuke Saito et al.

ICCV 2025 • arXiv:2508.15767
3
citations
#889

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

Chancharik Mitra, Brandon Huang, Tianning Chai et al.

ICCV 2025 • arXiv:2412.00142
3
citations
#890

CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving

Changxing Liu, Genjia Liu, Zijun Wang et al.

ICCV 2025 • arXiv:2503.08683
3
citations
#891

FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation

Yunpeng Bai, Qixing Huang

ICCV 2025 • arXiv:2412.00671
3
citations
#892

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

Guangben Lu, Yuzhen N/A, Zhimin Sun et al.

ICCV 2025 • arXiv:2412.03812
3
citations
#893

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025 • arXiv:2506.22799
3
citations
#894

Online Language Splatting

Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.

ICCV 2025 • arXiv:2503.09447
3
citations
#895

Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation

Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.

ICCV 2025 • arXiv:2411.09572
3
citations
#896

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025 • arXiv:2504.01020
3
citations
#897

Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations

Conghao Wong, Ziqian Zou, Beihao Xia

ICCV 2025 • arXiv:2412.02447
3
citations
#898

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting

Baijun Ye, Minghui Qin, Saining Zhang et al.

ICCV 2025 • arXiv:2507.19451
3
citations
#899

RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Yufeng Zhong, Chengjian Feng, Feng Yan et al.

ICCV 2025 • arXiv:2503.18525
3
citations
#900

Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

Miroslav Purkrabek, Jiri Matas

ICCV 2025 • arXiv:2412.01562
3
citations
#901

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

JiaKui Hu, Zhengjian Yao, Lujia Jin et al.

ICCV 2025 • arXiv:2506.18520
3
citations
#902

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency

Yangyang Guo, Mohan Kankanhalli

ICCV 2025 • arXiv:2411.09126
3
citations
#903

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

Zihua Zhao, Feng Hong, Mengxi Chen et al.

ICCV 2025 • arXiv:2507.12998
3
citations
#904

EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device

Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.

ICCV 2025 • arXiv:2509.17430
3
citations
#905

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang, Yan Xia, Sashuai Zhou et al.

ICCV 2025 • arXiv:2507.03304
3
citations
#906

Grouped Speculative Decoding for Autoregressive Image Generation

Junhyuk So, Juncheol Shin, Hyunho Kook et al.

ICCV 2025 • arXiv:2508.07747
3
citations
#907

Task Vector Quantization for Memory-Efficient Model Merging

Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.

ICCV 2025 • arXiv:2503.06921
3
citations
#908

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

Wenlun Zhang, Yunshan Zhong, Shimpei Ando et al.

ICCV 2025 • arXiv:2503.03088
3
citations
#909

O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views

Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-Cameo et al.

ICCV 2025 • arXiv:2506.06026
3
citations
#910

Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling

Chao Zhou, Tianyi Wei, Nenghai Yu

ICCV 2025 • arXiv:2507.16240
3
citations
#911

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Dubing Chen, Huan Zheng, Yucheng Zhou et al.

ICCV 2025 • arXiv:2509.08388
3
citations
#912

X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Weihao Yu, Yuanhao Cai, Ruyi Zha et al.

ICCV 2025
3
citations
#913

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection

Jiasheng Guo, Xin Gao, Yuxiang Yan et al.

ICCV 2025 • arXiv:2509.09183
3
citations
#914

Large-scale Pre-training for Grounded Video Caption Generation

Evangelos Kazakos, Cordelia Schmid, Josef Sivic

ICCV 2025 • arXiv:2503.10781
3
citations
#915

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

Congyi Fan, Jian Guan, Xuanjia Zhao et al.

ICCV 2025 • arXiv:2503.17340
3
citations
#916

4D Gaussian Splatting SLAM

Yanyan Li, Youxu Fang, Zunjie Zhu et al.

ICCV 2025 • arXiv:2503.16710
3
citations
#917

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025 • arXiv:2507.12942
3
citations
#918

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving

Zixian Guo, Ming Liu, Qilong Wang et al.

ICCV 2025
3
citations
#919

PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing

Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin et al.

ICCV 2025 • arXiv:2507.14826
3
citations
#920

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025 • arXiv:2506.11436
3
citations
#921

Boosting Adversarial Transferability via Residual Perturbation Attack

Jinjia Peng, Zeze Tao, Huibing Wang et al.

ICCV 2025 • arXiv:2508.05689
3
citations
#922

Breaking the Encoder Barrier for Seamless Video-Language Understanding

Handong Li, Yiyuan Zhang, Longteng Guo et al.

ICCV 2025 • arXiv:2503.18422
3
citations
#923

SEAL: Semantic Aware Image Watermarking

Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.

ICCV 2025 • arXiv:2503.12172
3
citations
#924

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Tri Ton, Ji Woo Hong, Chang Yoo

ICCV 2025 • arXiv:2504.05684
3
citations
#925

PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu et al.

ICCV 2025
3
citations
#926

SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians

Liam Schoneveld, Zhe Chen, Davide Davoli et al.

ICCV 2025 • arXiv:2504.12292
3
citations
#927

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

Ruyang Liu, Shangkun Sun, Haoran Tang et al.

ICCV 2025 • arXiv:2510.05836
3
citations
#928

DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Sanghyun Son, Matheus Gadelha, Yang Zhou et al.

ICCV 2025 • arXiv:2412.16776
3
citations
#929

SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning

Lanmiao Liu, Esam Ghaleb, Asli Ozyurek et al.

ICCV 2025 • arXiv:2507.19359
3
citations
#930

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator

Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.

ICCV 2025 • arXiv:2411.17799
3
citations
#931

Amodal Depth Anything: Amodal Depth Estimation in the Wild

Zhenyu Li, Mykola Lavreniuk, Jian Shi et al.

ICCV 2025 • arXiv:2412.02336
3
citations
#932

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.

ICCV 2025 • arXiv:2411.17616
3
citations
#933

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

Yangyi Huang, Ye Yuan, Xueting Li et al.

ICCV 2025 • arXiv:2505.24877
3
citations
#934

MVGBench: a Comprehensive Benchmark for Multi-view Generation Models

Xianghui Xie, Jan Lenssen, Gerard Pons-Moll

ICCV 2025
3
citations
#935

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.

ICCV 2025 • arXiv:2410.23287
3
citations
#936

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion

Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.

ICCV 2025 • arXiv:2503.21775
3
citations
#937

Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning

Zeyu Xi, Haoying Sun, Yaofei Wu et al.

ICCV 2025 • arXiv:2507.20163
3
citations
#938

UniRes: Universal Image Restoration for Complex Degradations

Mo Zhou, Keren Ye, Mauricio Delbracio et al.

ICCV 2025 • arXiv:2506.05599
3
citations
#939

PVChat: Personalized Video Chat with One-Shot Learning

Yufei Shi, Weilong Yan, Gang Xu et al.

ICCV 2025 • arXiv:2503.17069
3
citations
#940

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers

Lukas Kuhn, Sari Sadiya, Jörg Schlötterer et al.

ICCV 2025 • arXiv:2501.00942
3
citations
#941

GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation

Junyu Shi, Lijiang Liu, Yong Sun et al.

ICCV 2025
3
citations
#942

From Panels to Prose: Generating Literary Narratives from Comics

Ragav Sachdeva, Andrew Zisserman

ICCV 2025 • arXiv:2503.23344
3
citations
#943

VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching

Xihua Wang, Xin Cheng, Yuyue Wang et al.

ICCV 2025
3
citations
#944

Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification

Kunlun Xu, Fan Zhuo, Jiangmeng Li et al.

ICCV 2025 • arXiv:2507.01884
3
citations
#945

GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Wentao Hu, Shunkai Li, Ziqiao Peng et al.

ICCV 2025 • highlight • arXiv:2506.21513
3
citations
#946

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.

ICCV 2025 • arXiv:2412.00622
3
citations
#947

Memory-Efficient 4-bit Preconditioned Stochastic Optimization

Jingyang Li, Kuangyu Ding, Kim-Chuan Toh et al.

ICCV 2025 • arXiv:2412.10663
3
citations
#948

MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation

Fu Rong, Meng Lan, Qian Zhang et al.

ICCV 2025 • arXiv:2501.13667
3
citations
#949

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.

ICCV 2025 • arXiv:2507.13984
3
citations
#950

Discretized Gaussian Representation for Tomographic Reconstruction

Shaokai Wu, Yuxiang Lu, Yapan Guo et al.

ICCV 2025 • arXiv:2411.04844
3
citations
#951

FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning

Qian Feng, Jiahang Tu, Mintong Kang et al.

ICCV 2025 • arXiv:2601.13578
3
citations
#952

Cross-Subject Mind Decoding from Inaccurate Representations

Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.

ICCV 2025 • arXiv:2507.19071
3
citations
#953

Exploiting Diffusion Prior for Task-driven Image Restoration

Jaeha Kim, Junghun Oh, Kyoung Mu Lee

ICCV 2025 • arXiv:2507.22459
3
citations
#954

Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image

Jerred Chen, Ronald Clark

ICCV 2025 • arXiv:2503.17358
3
citations
#955

Progressive Test Time Energy Adaptation for Medical Image Segmentation

Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park et al.

ICCV 2025 • highlight • arXiv:2503.16616
3
citations
#956

Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval

Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.

ICCV 2025 • highlight • arXiv:2507.23284
3
citations
#957

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

Jiaxin Huang, Sheng Miao, Bangbang Yang et al.

ICCV 2025 • arXiv:2504.11092
3
citations
#958

Video Individual Counting for Moving Drones

Yaowu Fan, Jia Wan, Tao Han et al.

ICCV 2025 • highlight • arXiv:2503.10701
3
citations
#959

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP

Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.

ICCV 2025 • arXiv:2412.02837
3
citations
#960

Generalizable Object Re-Identification via Visual In-Context Prompting

Zhizhong Huang, Xiaoming Liu

ICCV 2025 • arXiv:2508.21222
3
citations
#961

Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads

Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.

ICCV 2025 • arXiv:2507.23343
3
citations
#962

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Qi Guo, Zhen Tian, Minghao Yao et al.

ICCV 2025 • arXiv:2410.06848
3
citations
#963

OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.

ICCV 2025 • arXiv:2506.21101
3
citations
#964

NeRF Is a Valuable Assistant for 3D Gaussian Splatting

Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.

ICCV 2025 • arXiv:2507.23374
3
citations
#965

Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval

WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.

ICCV 2025 • arXiv:2504.13035
3
citations
#966

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025 • arXiv:2504.09282
3
citations
#967

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025 • arXiv:2508.06125
3
citations
#968

M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization

Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee

ICCV 2025 • highlight • arXiv:2506.20922
3
citations
#969

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov, Di Chang, Minh Tran et al.

ICCV 2025 • arXiv:2504.04010
3
citations
#970

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025 • arXiv:2503.17044
3
citations
#971

Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction

Zhensheng Yuan, Haozhi Huang, Zhen Xiong et al.

ICCV 2025 • arXiv:2507.23006
3
citations
#972

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation

Zhaorui Tan, Xi Yang, Tan Pan et al.

ICCV 2025 • arXiv:2411.06106
3
citations
#973

Heavy Labels Out! Dataset Distillation with Label Space Lightening

Ruonan Yu, Songhua Liu, Zigeng Chen et al.

ICCV 2025 • arXiv:2408.08201
3
citations
#974

Dataset Distillation via Vision-Language Category Prototype

Yawen Zou, Guang Li, Duo Su et al.

ICCV 2025 • highlight • arXiv:2506.23580
3
citations
#975

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

ICCV 2025 • arXiv:2503.16867
3
citations
#976

Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs

Liwei Che, Qingze T Liu, Jing Jia et al.

ICCV 2025 • arXiv:2503.07772
3
citations
#977

Diving into the Fusion of Monocular Priors for Generalized Stereo Matching

Chengtang Yao, Lidong Yu, Zhidan Liu et al.

ICCV 2025 • arXiv:2505.14414
3
citations
#978

Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning

Xinyao Liu, Diping Song

ICCV 2025 • arXiv:2507.17539
3
citations
#979

Stereo Any Video: Temporally Consistent Stereo Matching

Junpeng Jing, Weixun Luo, Ye Mao et al.

ICCV 2025 • highlight • arXiv:2503.05549
3
citations
#980

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.

ICCV 2025 • arXiv:2411.14137
3
citations
#981

Taming Flow Matching with Unbalanced Optimal Transport into Fast Pansharpening

Zihan Cao, Yu Zhong, Liang-Jian Deng

ICCV 2025 • arXiv:2503.14975
3
citations
#982

Disentangled Clothed Avatar Generation with Layered Representation

Weitian Zhang, Yichao Yan, Sijing Wu et al.

ICCV 2025 • highlight • arXiv:2501.04631
3
citations
#983

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

Suchisrit Gangopadhyay, Jung Hee Kim, Xien Chen et al.

ICCV 2025 • arXiv:2508.04928
3
citations
#984

HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars

Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.

ICCV 2025 • arXiv:2507.19481
3
citations
#985

Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition

Haochen Chang, Pengfei Ren, Haoyang Zhang et al.

ICCV 2025
3
citations
#986

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.

ICCV 2025 • highlight • arXiv:2507.02398
3
citations
#987

MOVE: Motion-Guided Few-Shot Video Object Segmentation

Kaining Ying, Hengrui Hu, Henghui Ding

ICCV 2025 • arXiv:2507.22061
3
citations
#988

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

Yiyang Wang, Xi Chen, Xiaogang Xu et al.

ICCV 2025 • arXiv:2501.12382
3
citations
#989

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei, Rama Chellappa

ICCV 2025 • arXiv:2504.00037
3
citations
#990

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025 • arXiv:2503.10225
3
citations
#991

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

Ruifei Zhang, Junlin Xie, Wei Zhang et al.

ICCV 2025 • arXiv:2511.06253
3
citations
#992

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models

Zerui Tao, Yuhta Takida, Naoki Murata et al.

ICCV 2025 • arXiv:2501.08727
3
citations
#993

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration

Lu Liu, Huiyu Duan, Qiang Hu et al.

ICCV 2025 • highlight • arXiv:2412.13155
3
citations
#994

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

ICCV 2025 • arXiv:2503.16832
3
citations
#995

Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning

Yongwei Jiang, Yixiong Zou, Yuhua Li et al.

ICCV 2025 • arXiv:2507.09183
3
citations
#996

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery

Xiao Liu, Nan Pu, Haiyang Zheng et al.

ICCV 2025 • arXiv:2507.04051
3
citations
#997

EA-ViT: Efficient Adaptation for Elastic Vision Transformer

Chen Zhu, Wangbo Zhao, Huiwen Zhang et al.

ICCV 2025 • arXiv:2507.19360
3
citations
#998

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Fangfu Liu, Hao Li, Jiawei Chi et al.

ICCV 2025 • arXiv:2507.02813
3
citations
#999

DAViD: Data-efficient and Accurate Vision Models from Synthetic Data

Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.

ICCV 2025 • arXiv:2507.15365
3
citations
#1000

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.

ICCV 2025 • arXiv:2507.04699
3
citations