Most Cited ICCV "representation spaces" Papers

2,701 papers found • Page 4 of 14

#601

O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views

Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-cameo et al.

ICCV 2025posterarXiv:2506.06026
3
citations
#602

SceneMI: Motion In-betweening for Modeling Human-Scene Interaction

Inwoo Hwang, Bing Zhou, Young Min Kim et al.

ICCV 2025highlightarXiv:2503.16289
3
citations
#603

PVChat: Personalized Video Chat with One-Shot Learning

YUFEI SHI, Weilong Yan, Gang Xu et al.

ICCV 2025posterarXiv:2503.17069
3
citations
#604

FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images

Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira et al.

ICCV 2025posterarXiv:2507.19993
3
citations
#605

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving

Zixian Guo, Ming Liu, Qilong Wang et al.

ICCV 2025poster
3
citations
#606

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

ICCV 2025posterarXiv:2503.16832
3
citations
#607

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025posterarXiv:2503.10225
3
citations
#608

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Baihui Xiao, Chengjian Feng, Zhijian Huang et al.

ICCV 2025posterarXiv:2508.04642
3
citations
#609

PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu et al.

ICCV 2025poster
3
citations
#610

CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations

Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.

ICCV 2025posterarXiv:2510.12795
3
citations
#611

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

Chen Chen, Zhirui Wang, Taowei Sheng et al.

ICCV 2025posterarXiv:2503.16399
3
citations
#612

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions

Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.

ICCV 2025posterarXiv:2412.06334
3
citations
#613

GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections

Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.

ICCV 2025posterarXiv:2507.20512
3
citations
#614

NeRF Is a Valuable Assistant for 3D Gaussian Splatting

Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.

ICCV 2025posterarXiv:2507.23374
3
citations
#615

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting

Baijun Ye, Minghui Qin, Saining Zhang et al.

ICCV 2025posterarXiv:2507.19451
3
citations
#616

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers

Lukas Kuhn, sari sadiya, Jörg Schlötterer et al.

ICCV 2025posterarXiv:2501.00942
3
citations
#617

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Xingsong Ye, Yongkun Du, Yunbo Tao et al.

ICCV 2025posterarXiv:2412.01137
3
citations
#618

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.

ICCV 2025posterarXiv:2412.00622
3
citations
#619

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.

ICCV 2025posterarXiv:2507.13984
3
citations
#620

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.

ICCV 2025posterarXiv:2507.04699
3
citations
#621

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Zhenyang Liu, Yikai Wang, Kuanning Wang et al.

ICCV 2025posterarXiv:2507.06710
3
citations
#622

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Dubing Chen, Huan Zheng, Yucheng Zhou et al.

ICCV 2025posterarXiv:2509.08388
3
citations
#623

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

Mahmoud Ahmed, Junjie Fei, Jian Ding et al.

ICCV 2025posterarXiv:2405.18937
3
citations
#624

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Xin Dong, Shichao Dong, Jin Wang et al.

ICCV 2025posterarXiv:2507.05056
3
citations
#625

From Panels to Prose: Generating Literary Narratives from Comics

Ragav Sachdeva, Andrew Zisserman

ICCV 2025posterarXiv:2503.23344
3
citations
#626

Video Individual Counting for Moving Drones

Yaowu Fan, Jia Wan, Tao Han et al.

ICCV 2025highlightarXiv:2503.10701
3
citations
#627

X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Weihao Yu, Yuanhao Cai, Ruyi Zha et al.

ICCV 2025poster
3
citations
#628

4D Gaussian Splatting SLAM

Yanyan Li, Youxu Fang, Zunjie Zhu et al.

ICCV 2025posterarXiv:2503.16710
3
citations
#629

Large-scale Pre-training for Grounded Video Caption Generation

Evangelos Kazakos, Cordelia Schmid, Josef Sivic

ICCV 2025posterarXiv:2503.10781
3
citations
#630

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025posterarXiv:2507.05899
3
citations
#631

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov, Di Chang, Minh Tran et al.

ICCV 2025posterarXiv:2504.04010
3
citations
#632

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025posterarXiv:2508.06125
3
citations
#633

End-to-End Multi-Modal Diffusion Mamba

Chunhao Lu, Qiang Lu, Meichen Dong et al.

ICCV 2025posterarXiv:2510.13253
3
citations
#634

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

Ruyang Liu, Shangkun Sun, Haoran Tang et al.

ICCV 2025posterarXiv:2510.05836
3
citations
#635

FREE-Merging: Fourier Transform for Efficient Model Merging

Shenghe Zheng, Hongzhi Wang

ICCV 2025posterarXiv:2411.16815
3
citations
#636

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

Jiahui Geng, Qing Li

ICCV 2025posterarXiv:2503.14530
3
citations
#637

Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs

Liwei Che, Qingze T Liu, Jing Jia et al.

ICCV 2025posterarXiv:2503.07772
3
citations
#638

Stable Diffusion Models are Secretly Good at Visual In-Context Learning

Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara et al.

ICCV 2025posterarXiv:2508.09949
3
citations
#639

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

ICCV 2025posterarXiv:2405.13337
3
citations
#640

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jiahui Wang, Zuyan Liu, Yongming Rao et al.

ICCV 2025posterarXiv:2506.05344
3
citations
#641

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

Zhehui Wu, Yong Chen, Naoto Yokoya et al.

ICCV 2025posterarXiv:2503.09131
3
citations
#642

PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination

Ming Dai, Wenxuan Cheng, Jiedong Zhuang et al.

ICCV 2025posterarXiv:2509.04833
3
citations
#643

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly

Chang Liu, Yunfan Ye, Fan Zhang et al.

ICCV 2025posterarXiv:2507.19924
3
citations
#644

PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View

Longliang Liu, Miaojie Feng, Junda Cheng et al.

ICCV 2025highlightarXiv:2506.23897
3
citations
#645

Disentangled Clothed Avatar Generation with Layered Representation

Weitian Zhang, Yichao Yan, Sijing Wu et al.

ICCV 2025highlightarXiv:2501.04631
3
citations
#646

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Zonglin Lyu, Chen Chen

ICCV 2025posterarXiv:2507.04984
3
citations
#647

Monocular Semantic Scene Completion via Masked Recurrent Networks

Xuzhi Wang, Xinran Wu, Song Wang et al.

ICCV 2025posterarXiv:2507.17661
3
citations
#648

Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition

Haochen Chang, Pengfei Ren, Haoyang Zhang et al.

ICCV 2025poster
3
citations
#649

Progressive Test Time Energy Adaptation for Medical Image Segmentation

Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park et al.

ICCV 2025highlightarXiv:2503.16616
3
citations
#650

Generalizable Object Re-Identification via Visual In-Context Prompting

Zhizhong Huang, Xiaoming Liu

ICCV 2025posterarXiv:2508.21222
3
citations
#651

Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping

Jingyi Lu, Kai Han

ICCV 2025posterarXiv:2509.04582
3
citations
#652

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

ICCV 2025posterarXiv:2411.16789
3
citations
#653

ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction

Danhui Chen, Ziquan Liu, Chuxi Yang et al.

ICCV 2025posterarXiv:2507.15803
3
citations
#654

How Can Objects Help Video-Language Understanding?

Zitian Tang, Shijie Wang, Junho Cho et al.

ICCV 2025posterarXiv:2504.07454
3
citations
#655

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models

Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.

ICCV 2025posterarXiv:2509.12145
3
citations
#656

Underwater Visual SLAM with Depth Uncertainty and Medium Modeling

Rui Liu, Sheng Fan, Wenguan Wang et al.

ICCV 2025highlight
3
citations
#657

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025posterarXiv:2503.17044
3
citations
#658

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Jonathan Roberts, Kai Han, Samuel Albanie

ICCV 2025posterarXiv:2408.11817
3
citations
#659

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery

Xiao Liu, Nan Pu, Haiyang Zheng et al.

ICCV 2025posterarXiv:2507.04051
3
citations
#660

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

ICCV 2025posterarXiv:2503.16867
3
citations
#661

From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection

Zexi Jia, Chuanwei Huang, Hongyan Fei et al.

ICCV 2025posterarXiv:2507.04769
3
citations
#662

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025posterarXiv:2506.07371
3
citations
#663

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

Xiao Li, Yiming Zhu, Yifan Huang et al.

ICCV 2025posterarXiv:2506.23581
3
citations
#664

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

Guangben Lu, Yuzhen N/A, Zhimin Sun et al.

ICCV 2025posterarXiv:2412.03812
3
citations
#665

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025posterarXiv:2504.01020
3
citations
#666

I Am Big, You Are Little; I Am Right, You Are Wrong

David A Kelly, Akchunya Chanchal, Nathan Blake

ICCV 2025posterarXiv:2507.23509
3
citations
#667

ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers

Nicholas DiBrita, Jason Han, Tirthak Patel

ICCV 2025posterarXiv:2506.21537
3
citations
#668

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl

ICCV 2025posterarXiv:2506.17213
3
citations
#669

Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

Miroslav Purkrabek, Jiri Matas

ICCV 2025posterarXiv:2412.01562
3
citations
#670

FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation

Yunpeng Bai, Qixing Huang

ICCV 2025posterarXiv:2412.00671
3
citations
#671

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text

Jiaben Chen, Xin Yan, Yihang Chen et al.

ICCV 2025posterarXiv:2405.20336
3
citations
#672

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.

ICCV 2025poster
3
citations
#673

EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.

ICCV 2025posterarXiv:2311.13621
3
citations
#674

Rethinking Layered Graphic Design Generation with a Top-Down Approach

Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.

ICCV 2025posterarXiv:2507.05601
3
citations
#675

CAP: Evaluation of Persuasive and Creative Image Generation

Aysan Aghazadeh, Adriana Kovashka

ICCV 2025posterarXiv:2412.10426
3
citations
#676

Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics

Zhirui Gao, Renjiao Yi, Yuhang Huang et al.

ICCV 2025posterarXiv:2408.10789
3
citations
#677

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.

ICCV 2025posterarXiv:2503.16726
3
citations
#678

Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

Sanghun Jung, Jingjing Zheng, Ke Zhang et al.

ICCV 2025posterarXiv:2507.23134
3
citations
#679

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.

ICCV 2025posterarXiv:2507.16403
3
citations
#680

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Marko Mihajlovic, Siwei Zhang, Gen Li et al.

ICCV 2025highlightarXiv:2506.23236
3
citations
#681

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang, Yan Xia, Sashuai Zhou et al.

ICCV 2025posterarXiv:2507.03304
3
citations
#682

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Fangyikang Wang, Hubery Yin, Lei Qian et al.

ICCV 2025posterarXiv:2505.24222
3
citations
#683

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025posterarXiv:2412.01250
3
citations
#684

Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation

Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.

ICCV 2025posterarXiv:2411.09572
3
citations
#685

How To Make Your Cell Tracker Say "I dunno!"

Richard D Paul, Johannes Seiffarth, David Rügamer et al.

ICCV 2025poster
3
citations
#686

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection

Jiasheng Guo, Xin Gao, Yuxiang Yan et al.

ICCV 2025posterarXiv:2509.09183
3
citations
#687

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025posterarXiv:2504.02747
3
citations
#688

Sparse Fine-Tuning of Transformers for Generative Tasks

Wei Chen, Jingxi Yu, Zichen Miao et al.

ICCV 2025posterarXiv:2507.10855
3
citations
#689

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

Sherry Chen, Yi Wei, Luowei Zhou et al.

ICCV 2025posterarXiv:2507.07317
3
citations
#690

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions

Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.

ICCV 2025posterarXiv:2412.18810
3
citations
#691

LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs

Hanyu Zhou, Gim Hee Lee

ICCV 2025posterarXiv:2503.06934
3
citations
#692

DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Sanghyun Son, Matheus Gadelha, Yang Zhou et al.

ICCV 2025posterarXiv:2412.16776
3
citations
#693

Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention

Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.

ICCV 2025posterarXiv:2506.13298
3
citations
#694

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.

ICCV 2025posterarXiv:2509.25134
3
citations
#695

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

ICCV 2025highlightarXiv:2502.07001
3
citations
#696

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.

ICCV 2025posterarXiv:2411.17616
3
citations
#697

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation

Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.

ICCV 2025posterarXiv:2506.01923
3
citations
#698

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency

Yangyang Guo, Mohan Kankanhalli

ICCV 2025posterarXiv:2411.09126
3
citations
#699

4D Visual Pre-training for Robot Learning

Chengkai Hou, Yanjie Ze, Yankai Fu et al.

ICCV 2025posterarXiv:2508.17230
3
citations
#700

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.

ICCV 2025posterarXiv:2503.21851
3
citations
#701

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

Li Huaqiu, Yong Wang, Tongwen Huang et al.

ICCV 2025posterarXiv:2507.00790
3
citations
#702

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025posterarXiv:2507.12942
3
citations
#703

Task Vector Quantization for Memory-Efficient Model Merging

Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.

ICCV 2025posterarXiv:2503.06921
3
citations
#704

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

JiaKui Hu, Zhengjian Yao, Lujia Jin et al.

ICCV 2025posterarXiv:2506.18520
3
citations
#705

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Hao Chen, Tao Han, Song Guo et al.

ICCV 2025posterarXiv:2412.02503
3
citations
#706

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

ICCV 2025highlightarXiv:2507.02859
3
citations
#707

Breaking the Encoder Barrier for Seamless Video-Language Understanding

Handong Li, Yiyuan Zhang, Longteng Guo et al.

ICCV 2025posterarXiv:2503.18422
3
citations
#708

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

Congyi Fan, Jian Guan, Xuanjia Zhao et al.

ICCV 2025posterarXiv:2503.17340
3
citations
#709

Cross-Subject Mind Decoding from Inaccurate Representations

Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.

ICCV 2025posterarXiv:2507.19071
3
citations
#710

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

Jiaxin Huang, Sheng Miao, Bangbang Yang et al.

ICCV 2025posterarXiv:2504.11092
3
citations
#711

Grouped Speculative Decoding for Autoregressive Image Generation

Junhyuk So, Juncheol Shin, Hyunho Kook et al.

ICCV 2025posterarXiv:2508.07747
3
citations
#712

SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians

Liam Schoneveld, Zhe Chen, Davide Davoli et al.

ICCV 2025posterarXiv:2504.12292
3
citations
#713

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Tri Ton, Ji Woo Hong, Chang Yoo

ICCV 2025posterarXiv:2504.05684
3
citations
#714

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator

Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.

ICCV 2025posterarXiv:2411.17799
3
citations
#715

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.

ICCV 2025posterarXiv:2507.23567
3
citations
#716

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Qi Guo, Zhen Tian, Minghao Yao et al.

ICCV 2025posterarXiv:2410.06848
3
citations
#717

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

Yangyi Huang, Ye Yuan, Xueting Li et al.

ICCV 2025posterarXiv:2505.24877
3
citations
#718

Learning Streaming Video Representation via Multitask Training

Yibin Yan, Jilan Xu, Shangzhe Di et al.

ICCV 2025posterarXiv:2504.20041
3
citations
#719

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP

Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.

ICCV 2025posterarXiv:2412.02837
3
citations
#720

Boosting Adversarial Transferability via Residual Perturbation Attack

Jinjia Peng, Zeze Tao, Huibing Wang et al.

ICCV 2025posterarXiv:2508.05689
3
citations
#721

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion

Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.

ICCV 2025posterarXiv:2503.21775
3
citations
#722

MVGBench: a Comprehensive Benchmark for Multi-view Generation Models

Xianghui Xie, Jan Lenssen, Gerard Pons-Moll

ICCV 2025poster
3
citations
#723

Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement

Priyank Pathak, Yogesh Rawat

ICCV 2025posterarXiv:2507.07230
3
citations
#724

GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Wentao Hu, Shunkai Li, Ziqiao Peng et al.

ICCV 2025highlightarXiv:2506.21513
3
citations
#725

You Think, You ACT: The New Task of Arbitrary Text to Motion Generation

Runqi Wang, Caoyuan Ma, Guopeng Li et al.

ICCV 2025posterarXiv:2404.14745
3
citations
#726

GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation

Junyu Shi, Lijiang LIU, Yong Sun et al.

ICCV 2025poster
3
citations
#727

TokensGen: Harnessing Condensed Tokens for Long Video Generation

Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.

ICCV 2025posterarXiv:2507.15728
3
citations
#728

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

Huy Ta, Duy Anh Huynh, Yutong Xie et al.

ICCV 2025highlightarXiv:2505.15123
2
citations
#729

RePoseD: Efficient Relative Pose Estimation With Known Depth Information

Yaqing Ding, Viktor Kocur, VACLAV VAVRA et al.

ICCV 2025posterarXiv:2501.07742
2
citations
#730

LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders

Ilan Naiman, Emanuel Baruch Baruch, Oron Anschel et al.

ICCV 2025posterarXiv:2504.03501
2
citations
#731

Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding

Xiaojie Zhang, Yuanfei Wang, Ruihai Wu et al.

ICCV 2025posterarXiv:2507.18276
2
citations
#732

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao et al.

ICCV 2025posterarXiv:2503.16856
2
citations
#733

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

Jiannan Ge, Lingxi Xie, Hongtao Xie et al.

ICCV 2025poster
2
citations
#734

Hybrid-grained Feature Aggregation with Coare-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Wenyao Zhang, Hongsi Liu, Bohan Li et al.

ICCV 2025poster
2
citations
#735

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025posterarXiv:2503.22668
2
citations
#736

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin, Ruohan Gao

ICCV 2025posterarXiv:2504.21847
2
citations
#737

HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes

Mai Su, Zhongtao Wang, Huishan Au et al.

ICCV 2025posterarXiv:2504.16606
2
citations
#738

MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation

Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.

ICCV 2025poster
2
citations
#739

Gait-X: Exploring X modality for Generalized Gait Recognition

Zengbin Wang, Saihui Hou, Junjie Li et al.

ICCV 2025poster
2
citations
#740

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Xiangdong Zhang, Shaofeng Zhang, Junchi Yan

ICCV 2025posterarXiv:2509.01250
2
citations
#741

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

Wenlun Zhang, Yunshan Zhong, Shimpei Ando et al.

ICCV 2025posterarXiv:2503.03088
2
citations
#742

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression

Wenjie Huang, Qi Yang, Shuting Xia et al.

ICCV 2025posterarXiv:2507.15686
2
citations
#743

Improving SAM for Camouflaged Object Detection via Dual Stream Adapters

Jiaming Liu, Linghe Kong, Guihai Chen

ICCV 2025posterarXiv:2503.06042
2
citations
#744

Demeter: A Parametric Model of Crop Plant Morphology from the Real World

Tianhang Cheng, Albert Zhai, Evan Chen et al.

ICCV 2025posterarXiv:2510.16377
2
citations
#745

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025posterarXiv:2508.00169
2
citations
#746

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling

Trong-Thang Pham, AKASH AWASTHI, Saba Khan et al.

ICCV 2025highlightarXiv:2507.12591
2
citations
#747

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment

Chong Xia, Shengjun Zhang, Fangfu Liu et al.

ICCV 2025posterarXiv:2507.19058
2
citations
#748

SAS: Segment Any 3D Scene with Integrated 2D Priors

Zhuoyuan Li, Jiahao Lu, Jiacheng Deng et al.

ICCV 2025posterarXiv:2503.08512
2
citations
#749

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning

Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.

ICCV 2025posterarXiv:2502.05843
2
citations
#750

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen, Nhat Le, Baoru Huang et al.

ICCV 2025posterarXiv:2508.10522
2
citations
#751

Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model

Daehee Park, Monu Surana, Pranav Desai et al.

ICCV 2025posterarXiv:2507.22615
2
citations
#752

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

Jijun Xiang, Xuan Zhu, Xianqi Wang et al.

ICCV 2025posterarXiv:2504.01596
2
citations
#753

ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung, Hyoungseob Park, Patrick Rim et al.

ICCV 2025posterarXiv:2508.05989
2
citations
#754

Preacher: Paper-to-Video Agentic System

Jingwei Liu, Ling Yang, Hao Luo et al.

ICCV 2025posterarXiv:2508.09632
2
citations
#755

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Young Kyun Jang, Ser-Nam Lim

ICCV 2025posterarXiv:2405.14715
2
citations
#756

MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments

Zhixuan Liu, Haokun Zhu, Rui Chen et al.

ICCV 2025posterarXiv:2503.13816
2
citations
#757

Improving Rectified Flow with Boundary Conditions

Xixi Hu, Runlong Liao, Bo Liu et al.

ICCV 2025posterarXiv:2506.15864
2
citations
#758

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Tianli Liao, Chenyang Zhao, Lei Li et al.

ICCV 2025posterarXiv:2311.18564
2
citations
#759

MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions

Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.

ICCV 2025posterarXiv:2503.05182
2
citations
#760

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu et al.

ICCV 2025highlightarXiv:2507.23483
2
citations
#761

Towards Open-World Generation of Stereo Images and Unsupervised Matching

Feng Qiao, Zhexiao Xiong, Eric Xing et al.

ICCV 2025posterarXiv:2503.12720
2
citations
#762

SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

Yana Hasson, Pauline Luc, Liliane Momeni et al.

ICCV 2025posterarXiv:2507.03578
2
citations
#763

CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images

Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.

ICCV 2025posterarXiv:2503.05332
2
citations
#764

Acknowledging Focus Ambiguity in Visual Questions

Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.

ICCV 2025posterarXiv:2501.02201
2
citations
#765

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.

ICCV 2025highlightarXiv:2506.22753
2
citations
#766

Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising

Xiangbin Wei, Yuanfeng Wang, Ao XU et al.

ICCV 2025posterarXiv:2503.09283
2
citations
#767

SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation

Chen Yi Lu, Mehrab Tanjim, Ishita Dasgupta et al.

ICCV 2025posterarXiv:2503.08010
2
citations
#768

EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

Yixiang Chen, Peiyan Li, Yan Huang et al.

ICCV 2025posterarXiv:2507.06224
2
citations
#769

Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification

Kunlun Xu, Fan Zhuo, Jiangmeng Li et al.

ICCV 2025posterarXiv:2507.01884
2
citations
#770

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025posterarXiv:2503.06698
2
citations
#771

SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting

Haiyang Ying, Matthias Zwicker

ICCV 2025posterarXiv:2503.14786
2
citations
#772

DAViD: Data-efficient and Accurate Vision Models from Synthetic Data

Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.

ICCV 2025posterarXiv:2507.15365
2
citations
#773

Global Regulation and Excitation via Attention Tuning for Stereo Matching

Jiahao LI, Xinhong Chen, Zhengmin JIANG et al.

ICCV 2025posterarXiv:2509.15891
2
citations
#774

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.

ICCV 2025posterarXiv:2503.22349
2
citations
#775

Synchronization of Multiple Videos

Avihai Naaman, Ron Shapira Weber, Oren Freifeld

ICCV 2025posterarXiv:2510.14051
2
citations
#776

General Compression Framework for Efficient Transformer Object Tracking

Lingyi Hong, Jinglun Li, Xinyu Zhou et al.

ICCV 2025posterarXiv:2409.17564
2
citations
#777

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

Han Yu, Kehan Li, Dongbai Li et al.

ICCV 2025posterarXiv:2510.27263
2
citations
#778

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion

Enyu Liu, En Yu, Sijia Chen et al.

ICCV 2025posterarXiv:2507.08555
2
citations
#779

AnyPortal: Zero-Shot Consistent Video Background Replacement

Wenshuo Gao, Xicheng Lan, Shuai Yang

ICCV 2025posterarXiv:2509.07472
2
citations
#780

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.

ICCV 2025posterarXiv:2503.21055
2
citations
#781

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

ICCV 2025highlightarXiv:2508.00427
2
citations
#782

LookOut: Real-World Humanoid Egocentric Navigation

Boxiao Pan, Adam Harley, Francis Engelmann et al.

ICCV 2025posterarXiv:2508.14466
2
citations
#783

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur

ICCV 2025posterarXiv:2504.03948
2
citations
#784

Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads

Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.

ICCV 2025posterarXiv:2507.23343
2
citations
#785

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025posterarXiv:2508.19649
2
citations
#786

Identity Preserving 3D Head Stylization with Multiview Score Distillation

Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.

ICCV 2025posterarXiv:2411.13536
2
citations
#787

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025posterarXiv:2506.16058
2
citations
#788

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025posterarXiv:2506.11436
2
citations
#789

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Shaowei Liu, chuan guo, Bing Zhou et al.

ICCV 2025posterarXiv:2510.14976
2
citations
#790

GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations

Yunqi Liu, Xiaohui Cui, Ouyang Xue

ICCV 2025poster
2
citations
#791

SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models

Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.

ICCV 2025posterarXiv:2411.16216
2
citations
#792

Object-level Correlation for Few-Shot Segmentation

chunlin wen, Yu Zhang, Jie Fan et al.

ICCV 2025posterarXiv:2509.07917
2
citations
#793

FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning

qian feng, Jiahang Tu, Mintong Kang et al.

ICCV 2025posterarXiv:2601.13578
2
citations
#794

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025posterarXiv:2501.06031
2
citations
#795

Generative Adversarial Diffusion

U-Chae Jun, Jaeeun Ko, Jiwoo Kang

ICCV 2025poster
2
citations
#796

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization

Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.

ICCV 2025posterarXiv:2504.02817
2
citations
#797

EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment

Yufei Zhu, Yiming Zhong, Zemin Yang et al.

ICCV 2025posterarXiv:2503.14329
2
citations
#798

FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

Fei Yin, Mallikarjun Reddy, Chun-Han Yao et al.

ICCV 2025posterarXiv:2504.15179
2
citations
#799

Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection

Anja Delić, Matej Grcic, Siniša Šegvić

ICCV 2025highlightarXiv:2506.18368
2
citations
#800

PseudoMapTrainer: Learning Online Mapping without HD Maps

Christian Löwens, Thorben Funke, Jingchao Xie et al.

ICCV 2025posterarXiv:2508.18788
2
citations