Most Cited ICCV "discrete motion token map" Papers

2,701 papers found • Page 4 of 14

Filters:Most Cited ICCV discrete motion token map Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#601

O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views

Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-cameo et al.

ICCV 2025posterarXiv:2506.06026

citations

#602

SceneMI: Motion In-betweening for Modeling Human-Scene Interaction

Inwoo Hwang, Bing Zhou, Young Min Kim et al.

ICCV 2025highlightarXiv:2503.16289

citations

#603

PVChat: Personalized Video Chat with One-Shot Learning

YUFEI SHI, Weilong Yan, Gang Xu et al.

ICCV 2025posterarXiv:2503.17069

citations

#604

FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images

Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira et al.

ICCV 2025posterarXiv:2507.19993

citations

#605

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving

Zixian Guo, Ming Liu, Qilong Wang et al.

ICCV 2025poster

citations

#606

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

ICCV 2025posterarXiv:2503.16832

citations

#607

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025posterarXiv:2503.10225

citations

#608

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Baihui Xiao, Chengjian Feng, Zhijian Huang et al.

ICCV 2025posterarXiv:2508.04642

citations

#609

PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu et al.

ICCV 2025poster

citations

#610

CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations

Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.

ICCV 2025posterarXiv:2510.12795

citations

#611

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

Chen Chen, Zhirui Wang, Taowei Sheng et al.

ICCV 2025posterarXiv:2503.16399

citations

#612

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions

Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.

ICCV 2025posterarXiv:2412.06334

citations

#613

GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections

Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.

ICCV 2025posterarXiv:2507.20512

citations

#614

NeRF Is a Valuable Assistant for 3D Gaussian Splatting

Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.

ICCV 2025posterarXiv:2507.23374

citations

#615

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting

Baijun Ye, Minghui Qin, Saining Zhang et al.

ICCV 2025posterarXiv:2507.19451

citations

#616

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers

Lukas Kuhn, sari sadiya, Jörg Schlötterer et al.

ICCV 2025posterarXiv:2501.00942

citations

#617

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Xingsong Ye, Yongkun Du, Yunbo Tao et al.

ICCV 2025posterarXiv:2412.01137

citations

#618

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.

ICCV 2025posterarXiv:2412.00622

citations

#619

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.

ICCV 2025posterarXiv:2507.13984

citations

#620

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.

ICCV 2025posterarXiv:2507.04699

citations

#621

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Zhenyang Liu, Yikai Wang, Kuanning Wang et al.

ICCV 2025posterarXiv:2507.06710

citations

#622

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Dubing Chen, Huan Zheng, Yucheng Zhou et al.

ICCV 2025posterarXiv:2509.08388

citations

#623

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

Mahmoud Ahmed, Junjie Fei, Jian Ding et al.

ICCV 2025posterarXiv:2405.18937

citations

#624

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Xin Dong, Shichao Dong, Jin Wang et al.

ICCV 2025posterarXiv:2507.05056

citations

#625

From Panels to Prose: Generating Literary Narratives from Comics

Ragav Sachdeva, Andrew Zisserman

ICCV 2025posterarXiv:2503.23344

citations

#626

Video Individual Counting for Moving Drones

Yaowu Fan, Jia Wan, Tao Han et al.

ICCV 2025highlightarXiv:2503.10701

citations

#627

X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Weihao Yu, Yuanhao Cai, Ruyi Zha et al.

ICCV 2025poster

citations

#628

4D Gaussian Splatting SLAM

Yanyan Li, Youxu Fang, Zunjie Zhu et al.

ICCV 2025posterarXiv:2503.16710

citations

#629

Large-scale Pre-training for Grounded Video Caption Generation

Evangelos Kazakos, Cordelia Schmid, Josef Sivic

ICCV 2025posterarXiv:2503.10781

citations

#630

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025posterarXiv:2507.05899

citations

#631

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov, Di Chang, Minh Tran et al.

ICCV 2025posterarXiv:2504.04010

citations

#632

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025posterarXiv:2508.06125

citations

#633

End-to-End Multi-Modal Diffusion Mamba

Chunhao Lu, Qiang Lu, Meichen Dong et al.

ICCV 2025posterarXiv:2510.13253

citations

#634

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

Ruyang Liu, Shangkun Sun, Haoran Tang et al.

ICCV 2025posterarXiv:2510.05836

citations

#635

FREE-Merging: Fourier Transform for Efficient Model Merging

Shenghe Zheng, Hongzhi Wang

ICCV 2025posterarXiv:2411.16815

citations

#636

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

Jiahui Geng, Qing Li

ICCV 2025posterarXiv:2503.14530

citations

#637

Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs

Liwei Che, Qingze T Liu, Jing Jia et al.

ICCV 2025posterarXiv:2503.07772

citations

#638

Stable Diffusion Models are Secretly Good at Visual In-Context Learning

Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara et al.

ICCV 2025posterarXiv:2508.09949

citations

#639

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

ICCV 2025posterarXiv:2405.13337

citations

#640

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jiahui Wang, Zuyan Liu, Yongming Rao et al.

ICCV 2025posterarXiv:2506.05344

citations

#641

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

Zhehui Wu, Yong Chen, Naoto Yokoya et al.

ICCV 2025posterarXiv:2503.09131

citations

#642

PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination

Ming Dai, Wenxuan Cheng, Jiedong Zhuang et al.

ICCV 2025posterarXiv:2509.04833

citations

#643

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly

Chang Liu, Yunfan Ye, Fan Zhang et al.

ICCV 2025posterarXiv:2507.19924

citations

#644

PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View

Longliang Liu, Miaojie Feng, Junda Cheng et al.

ICCV 2025highlightarXiv:2506.23897

citations

#645

Disentangled Clothed Avatar Generation with Layered Representation

Weitian Zhang, Yichao Yan, Sijing Wu et al.

ICCV 2025highlightarXiv:2501.04631

citations

#646

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Zonglin Lyu, Chen Chen

ICCV 2025posterarXiv:2507.04984

citations

#647

Monocular Semantic Scene Completion via Masked Recurrent Networks

Xuzhi Wang, Xinran Wu, Song Wang et al.

ICCV 2025posterarXiv:2507.17661

citations

#648

Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition

Haochen Chang, Pengfei Ren, Haoyang Zhang et al.

ICCV 2025poster

citations

#649

Progressive Test Time Energy Adaptation for Medical Image Segmentation

Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park et al.

ICCV 2025highlightarXiv:2503.16616

citations

#650

Generalizable Object Re-Identification via Visual In-Context Prompting

Zhizhong Huang, Xiaoming Liu

ICCV 2025posterarXiv:2508.21222

citations

#651

Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping

Jingyi Lu, Kai Han

ICCV 2025posterarXiv:2509.04582

citations

#652

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

ICCV 2025posterarXiv:2411.16789

citations

#653

ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction

Danhui Chen, Ziquan Liu, Chuxi Yang et al.

ICCV 2025posterarXiv:2507.15803

citations

#654

How Can Objects Help Video-Language Understanding?

Zitian Tang, Shijie Wang, Junho Cho et al.

ICCV 2025posterarXiv:2504.07454

citations

#655

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models

Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.

ICCV 2025posterarXiv:2509.12145

citations

#656

Underwater Visual SLAM with Depth Uncertainty and Medium Modeling

Rui Liu, Sheng Fan, Wenguan Wang et al.

ICCV 2025highlight

citations

#657

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025posterarXiv:2503.17044

citations

#658

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Jonathan Roberts, Kai Han, Samuel Albanie

ICCV 2025posterarXiv:2408.11817

citations

#659

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery

Xiao Liu, Nan Pu, Haiyang Zheng et al.

ICCV 2025posterarXiv:2507.04051

citations

#660

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

ICCV 2025posterarXiv:2503.16867

citations

#661

From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection

Zexi Jia, Chuanwei Huang, Hongyan Fei et al.

ICCV 2025posterarXiv:2507.04769

citations

#662

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025posterarXiv:2506.07371

citations

#663

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

Xiao Li, Yiming Zhu, Yifan Huang et al.

ICCV 2025posterarXiv:2506.23581

citations

#664

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

Guangben Lu, Yuzhen N/A, Zhimin Sun et al.

ICCV 2025posterarXiv:2412.03812

citations

#665

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025posterarXiv:2504.01020

citations

#666

I Am Big, You Are Little; I Am Right, You Are Wrong

David A Kelly, Akchunya Chanchal, Nathan Blake

ICCV 2025posterarXiv:2507.23509

citations

#667

ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers

Nicholas DiBrita, Jason Han, Tirthak Patel

ICCV 2025posterarXiv:2506.21537

citations

#668

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl

ICCV 2025posterarXiv:2506.17213

citations

#669

Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

Miroslav Purkrabek, Jiri Matas

ICCV 2025posterarXiv:2412.01562

citations

#670

FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation

Yunpeng Bai, Qixing Huang

ICCV 2025posterarXiv:2412.00671

citations

#671

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text

Jiaben Chen, Xin Yan, Yihang Chen et al.

ICCV 2025posterarXiv:2405.20336

citations

#672

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.

ICCV 2025poster

citations

#673

EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.

ICCV 2025posterarXiv:2311.13621

citations

#674

Rethinking Layered Graphic Design Generation with a Top-Down Approach

Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.

ICCV 2025posterarXiv:2507.05601

citations

#675

CAP: Evaluation of Persuasive and Creative Image Generation

Aysan Aghazadeh, Adriana Kovashka

ICCV 2025posterarXiv:2412.10426

citations

#676

Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics

Zhirui Gao, Renjiao Yi, Yuhang Huang et al.

ICCV 2025posterarXiv:2408.10789

citations

#677

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.

ICCV 2025posterarXiv:2503.16726

citations

#678

Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

Sanghun Jung, Jingjing Zheng, Ke Zhang et al.

ICCV 2025posterarXiv:2507.23134

citations

#679

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.

ICCV 2025posterarXiv:2507.16403

citations

#680

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Marko Mihajlovic, Siwei Zhang, Gen Li et al.

ICCV 2025highlightarXiv:2506.23236

citations

#681

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang, Yan Xia, Sashuai Zhou et al.

ICCV 2025posterarXiv:2507.03304

citations

#682

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Fangyikang Wang, Hubery Yin, Lei Qian et al.

ICCV 2025posterarXiv:2505.24222

citations

#683

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025posterarXiv:2412.01250

citations

#684

Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation

Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.

ICCV 2025posterarXiv:2411.09572

citations

#685

How To Make Your Cell Tracker Say "I dunno!"

Richard D Paul, Johannes Seiffarth, David Rügamer et al.

ICCV 2025poster

citations

#686

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection

Jiasheng Guo, Xin Gao, Yuxiang Yan et al.

ICCV 2025posterarXiv:2509.09183

citations

#687

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025posterarXiv:2504.02747

citations

#688

Sparse Fine-Tuning of Transformers for Generative Tasks

Wei Chen, Jingxi Yu, Zichen Miao et al.

ICCV 2025posterarXiv:2507.10855

citations

#689

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

Sherry Chen, Yi Wei, Luowei Zhou et al.

ICCV 2025posterarXiv:2507.07317

citations

#690

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions

Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.

ICCV 2025posterarXiv:2412.18810

citations

#691

LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs

Hanyu Zhou, Gim Hee Lee

ICCV 2025posterarXiv:2503.06934

citations

#692

DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Sanghyun Son, Matheus Gadelha, Yang Zhou et al.

ICCV 2025posterarXiv:2412.16776

citations

#693

Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention

Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.

ICCV 2025posterarXiv:2506.13298

citations

#694

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.

ICCV 2025posterarXiv:2509.25134

citations

#695

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

ICCV 2025highlightarXiv:2502.07001

citations

#696

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.

ICCV 2025posterarXiv:2411.17616

citations

#697

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation

Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.

ICCV 2025posterarXiv:2506.01923

citations

#698

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency

Yangyang Guo, Mohan Kankanhalli

ICCV 2025posterarXiv:2411.09126

citations

#699

4D Visual Pre-training for Robot Learning

Chengkai Hou, Yanjie Ze, Yankai Fu et al.

ICCV 2025posterarXiv:2508.17230

citations

#700

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.

ICCV 2025posterarXiv:2503.21851

citations

#701

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

Li Huaqiu, Yong Wang, Tongwen Huang et al.

ICCV 2025posterarXiv:2507.00790

citations

#702

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025posterarXiv:2507.12942

citations

#703

Task Vector Quantization for Memory-Efficient Model Merging

Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.

ICCV 2025posterarXiv:2503.06921

citations

#704

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

JiaKui Hu, Zhengjian Yao, Lujia Jin et al.

ICCV 2025posterarXiv:2506.18520

citations

#705

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Hao Chen, Tao Han, Song Guo et al.

ICCV 2025posterarXiv:2412.02503

citations

#706

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

ICCV 2025highlightarXiv:2507.02859

citations

#707

Breaking the Encoder Barrier for Seamless Video-Language Understanding

Handong Li, Yiyuan Zhang, Longteng Guo et al.

ICCV 2025posterarXiv:2503.18422

citations

#708

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

Congyi Fan, Jian Guan, Xuanjia Zhao et al.

ICCV 2025posterarXiv:2503.17340

citations

#709

Cross-Subject Mind Decoding from Inaccurate Representations

Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.

ICCV 2025posterarXiv:2507.19071

citations

#710

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

Jiaxin Huang, Sheng Miao, Bangbang Yang et al.

ICCV 2025posterarXiv:2504.11092

citations

#711

Grouped Speculative Decoding for Autoregressive Image Generation

Junhyuk So, Juncheol Shin, Hyunho Kook et al.

ICCV 2025posterarXiv:2508.07747

citations

#712

SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians

Liam Schoneveld, Zhe Chen, Davide Davoli et al.

ICCV 2025posterarXiv:2504.12292

citations

#713

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Tri Ton, Ji Woo Hong, Chang Yoo

ICCV 2025posterarXiv:2504.05684

citations

#714

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator

Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.

ICCV 2025posterarXiv:2411.17799

citations

#715

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.

ICCV 2025posterarXiv:2507.23567

citations

#716

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Qi Guo, Zhen Tian, Minghao Yao et al.

ICCV 2025posterarXiv:2410.06848

citations

#717

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

Yangyi Huang, Ye Yuan, Xueting Li et al.

ICCV 2025posterarXiv:2505.24877

citations

#718

Learning Streaming Video Representation via Multitask Training

Yibin Yan, Jilan Xu, Shangzhe Di et al.

ICCV 2025posterarXiv:2504.20041

citations

#719

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP

Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.

ICCV 2025posterarXiv:2412.02837

citations

#720

Boosting Adversarial Transferability via Residual Perturbation Attack

Jinjia Peng, Zeze Tao, Huibing Wang et al.

ICCV 2025posterarXiv:2508.05689

citations

#721

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion

Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.

ICCV 2025posterarXiv:2503.21775

citations

#722

MVGBench: a Comprehensive Benchmark for Multi-view Generation Models

Xianghui Xie, Jan Lenssen, Gerard Pons-Moll

ICCV 2025poster

citations

#723

Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement

Priyank Pathak, Yogesh Rawat

ICCV 2025posterarXiv:2507.07230

citations

#724

GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Wentao Hu, Shunkai Li, Ziqiao Peng et al.

ICCV 2025highlightarXiv:2506.21513

citations

#725

You Think, You ACT: The New Task of Arbitrary Text to Motion Generation

Runqi Wang, Caoyuan Ma, Guopeng Li et al.

ICCV 2025posterarXiv:2404.14745

citations

#726

GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation

Junyu Shi, Lijiang LIU, Yong Sun et al.

ICCV 2025poster

citations

#727

TokensGen: Harnessing Condensed Tokens for Long Video Generation

Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.

ICCV 2025posterarXiv:2507.15728

citations

#728

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

Huy Ta, Duy Anh Huynh, Yutong Xie et al.

ICCV 2025highlightarXiv:2505.15123

citations

#729

RePoseD: Efficient Relative Pose Estimation With Known Depth Information

Yaqing Ding, Viktor Kocur, VACLAV VAVRA et al.

ICCV 2025posterarXiv:2501.07742

citations

#730

LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders

Ilan Naiman, Emanuel Baruch Baruch, Oron Anschel et al.

ICCV 2025posterarXiv:2504.03501

citations

#731

Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding

Xiaojie Zhang, Yuanfei Wang, Ruihai Wu et al.

ICCV 2025posterarXiv:2507.18276

citations

#732

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao et al.

ICCV 2025posterarXiv:2503.16856

citations

#733

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

Jiannan Ge, Lingxi Xie, Hongtao Xie et al.

ICCV 2025poster

citations

#734

Hybrid-grained Feature Aggregation with Coare-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Wenyao Zhang, Hongsi Liu, Bohan Li et al.

ICCV 2025poster

citations

#735

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025posterarXiv:2503.22668

citations

#736

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin, Ruohan Gao

ICCV 2025posterarXiv:2504.21847

citations

#737

HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes

Mai Su, Zhongtao Wang, Huishan Au et al.

ICCV 2025posterarXiv:2504.16606

citations

#738

MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation

Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.

ICCV 2025poster

citations

#739

Gait-X: Exploring X modality for Generalized Gait Recognition

Zengbin Wang, Saihui Hou, Junjie Li et al.

ICCV 2025poster

citations

#740

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Xiangdong Zhang, Shaofeng Zhang, Junchi Yan

ICCV 2025posterarXiv:2509.01250

citations

#741

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

Wenlun Zhang, Yunshan Zhong, Shimpei Ando et al.

ICCV 2025posterarXiv:2503.03088

citations

#742

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression

Wenjie Huang, Qi Yang, Shuting Xia et al.

ICCV 2025posterarXiv:2507.15686

citations

#743

Improving SAM for Camouflaged Object Detection via Dual Stream Adapters

Jiaming Liu, Linghe Kong, Guihai Chen

ICCV 2025posterarXiv:2503.06042

citations

#744

Demeter: A Parametric Model of Crop Plant Morphology from the Real World

Tianhang Cheng, Albert Zhai, Evan Chen et al.

ICCV 2025posterarXiv:2510.16377

citations

#745

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025posterarXiv:2508.00169

citations

#746

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling

Trong-Thang Pham, AKASH AWASTHI, Saba Khan et al.

ICCV 2025highlightarXiv:2507.12591

citations

#747

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment

Chong Xia, Shengjun Zhang, Fangfu Liu et al.

ICCV 2025posterarXiv:2507.19058

citations

#748

SAS: Segment Any 3D Scene with Integrated 2D Priors

Zhuoyuan Li, Jiahao Lu, Jiacheng Deng et al.

ICCV 2025posterarXiv:2503.08512

citations

#749

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning

Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.

ICCV 2025posterarXiv:2502.05843

citations

#750

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen, Nhat Le, Baoru Huang et al.

ICCV 2025posterarXiv:2508.10522

citations

#751

Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model

Daehee Park, Monu Surana, Pranav Desai et al.

ICCV 2025posterarXiv:2507.22615

citations

#752

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

Jijun Xiang, Xuan Zhu, Xianqi Wang et al.

ICCV 2025posterarXiv:2504.01596

citations

#753

ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung, Hyoungseob Park, Patrick Rim et al.

ICCV 2025posterarXiv:2508.05989

citations

#754

Preacher: Paper-to-Video Agentic System

Jingwei Liu, Ling Yang, Hao Luo et al.

ICCV 2025posterarXiv:2508.09632

citations

#755

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Young Kyun Jang, Ser-Nam Lim

ICCV 2025posterarXiv:2405.14715

citations

#756

MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments

Zhixuan Liu, Haokun Zhu, Rui Chen et al.

ICCV 2025posterarXiv:2503.13816

citations

#757

Improving Rectified Flow with Boundary Conditions

Xixi Hu, Runlong Liao, Bo Liu et al.

ICCV 2025posterarXiv:2506.15864

citations

#758

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Tianli Liao, Chenyang Zhao, Lei Li et al.

ICCV 2025posterarXiv:2311.18564

citations

#759

MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions

Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.

ICCV 2025posterarXiv:2503.05182

citations

#760

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu et al.

ICCV 2025highlightarXiv:2507.23483

citations

#761

Towards Open-World Generation of Stereo Images and Unsupervised Matching

Feng Qiao, Zhexiao Xiong, Eric Xing et al.

ICCV 2025posterarXiv:2503.12720

citations

#762

SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

Yana Hasson, Pauline Luc, Liliane Momeni et al.

ICCV 2025posterarXiv:2507.03578

citations

#763

CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images

Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.

ICCV 2025posterarXiv:2503.05332

citations

#764

Acknowledging Focus Ambiguity in Visual Questions

Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.

ICCV 2025posterarXiv:2501.02201

citations

#765

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.

ICCV 2025highlightarXiv:2506.22753

citations

#766

Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising

Xiangbin Wei, Yuanfeng Wang, Ao XU et al.

ICCV 2025posterarXiv:2503.09283

citations

#767

SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation

Chen Yi Lu, Mehrab Tanjim, Ishita Dasgupta et al.

ICCV 2025posterarXiv:2503.08010

citations

#768

EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

Yixiang Chen, Peiyan Li, Yan Huang et al.

ICCV 2025posterarXiv:2507.06224

citations

#769

Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification

Kunlun Xu, Fan Zhuo, Jiangmeng Li et al.

ICCV 2025posterarXiv:2507.01884

citations

#770

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025posterarXiv:2503.06698

citations

#771

SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting

Haiyang Ying, Matthias Zwicker

ICCV 2025posterarXiv:2503.14786

citations

#772

DAViD: Data-efficient and Accurate Vision Models from Synthetic Data

Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.

ICCV 2025posterarXiv:2507.15365

citations

#773

Global Regulation and Excitation via Attention Tuning for Stereo Matching

Jiahao LI, Xinhong Chen, Zhengmin JIANG et al.

ICCV 2025posterarXiv:2509.15891

citations

#774

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.

ICCV 2025posterarXiv:2503.22349

citations

#775

Synchronization of Multiple Videos

Avihai Naaman, Ron Shapira Weber, Oren Freifeld

ICCV 2025posterarXiv:2510.14051

citations

#776

General Compression Framework for Efficient Transformer Object Tracking

Lingyi Hong, Jinglun Li, Xinyu Zhou et al.

ICCV 2025posterarXiv:2409.17564

citations

#777

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

Han Yu, Kehan Li, Dongbai Li et al.

ICCV 2025posterarXiv:2510.27263

citations

#778

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion

Enyu Liu, En Yu, Sijia Chen et al.

ICCV 2025posterarXiv:2507.08555

citations

#779

AnyPortal: Zero-Shot Consistent Video Background Replacement

Wenshuo Gao, Xicheng Lan, Shuai Yang

ICCV 2025posterarXiv:2509.07472

citations

#780

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.

ICCV 2025posterarXiv:2503.21055

citations

#781

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

ICCV 2025highlightarXiv:2508.00427

citations

#782

LookOut: Real-World Humanoid Egocentric Navigation

Boxiao Pan, Adam Harley, Francis Engelmann et al.

ICCV 2025posterarXiv:2508.14466

citations

#783

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur

ICCV 2025posterarXiv:2504.03948

citations

#784

Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads

Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.

ICCV 2025posterarXiv:2507.23343

citations

#785

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025posterarXiv:2508.19649

citations

#786

Identity Preserving 3D Head Stylization with Multiview Score Distillation

Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.

ICCV 2025posterarXiv:2411.13536

citations

#787

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025posterarXiv:2506.16058

citations

#788

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025posterarXiv:2506.11436

citations

#789

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Shaowei Liu, chuan guo, Bing Zhou et al.

ICCV 2025posterarXiv:2510.14976

citations

#790

GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations

Yunqi Liu, Xiaohui Cui, Ouyang Xue

ICCV 2025poster

citations

#791

SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models

Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.

ICCV 2025posterarXiv:2411.16216

citations

#792

Object-level Correlation for Few-Shot Segmentation

chunlin wen, Yu Zhang, Jie Fan et al.

ICCV 2025posterarXiv:2509.07917

citations

#793

FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning

qian feng, Jiahang Tu, Mintong Kang et al.

ICCV 2025posterarXiv:2601.13578

citations

#794

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025posterarXiv:2501.06031

citations

#795

Generative Adversarial Diffusion

U-Chae Jun, Jaeeun Ko, Jiwoo Kang

ICCV 2025poster

citations

#796

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization

Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.

ICCV 2025posterarXiv:2504.02817

citations

#797

EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment

Yufei Zhu, Yiming Zhong, Zemin Yang et al.

ICCV 2025posterarXiv:2503.14329

citations

#798

FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

Fei Yin, Mallikarjun Reddy, Chun-Han Yao et al.

ICCV 2025posterarXiv:2504.15179

citations

#799

Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection

Anja Delić, Matej Grcic, Siniša Šegvić

ICCV 2025highlightarXiv:2506.18368

citations

#800

PseudoMapTrainer: Learning Online Mapping without HD Maps

Christian Löwens, Thorben Funke, Jingchao Xie et al.

ICCV 2025posterarXiv:2508.18788

citations

← Previous

1 2 3 4 5 6...14