Most Cited ICCV "quantum state preparation" Papers

2,701 papers found • Page 6 of 14

#1001

End-to-End Multi-Modal Diffusion Mamba

Chunhao Lu, Qiang Lu, Meichen Dong et al.

ICCV 2025 arXiv:2510.13253
3
citations
#1002

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text

Jiaben Chen, Xin Yan, Yihang Chen et al.

ICCV 2025 arXiv:2405.20336
3
citations
#1003

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.

ICCV 2025
3
citations
#1004

EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.

ICCV 2025 arXiv:2311.13621
3
citations
#1005

EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device

Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.

ICCV 2025 arXiv:2509.17430
3
citations
#1006

FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

Hao Chen, Shell Xu Hu, Wayne Luk et al.

ICCV 2025 arXiv:2503.12649
3
citations
#1007

ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking

Xiaokun Feng, Shiyu Hu, Xuchen Li et al.

ICCV 2025 highlight arXiv:2507.19875
3
citations
#1008

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction

Haonan Wang, Qixiang Zhang, Lehan Wang et al.

ICCV 2025 arXiv:2503.11167
3
citations
#1009

CAP: Evaluation of Persuasive and Creative Image Generation

Aysan Aghazadeh, Adriana Kovashka

ICCV 2025 arXiv:2412.10426
3
citations
#1010

Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

Sanghun Jung, Jingjing Zheng, Ke Zhang et al.

ICCV 2025 arXiv:2507.23134
3
citations
#1011

Teaching VLMs to Localize Specific Objects from In-context Examples

Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.

ICCV 2025 arXiv:2411.13317
3
citations
#1012

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Marko Mihajlovic, Siwei Zhang, Gen Li et al.

ICCV 2025 highlight arXiv:2506.23236
3
citations
#1013

Local Dense Logit Relations for Enhanced Knowledge Distillation

Liuchi Xu, Kang Liu, Jinshuai Liu et al.

ICCV 2025 arXiv:2507.15911
3
citations
#1014

PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization

Bing Fan, Yunhe Feng, Yapeng Tian et al.

ICCV 2025 arXiv:2502.07707
3
citations
#1015

LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling

Jiahao Wu, Rui Peng, Jianbo Jiao et al.

ICCV 2025 arXiv:2507.02363
3
citations
#1016

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly

Chang Liu, Yunfan Ye, Fan Zhang et al.

ICCV 2025 arXiv:2507.19924
3
citations
#1017

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Dubing Chen, Huan Zheng, Yucheng Zhou et al.

ICCV 2025 arXiv:2509.08388
3
citations
#1018

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 arXiv:2504.02747
3
citations
#1019

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Jonathan Roberts, Kai Han, Samuel Albanie

ICCV 2025 arXiv:2408.11817
3
citations
#1020

FREE-Merging: Fourier Transform for Efficient Model Merging

Shenghe Zheng, Hongzhi Wang

ICCV 2025 arXiv:2411.16815
3
citations
#1021

Open-set Cross Modal Generalization via Multimodal Unified Representation

Hai Huang, Yan Xia, Shulei Wang et al.

ICCV 2025 arXiv:2507.14935
3
citations
#1022

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl

ICCV 2025 arXiv:2506.17213
3
citations
#1023

Social Debiasing for Fair Multi-modal LLMs

Harry Cheng, Yangyang Guo, Qingpei Guo et al.

ICCV 2025 arXiv:2408.06569
3
citations
#1024

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Zonglin Lyu, Chen Chen

ICCV 2025 arXiv:2507.04984
3
citations
#1025

X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Weihao Yu, Yuanhao Cai, Ruyi Zha et al.

ICCV 2025
3
citations
#1026

A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention

Qiyu Xu, Zhanxuan Hu, Yu Duan et al.

ICCV 2025 arXiv:2507.14315
3
citations
#1027

4D Gaussian Splatting SLAM

Yanyan Li, Youxu Fang, Zunjie Zhu et al.

ICCV 2025 arXiv:2503.16710
3
citations
#1028

Large-scale Pre-training for Grounded Video Caption Generation

Evangelos Kazakos, Cordelia Schmid, Josef Sivic

ICCV 2025 arXiv:2503.10781
3
citations
#1029

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

Yiting Yang, Hao Luo, Yuan Sun et al.

ICCV 2025 arXiv:2507.13260
3
citations
#1030

AGO: Adaptive Grounding for Open World 3D Occupancy Prediction

Peizheng Li, Shuxiao Ding, You Zhou et al.

ICCV 2025 arXiv:2504.10117
3
citations
#1031

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation

Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.

ICCV 2025 arXiv:2509.07596
3
citations
#1032

Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning

Zeyu Xi, Haoying Sun, Yaofei Wu et al.

ICCV 2025 arXiv:2507.20163
3
citations
#1033

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei, Rama Chellappa

ICCV 2025 arXiv:2504.00037
3
citations
#1034

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation

Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.

ICCV 2025 arXiv:2506.01923
3
citations
#1035

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025 arXiv:2506.11436
3
citations
#1036

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

Xiao Li, Yiming Zhu, Yifan Huang et al.

ICCV 2025 arXiv:2506.23581
3
citations
#1037

TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration

Gong Meiqi, Hao Zhang, Xunpeng Yi et al.

ICCV 2025 arXiv:2508.17817
3
citations
#1038

Moderating the Generalization of Score-based Generative Model

Wan Jiang, He Wang, Xin Zhang et al.

ICCV 2025 arXiv:2412.07229
3
citations
#1039

Uncertainty-Aware Gradient Stabilization for Small Object Detection

Huixin Sun, Yanjing Li, Linlin Yang et al.

ICCV 2025 arXiv:2303.01803
3
citations
#1040

ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers

Nicholas DiBrita, Jason Han, Tirthak Patel

ICCV 2025 arXiv:2506.21537
3
citations
#1041

ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction

Danhui Chen, Ziquan Liu, Chuxi Yang et al.

ICCV 2025 arXiv:2507.15803
3
citations
#1042

TokensGen: Harnessing Condensed Tokens for Long Video Generation

Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.

ICCV 2025 arXiv:2507.15728
3
citations
#1043

Revisiting Image Fusion for Multi-Illuminant White-Balance Correction

David Serrano, Aditya Arora, Luis Herranz et al.

ICCV 2025 arXiv:2503.14774
3
citations
#1044

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

Ruyang Liu, Shangkun Sun, Haoran Tang et al.

ICCV 2025 arXiv:2510.05836
3
citations
#1045

Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation

Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.

ICCV 2025 arXiv:2503.11652
3
citations
#1046

LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding

Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno et al.

ICCV 2025 arXiv:2503.15420
3
citations
#1047

Underwater Visual SLAM with Depth Uncertainty and Medium Modeling

Rui Liu, Sheng Fan, Wenguan Wang et al.

ICCV 2025 highlight
3
citations
#1048

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.

ICCV 2025 arXiv:2410.23287
3
citations
#1049

Hierarchical Material Recognition from Local Appearance

Matthew Beveridge, Shree Nayar

ICCV 2025 highlight arXiv:2505.22911
3
citations
#1050

Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection

Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.

ICCV 2025 highlight arXiv:2507.10225
3
citations
#1051

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025 arXiv:2504.01020
3
citations
#1052

FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation

Yunpeng Bai, Qixing Huang

ICCV 2025 arXiv:2412.00671
3
citations
#1053

LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs

Hanyu Zhou, Gim Hee Lee

ICCV 2025 arXiv:2503.06934
3
citations
#1054

Memory-Efficient 4-bit Preconditioned Stochastic Optimization

Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.

ICCV 2025 arXiv:2412.10663
3
citations
#1055

DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation

Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra

ICCV 2025 arXiv:2506.21237
3
citations
#1056

Generalizable Object Re-Identification via Visual In-Context Prompting

Zhizhong Huang, Xiaoming Liu

ICCV 2025 arXiv:2508.21222
3
citations
#1057

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

Guangben Lu, Yuzhen N/A, Zhimin Sun et al.

ICCV 2025 arXiv:2412.03812
3
citations
#1058

Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation

Gang Dai, Yifan Zhang, Yutao Qin et al.

ICCV 2025 arXiv:2508.03256
3
citations
#1059

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Hao Chen, Tao Han, Song Guo et al.

ICCV 2025 arXiv:2412.02503
3
citations
#1060

Online Language Splatting

Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.

ICCV 2025 arXiv:2503.09447
3
citations
#1061

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

ICCV 2025 highlight arXiv:2507.02859
3
citations
#1062

Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

Miroslav Purkrabek, Jiri Matas

ICCV 2025 arXiv:2412.01562
3
citations
#1063

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.

ICCV 2025 arXiv:2507.16403
3
citations
#1064

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

Zihua Zhao, Feng Hong, Mengxi Chen et al.

ICCV 2025 arXiv:2507.12998
3
citations
#1065

Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation

Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.

ICCV 2025 arXiv:2411.09572
3
citations
#1066

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang, Yan Xia, Sashuai Zhou et al.

ICCV 2025 arXiv:2507.03304
3
citations
#1067

Sparse Fine-Tuning of Transformers for Generative Tasks

Wei Chen, Jingxi Yu, Zichen Miao et al.

ICCV 2025 arXiv:2507.10855
3
citations
#1068

Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning

Wenxuan Bao, Ruxi Deng, Ruizhong Qiu et al.

ICCV 2025 arXiv:2507.21494
3
citations
#1069

Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval

Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.

ICCV 2025 highlight arXiv:2507.23284
3
citations
#1070

OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.

ICCV 2025 arXiv:2506.21101
3
citations
#1071

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions

Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.

ICCV 2025 arXiv:2412.18810
3
citations
#1072

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

Sherry Chen, Yi Wei, Luowei Zhou et al.

ICCV 2025 arXiv:2507.07317
3
citations
#1073

How Can Objects Help Video-Language Understanding?

Zitian Tang, Shijie Wang, Junho Cho et al.

ICCV 2025 arXiv:2504.07454
3
citations
#1074

SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer

Zerui Gong, Zhonghua Wu, Qingyi Tao et al.

ICCV 2025 arXiv:2506.13465
3
citations
#1075

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation

Zhaorui Tan, Xi Yang, Tan Pan et al.

ICCV 2025 arXiv:2411.06106
3
citations
#1076

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency

Yangyang Guo, Mohan Kankanhalli

ICCV 2025 arXiv:2411.09126
3
citations
#1077

Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval

WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.

ICCV 2025 arXiv:2504.13035
3
citations
#1078

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025 arXiv:2507.12942
3
citations
#1079

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025 arXiv:2504.09282
3
citations
#1080

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025 arXiv:2503.17044
3
citations
#1081

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection

Jiasheng Guo, Xin Gao, Yuxiang Yan et al.

ICCV 2025 arXiv:2509.09183
3
citations
#1082

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

Congyi Fan, Jian Guan, Xuanjia Zhao et al.

ICCV 2025 arXiv:2503.17340
3
citations
#1083

HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding

Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.

ICCV 2025 arXiv:2508.02072
3
citations
#1084

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

Jiaxin Ai, Pengfei Zhou, Xu Pan et al.

ICCV 2025 arXiv:2503.06553
3
citations
#1085

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

ICCV 2025 arXiv:2503.16867
3
citations
#1086

Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention

Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.

ICCV 2025 arXiv:2506.13298
3
citations
#1087

Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks

Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov et al.

ICCV 2025 arXiv:2503.11781
3
citations
#1088

Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement

Priyank Pathak, Yogesh Rawat

ICCV 2025 arXiv:2507.07230
3
citations
#1089

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

ICCV 2025 arXiv:2503.16832
3
citations
#1090

A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization

Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.

ICCV 2025 arXiv:2412.09774
3
citations
#1091

OmniVTON: Training-Free Universal Virtual Try-On

Zhaotong Yang, Yuhui Li, Shengfeng He et al.

ICCV 2025 arXiv:2507.15037
3
citations
#1092

Task Vector Quantization for Memory-Efficient Model Merging

Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.

ICCV 2025 arXiv:2503.06921
3
citations
#1093

COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation

Siqi Zhang, Yanyuan Qiao, Qunbo Wang et al.

ICCV 2025 arXiv:2503.24065
3
citations
#1094

Simultaneous Motion And Noise Estimation with Event Cameras

Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego

ICCV 2025 arXiv:2504.04029
2
citations
#1095

SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting

Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.

ICCV 2025 arXiv:2506.03594
2
citations
#1096

Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions

Yuanhong Zheng, Ruixuan Yu, Jian Sun

ICCV 2025 arXiv:2507.09446
2
citations
#1097

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models

Hyungjin Kim, Seokho Ahn, Young-Duk Seo

ICCV 2025 arXiv:2508.03481
2
citations
#1098

LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders

Ilan Naiman, Emanuel Baruch Baruch, Oron Anschel et al.

ICCV 2025 arXiv:2504.03501
2
citations
#1099

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics

Bowei Guo, Shengkun Tang, Cong Zeng et al.

ICCV 2025 arXiv:2510.11962
2
citations
#1100

Holistic Tokenizer for Autoregressive Image Generation

Anlin Zheng, Haochen Wang, Yucheng Zhao et al.

ICCV 2025 arXiv:2507.02358
2
citations
#1101

PLA: Prompt Learning Attack against Text-to-Image Generative Models

Xinqi Lyu, Yihao Liu, Yanjie Li et al.

ICCV 2025 arXiv:2508.03696
2
citations
#1102

Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation

Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.

ICCV 2025 arXiv:2503.11633
2
citations
#1103

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao et al.

ICCV 2025 arXiv:2503.16856
2
citations
#1104

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

En Ci, Shanyan Guan, Yanhao Ge et al.

ICCV 2025
2
citations
#1105

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

Jiannan Ge, Lingxi Xie, Hongtao Xie et al.

ICCV 2025
2
citations
#1106

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Wenyao Zhang, Hongsi Liu, Bohan Li et al.

ICCV 2025
2
citations
#1107

Backdoor Attacks on Neural Networks via One-Bit Flip

Xiang Li, Lannan Luo, Qiang Zeng

ICCV 2025
2
citations
#1108

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Revant Teotia, Candace Ross, Karen Ullrich et al.

ICCV 2025 arXiv:2506.05108
2
citations
#1109

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin, Ruohan Gao

ICCV 2025 arXiv:2504.21847
2
citations
#1110

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.

ICCV 2025 arXiv:2505.20405
2
citations
#1111

CVPT: Cross Visual Prompt Tuning

Lingyun Huang, Jianxu Mao, Junfei Yi et al.

ICCV 2025 arXiv:2408.14961
2
citations
#1112

Consensus-Driven Active Model Selection

Justin Kay, Grant Horn, Subhransu Maji et al.

ICCV 2025 highlight arXiv:2507.23771
2
citations
#1113

FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging

Xin You, Runze Yang, Chuyan Zhang et al.

ICCV 2025 arXiv:2507.04547
2
citations
#1114

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.

ICCV 2025 arXiv:2502.06593
2
citations
#1115

Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion

Yidi Liu, Dong Li, Yuxin Ma et al.

ICCV 2025 arXiv:2503.12764
2
citations
#1116

OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization

Saihui Hou, Panjian Huang, Zengbin Wang et al.

ICCV 2025 arXiv:2410.00204
2
citations
#1117

Addressing Text Embedding Leakage in Diffusion-based Image Editing

Sunung Mun, Jinhwan Nam, Sunghyun Cho et al.

ICCV 2025 arXiv:2412.04715
2
citations
#1118

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

Junho Kim, Hyungjin Chung, Byung-Hoon Kim

ICCV 2025 arXiv:2411.06869
2
citations
#1119

Improving SAM for Camouflaged Object Detection via Dual Stream Adapters

Jiaming Liu, Linghe Kong, Guihai Chen

ICCV 2025 arXiv:2503.06042
2
citations
#1120

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling

Trong-Thang Pham, Akash Awasthi, Saba Khan et al.

ICCV 2025 highlight arXiv:2507.12591
2
citations
#1121

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

Fei Peng, Junqiang Wu, Yan Li et al.

ICCV 2025 arXiv:2508.14440
2
citations
#1122

SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

Jessica Bader, Leander Girrbach, Stephan Alaniz et al.

ICCV 2025 arXiv:2507.23784
2
citations
#1123

StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance

Jaeseok Jeong, Junho Kim, Youngjung Uh et al.

ICCV 2025 arXiv:2510.06827
2
citations
#1124

SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark

Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.

ICCV 2025 arXiv:2506.21549
2
citations
#1125

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

Zeyi Sun, Tong Wu, Pan Zhang et al.

ICCV 2025 arXiv:2406.00093
2
citations
#1126

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions

Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.

ICCV 2025 arXiv:2506.03448
2
citations
#1127

Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation

Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu et al.

ICCV 2025 arXiv:2501.08885
2
citations
#1128

SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning

Xin Hu, Ke Qin, Guiduo Duan et al.

ICCV 2025 arXiv:2507.05798
2
citations
#1129

SEGA: A Stepwise Evolution Paradigm for Content-Aware Layout Generation with Design Prior

Bo Zhao, Haoran Wang, Jinghui Wang et al.

ICCV 2025 highlight arXiv:2510.15749
2
citations
#1130

SAS: Segment Any 3D Scene with Integrated 2D Priors

Zhuoyuan Li, Jiahao Lu, Jiacheng Deng et al.

ICCV 2025 arXiv:2503.08512
2
citations
#1131

RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS

Chuanyu Fu, Yuqi Zhang, Kunbin Yao et al.

ICCV 2025 arXiv:2506.02751
2
citations
#1132

Multi-Modal Few-Shot Temporal Action Segmentation

Zijia Lu, Ehsan Elhamifar

ICCV 2025
2
citations
#1133

SDMatte: Grafting Diffusion Models for Interactive Matting

Longfei Huang, Yu Liang, Hao Zhang et al.

ICCV 2025 arXiv:2508.00443
2
citations
#1134

LoRAverse: A Submodular Framework to Retrieve Diverse Adapters for Diffusion Models

Mert Sonmezer, Matthew Zheng, Pinar Yanardag

ICCV 2025 arXiv:2510.15022
2
citations
#1135

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation

Guanyi Qin, Ziyue Wang, Daiyun Shen et al.

ICCV 2025 highlight arXiv:2507.18944
2
citations
#1136

Autoregressive Denoising Score Matching is a Good Video Anomaly Detector

Hanwen Zhang, Congqi Cao, Qinyi Lv et al.

ICCV 2025 arXiv:2506.23282
2
citations
#1137

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

Huy Ta, Duy Anh Huynh, Yutong Xie et al.

ICCV 2025 highlight arXiv:2505.15123
2
citations
#1138

MatchDiffusion: Training-free Generation of Match-Cuts

Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.

ICCV 2025 arXiv:2411.18677
2
citations
#1139

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

Jijun Xiang, Xuan Zhu, Xianqi Wang et al.

ICCV 2025 arXiv:2504.01596
2
citations
#1140

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen, Nhat Le, Baoru Huang et al.

ICCV 2025 arXiv:2508.10522
2
citations
#1141

IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features

Anand Kumar, Jiteng Mu, Nuno Vasconcelos

ICCV 2025 arXiv:2412.14432
2
citations
#1142

ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung, Hyoungseob Park, Patrick Rim et al.

ICCV 2025 arXiv:2508.05989
2
citations
#1143

Consistency Trajectory Matching for One-Step Generative Super-Resolution

Weiyi You, Mingyang Zhang, Leheng Zhang et al.

ICCV 2025 arXiv:2503.20349
2
citations
#1144

SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers

Bhavna Gopal, Huanrui Yang, Mark Horton et al.

ICCV 2025 arXiv:2501.01529
2
citations
#1145

Improving Rectified Flow with Boundary Conditions

Xixi Hu, Runlong Liao, Bo Liu et al.

ICCV 2025 arXiv:2506.15864
2
citations
#1146

RePoseD: Efficient Relative Pose Estimation With Known Depth Information

Yaqing Ding, Viktor Kocur, Vaclav Vavra et al.

ICCV 2025 arXiv:2501.07742
2
citations
#1147

Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling

Hayeon Kim, Ji Ha Jang, Se Young Chun

ICCV 2025 arXiv:2507.11061
2
citations
#1148

LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables

Xunpeng Yi, Yibing Zhang, Xinyu Xiang et al.

ICCV 2025 arXiv:2509.00346
2
citations
#1149

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

Junzhe Lu, Jing Lin, Hongkun Dou et al.

ICCV 2025 arXiv:2508.00599
2
citations
#1150

Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding

Xiaojie Zhang, Yuanfei Wang, Ruihai Wu et al.

ICCV 2025 arXiv:2507.18276
2
citations
#1151

Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling

Li Xiaojie, Ronghui Li, Shukai Fang et al.

ICCV 2025 arXiv:2507.14915
2
citations
#1152

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi et al.

ICCV 2025 highlight arXiv:2505.23617
2
citations
#1153

Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars

Yifan Zhan, Qingtian Zhu, Muyao Niu et al.

ICCV 2025 arXiv:2410.08082
2
citations
#1154

MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation

Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.

ICCV 2025
2
citations
#1155

Gait-X: Exploring X modality for Generalized Gait Recognition

Zengbin Wang, Saihui Hou, Junjie Li et al.

ICCV 2025
2
citations
#1156

On-Device Diffusion Transformer Policy for Efficient Robot Manipulation

Yiming Wu, Huan Wang, Zhenghao Chen et al.

ICCV 2025 arXiv:2508.00697
2
citations
#1157

Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions

Thomas Dagès, Michael Lindenbaum, Alfred Bruckstein

ICCV 2025 arXiv:2406.05400
2
citations
#1158

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025 arXiv:2503.22668
2
citations
#1159

VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data

Jian Shi, Peter Wonka

ICCV 2025 arXiv:2312.08871
2
citations
#1160

HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes

Mai Su, Zhongtao Wang, Huishan Au et al.

ICCV 2025 arXiv:2504.16606
2
citations
#1161

CarGait: Cross-Attention based Re-ranking for Gait recognition

Gavriel Habib, Noa Barzilay, Or Shimshi et al.

ICCV 2025 arXiv:2503.03501
2
citations
#1162

VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction

Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.

ICCV 2025 arXiv:2507.21311
2
citations
#1163

Preacher: Paper-to-Video Agentic System

Jingwei Liu, Ling Yang, Hao Luo et al.

ICCV 2025 arXiv:2508.09632
2
citations
#1164

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Xiangdong Zhang, Shaofeng Zhang, Junchi Yan

ICCV 2025 arXiv:2509.01250
2
citations
#1165

Demeter: A Parametric Model of Crop Plant Morphology from the Real World

Tianhang Cheng, Albert Zhai, Evan Chen et al.

ICCV 2025 arXiv:2510.16377
2
citations
#1166

Generative Modeling of Shape-Dependent Self-Contact Human Poses

Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.

ICCV 2025 arXiv:2509.23393
2
citations
#1167

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression

Wenjie Huang, Qi Yang, Shuting Xia et al.

ICCV 2025 arXiv:2507.15686
2
citations
#1168

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.

ICCV 2025 arXiv:2502.14156
2
citations
#1169

Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition

Pulkit Kumar, Shuaiyi Huang, Matthew Walmer et al.

ICCV 2025 arXiv:2508.03695
2
citations
#1170

DisenQ: Disentangling Q-Former for Activity-Biometrics

Shehreen Azad, Yogesh Rawat

ICCV 2025 highlight arXiv:2507.07262
2
citations
#1171

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning

Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.

ICCV 2025 arXiv:2502.05843
2
citations
#1172

Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

In Cho, Youngbeom Yoo, Subin Jeon et al.

ICCV 2025 arXiv:2503.08737
2
citations
#1173

SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing

Heyi Sun, Cong Wang, Tian-Xing Xu et al.

ICCV 2025 arXiv:2508.09597
2
citations
#1174

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu et al.

ICCV 2025 highlight arXiv:2507.23483
2
citations
#1175

ChartCap: Mitigating Hallucination of Dense Chart Captioning

Junyoung Lim, Jaewoo Ahn, Gunhee Kim

ICCV 2025 highlight arXiv:2508.03164
2
citations
#1176

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025 arXiv:2508.00169
2
citations
#1177

SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation

Chen Yi Lu, Mehrab Tanjim, Ishita Dasgupta et al.

ICCV 2025 arXiv:2503.08010
2
citations
#1178

RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction

Johannes Künzel, Anna Hilsmann, Peter Eisert

ICCV 2025 arXiv:2507.04839
2
citations
#1179

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025 arXiv:2503.06698
2
citations
#1180

IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A

Chen Li, Chinthani Sugandhika, Ee Yeo Keat et al.

ICCV 2025 arXiv:2508.01984
2
citations
#1181

Blind2Sound: Self-Supervised Image Denoising without Residual Noise

Jiazheng Liu, Zejin Wang, Bohao Chen et al.

ICCV 2025 arXiv:2303.05183
2
citations
#1182

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Boyang Deng, Kyle Genova, Songyou Peng et al.

ICCV 2025 highlight arXiv:2504.08727
2
citations
#1183

Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition

Jeonghyeok Do, Munchurl Kim

ICCV 2025 arXiv:2411.10745
2
citations
#1184

PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image

Geonhee Sim, Gyeongsik Moon

ICCV 2025 arXiv:2508.09973
2
citations
#1185

Synchronization of Multiple Videos

Avihai Naaman, Ron Shapira Weber, Oren Freifeld

ICCV 2025 arXiv:2510.14051
2
citations
#1186

Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model

Daehee Park, Monu Surana, Pranav Desai et al.

ICCV 2025 arXiv:2507.22615
2
citations
#1187

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Young Kyun Jang, Ser-Nam Lim

ICCV 2025 arXiv:2405.14715
2
citations
#1188

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

Han Yu, Kehan Li, Dongbai Li et al.

ICCV 2025 arXiv:2510.27263
2
citations
#1189

MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances

Yunzhe Shao, Xinyu Yi, Lu Yin et al.

ICCV 2025 arXiv:2506.22907
2
citations
#1190

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.

ICCV 2025 arXiv:2503.21055
2
citations
#1191

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

ICCV 2025 highlight arXiv:2508.00427
2
citations
#1192

2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update

Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.

ICCV 2025 arXiv:2507.11069
2
citations
#1193

Identity Preserving 3D Head Stylization with Multiview Score Distillation

Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.

ICCV 2025 arXiv:2411.13536
2
citations
#1194

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Shaowei Liu, Chuan Guo, Bing Zhou et al.

ICCV 2025 arXiv:2510.14976
2
citations
#1195

MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions

Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.

ICCV 2025 arXiv:2503.05182
2
citations
#1196

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Tianli Liao, Chenyang Zhao, Lei Li et al.

ICCV 2025 arXiv:2311.18564
2
citations
#1197

SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models

Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.

ICCV 2025 arXiv:2411.16216
2
citations
#1198

SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

Yana Hasson, Pauline Luc, Liliane Momeni et al.

ICCV 2025 arXiv:2507.03578
2
citations
#1199

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025 arXiv:2501.06031
2
citations
#1200

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization

Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.

ICCV 2025 arXiv:2504.02817
2
citations