Most Cited ICCV "autoregressive multimodal model" Papers

2,701 papers found

#1201

UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale

Yuhao Wang, Wei Xi

ICCV 2025 • arXiv:2508.09000
2
citations
#1202

Supercharging Floorplan Localization with Semantic Rays

Yuval Grader, Hadar Averbuch-Elor

ICCV 2025 • arXiv:2507.09291
2
citations
#1203

Consistency Trajectory Matching for One-Step Generative Super-Resolution

Weiyi You, Mingyang Zhang, Leheng Zhang et al.

ICCV 2025 • arXiv:2503.20349
2
citations
#1204

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?

Shouwei Ruan, Hanqing Liu, Yao Huang et al.

ICCV 2025 • highlight • arXiv:2412.03002
2
citations
#1205

CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization

Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.

ICCV 2025 • arXiv:2411.17845
2
citations
#1206

GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects

Yidi Shao, Mu Huang, Chen Change Loy et al.

ICCV 2025 • arXiv:2412.17804
2
citations
#1207

Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge

Xiaogang Xu, Jiafei Wu, Qingsen Yan et al.

ICCV 2025
2
citations
#1208

MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions

Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.

ICCV 2025 • arXiv:2503.05182
2
citations
#1209

CVPT: Cross Visual Prompt Tuning

Lingyun Huang, Jianxu Mao, Junfei Yi et al.

ICCV 2025 • arXiv:2408.14961
2
citations
#1210

Addressing Text Embedding Leakage in Diffusion-based Image Editing

Sunung Mun, Jinhwan Nam, Sunghyun Cho et al.

ICCV 2025 • arXiv:2412.04715
2
citations
#1211

Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing

Chengxu Liu, Lu Qi, Jinshan Pan et al.

ICCV 2025 • arXiv:2507.01275
2
citations
#1212

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models

Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.

ICCV 2025 • arXiv:2412.08619
2
citations
#1213

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Tianli Liao, Chenyang Zhao, Lei Li et al.

ICCV 2025 • arXiv:2311.18564
2
citations
#1214

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

Xiao Liang, Di Wang, Zhicheng Jiao et al.

ICCV 2025 • arXiv:2507.09209
2
citations
#1215

Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions

Yuanhong Zheng, Ruixuan Yu, Jian Sun

ICCV 2025 • arXiv:2507.09446
2
citations
#1216

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.

ICCV 2025 • arXiv:2504.20830
2
citations
#1217

Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation

Bozhong Zheng, Jinye Gan, Xiaohao Xu et al.

ICCV 2025 • arXiv:2505.24431
2
citations
#1218

Simultaneous Motion And Noise Estimation with Event Cameras

Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego

ICCV 2025 • arXiv:2504.04029
2
citations
#1219

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

Junho Kim, Hyungjin Chung, Byung-Hoon Kim

ICCV 2025 • arXiv:2411.06869
2
citations
#1220

Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation

Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.

ICCV 2025 • arXiv:2503.11633
2
citations
#1221

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Wenyao Zhang, Hongsi Liu, Bohan Li et al.

ICCV 2025
2
citations
#1222

Guiding Diffusion-Based Articulated Object Generation by Partial Point Cloud Alignment and Physical Plausibility Constraints

Jens U. Kreber, Joerg Stueckler

ICCV 2025 • highlight • arXiv:2508.00558
2
citations
#1223

Multi-Modal Few-Shot Temporal Action Segmentation

Zijia Lu, Ehsan Elhamifar

ICCV 2025
2
citations
#1224

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

Liang Xu, Chengqun Yang, Zili Lin et al.

ICCV 2025 • arXiv:2508.04681
2
citations
#1225

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

En Ci, Shanyan Guan, Yanhao Ge et al.

ICCV 2025
2
citations
#1226

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection

Yehao Lu, Minghe Weng, Zekang Xiao et al.

ICCV 2025 • arXiv:2507.17436
2
citations
#1227

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

Jijun Xiang, Xuan Zhu, Xianqi Wang et al.

ICCV 2025 • arXiv:2504.01596
2
citations
#1228

Online Generic Event Boundary Detection

Hyung Rok Jung, Daneul Kim, Seunggyun Lim et al.

ICCV 2025 • arXiv:2510.06855
2
citations
#1229

ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung, Hyoungseob Park, Patrick Rim et al.

ICCV 2025 • arXiv:2508.05989
2
citations
#1230

SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting

Zihui Gao, Jia-Wang Bian, Guosheng Lin et al.

ICCV 2025 • arXiv:2507.15602
2
citations
#1231

HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image

Junyi Guo, Jingxuan Zhang, Fangyu Wu et al.

ICCV 2025 • arXiv:2505.23186
2
citations
#1232

Enhancing Transformers Through Conditioned Embedded Tokens

Hemanth Saratchandran, Simon Lucey

ICCV 2025 • arXiv:2505.12789
2
citations
#1233

LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders

Ilan Naiman, Emanuel Baruch Baruch, Oron Anschel et al.

ICCV 2025 • arXiv:2504.03501
2
citations
#1234

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

Jiannan Ge, Lingxi Xie, Hongtao Xie et al.

ICCV 2025
2
citations
#1235

Generative Modeling of Shape-Dependent Self-Contact Human Poses

Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.

ICCV 2025 • arXiv:2509.23393
2
citations
#1236

PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai et al.

ICCV 2025 • arXiv:2506.23440
2
citations
#1237

Towards Open-World Generation of Stereo Images and Unsupervised Matching

Feng Qiao, Zhexiao Xiong, Eric Xing et al.

ICCV 2025 • arXiv:2503.12720
2
citations
#1238

CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images

Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.

ICCV 2025 • arXiv:2503.05332
2
citations
#1239

Joint Asymmetric Loss for Learning with Noisy Labels

Jialiang Wang, Xianming Liu, Xiong Zhou et al.

ICCV 2025 • arXiv:2507.17692
2
citations
#1240

JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models

Xiaolong Jin, Zixuan Weng, Hanxi Guo et al.

ICCV 2025
2
citations
#1241

SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers

Bhavna Gopal, Huanrui Yang, Mark Horton et al.

ICCV 2025 • arXiv:2501.01529
2
citations
#1242

SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark

Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.

ICCV 2025 • arXiv:2506.21549
2
citations
#1243

SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

Jessica Bader, Leander Girrbach, Stephan Alaniz et al.

ICCV 2025 • arXiv:2507.23784
2
citations
#1244

Refer to Any Segmentation Mask Group With Vision-Language Prompts

Shengcao Cao, Zijun Wei, Jason Kuen et al.

ICCV 2025 • arXiv:2506.05342
2
citations
#1245

Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising

Xiangbin Wei, Yuanfeng Wang, Ao Xu et al.

ICCV 2025 • arXiv:2503.09283
2
citations
#1246

Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process

Yuanze Li, Shihao Yuan, Haolin Wang et al.

ICCV 2025
2
citations
#1247

Multimodal Prompt Alignment for Facial Expression Recognition

Fuyan Ma, Yiran He, Bin Sun et al.

ICCV 2025 • arXiv:2506.21017
2
citations
#1248

RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction

Johannes Künzel, Anna Hilsmann, Peter Eisert

ICCV 2025 • arXiv:2507.04839
2
citations
#1249

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes

Zesong Yang, Bangbang Yang, Wenqi Dong et al.

ICCV 2025 • arXiv:2507.08416
2
citations
#1250

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.

ICCV 2025 • arXiv:2510.14672
2
citations
#1251

Everything is a Video: Unifying Modalities through Next-Frame Prediction

G Thomas Hudson, Dean Slack, Thomas Winterbottom et al.

ICCV 2025 • arXiv:2411.10503
2
citations
#1252

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025 • arXiv:2503.06698
2
citations
#1253

LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion

Yisu Zhang, Chenjie Cao, Chaohui Yu et al.

ICCV 2025 • arXiv:2507.05678
2
citations
#1254

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling

Trong-Thang Pham, Akash Awasthi, Saba Khan et al.

ICCV 2025 • highlight • arXiv:2507.12591
2
citations
#1255

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen, Nhat Le, Baoru Huang et al.

ICCV 2025 • arXiv:2508.10522
2
citations
#1256

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

Ji Du, Xin Wang, Fangwei Hao et al.

ICCV 2025 • arXiv:2510.18437
2
citations
#1257

Advancing Visual Large Language Model for Multi-granular Versatile Perception

Wentao Xiang, Haoxian Tan, Cong Wei et al.

ICCV 2025 • arXiv:2507.16213
2
citations
#1258

Kaputt: A Large-Scale Dataset for Visual Defect Detection

Sebastian Höfer, Dorian Henning, Artemij Amiranashvili et al.

ICCV 2025 • arXiv:2510.05903
2
citations
#1259

Free-running vs Synchronous: Single-Photon Lidar for High-flux 3D Imaging

Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp et al.

ICCV 2025 • arXiv:2507.09386
2
citations
#1260

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

Han Yu, Kehan Li, Dongbai Li et al.

ICCV 2025 • arXiv:2510.27263
2
citations
#1261

DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering

Rongjia Zheng, Qing Zhang, Chengjiang Long et al.

ICCV 2025 • arXiv:2507.03924
2
citations
#1262

Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition

Zhengyuan Peng, Jianqing Xu, Yuge Huang et al.

ICCV 2025
2
citations
#1263

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.

ICCV 2025 • highlight • arXiv:2506.22753
2
citations
#1264

Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation

Shuchang Ye, Usman Naseem, Mingyuan Meng et al.

ICCV 2025 • arXiv:2507.11055
2
citations
#1265

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025 • arXiv:2501.06031
2
citations
#1266

Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion

Yidi Liu, Dong Li, Yuxin Ma et al.

ICCV 2025 • arXiv:2503.12764
2
citations
#1267

LookOut: Real-World Humanoid Egocentric Navigation

Boxiao Pan, Adam Harley, Francis Engelmann et al.

ICCV 2025 • arXiv:2508.14466
2
citations
#1268

PseudoMapTrainer: Learning Online Mapping without HD Maps

Christian Löwens, Thorben Funke, Jingchao Xie et al.

ICCV 2025 • arXiv:2508.18788
2
citations
#1269

DIP: Unsupervised Dense In-Context Post-training of Visual Representations

Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky et al.

ICCV 2025 • arXiv:2506.18463
2
citations
#1270

MAVias: Mitigate any Visual Bias

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos et al.

ICCV 2025 • arXiv:2412.06632
2
citations
#1271

Evading Data Provenance in Deep Neural Networks

Hongyu Zhu, Sichu Liang, Wenwen Wang et al.

ICCV 2025 • highlight • arXiv:2508.01074
2
citations
#1272

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning

Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion et al.

ICCV 2025 • arXiv:2506.08694
2
citations
#1273

MMOne: Representing Multiple Modalities in One Scene

Zhifeng Gu, Bing Wang

ICCV 2025 • arXiv:2507.11129
2
citations
#1274

DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup

Zhen Qu, Xian Tao, Xinyi Gong et al.

ICCV 2025 • arXiv:2508.13560
2
citations
#1275

SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting

Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.

ICCV 2025 • arXiv:2506.03594
2
citations
#1276

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025 • arXiv:2503.13652
2
citations
#1277

Fast Globally Optimal and Geometrically Consistent 3D Shape Matching

Paul Roetzer, Florian Bernard

ICCV 2025 • highlight • arXiv:2504.06385
2
citations
#1278

DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data

Junjie Wu, Jiangtao Xie, Zhaolin Zhang et al.

ICCV 2025 • arXiv:2504.01386
2
citations
#1279

MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective

Weitian Wang, Shubham Rai, Cecilia De la Parra et al.

ICCV 2025 • arXiv:2507.19131
2
citations
#1280

Consensus-Driven Active Model Selection

Justin Kay, Grant Horn, Subhransu Maji et al.

ICCV 2025 • highlight • arXiv:2507.23771
2
citations
#1281

SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation

Hao Ban, Gokul Ram Subramani, Kaiyi Ji

ICCV 2025 • arXiv:2507.07883
2
citations
#1282

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Zongheng Tang, Yi Liu, Yifan Sun et al.

ICCV 2025 • highlight • arXiv:2508.00359
2
citations
#1283

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Xiao Lin, Yun Peng, Liuyi Wang et al.

ICCV 2025 • arXiv:2502.01312
2
citations
#1284

OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization

Saihui Hou, Panjian Huang, Zengbin Wang et al.

ICCV 2025 • arXiv:2410.00204
2
citations
#1285

LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering

Xiaohang Zhan, Dingming Liu

ICCV 2025 • arXiv:2508.07647
2
citations
#1286

Generalized Few-Shot Point Cloud Segmentation via LLM-Assisted Hyper-Relation Matching

Zhaoyang Li, Yuan Wang, Guoxin Xiong et al.

ICCV 2025
2
citations
#1287

SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting

Haiyang Ying, Matthias Zwicker

ICCV 2025 • arXiv:2503.14786
2
citations
#1288

SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion

Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.

ICCV 2025 • arXiv:2506.23606
2
citations
#1289

PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection

Mahdiyar Molahasani, Azadeh Motamedi, Michael Greenspan et al.

ICCV 2025 • arXiv:2507.08979
2
citations
#1290

Sim-DETR: Unlock DETR for Temporal Sentence Grounding

Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.

ICCV 2025 • arXiv:2509.23867
2
citations
#1291

ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis

Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.

ICCV 2025 • arXiv:2505.04963
2
citations
#1292

Selective Contrastive Learning for Weakly Supervised Affordance Grounding

WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

ICCV 2025 • arXiv:2508.07877
2
citations
#1293

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting

Lei Tian, Xiaomin Li, Liqian Ma et al.

ICCV 2025 • arXiv:2505.20469
2
citations
#1294

Generative Adversarial Diffusion

U-Chae Jun, Jaeeun Ko, Jiwoo Kang

ICCV 2025
2
citations
#1295

Monocular Facial Appearance Capture in the Wild

Yingyan Xu, Kate Gadola, Prashanth Chandran et al.

ICCV 2025 • arXiv:2412.12765
2
citations
#1296

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model

Shiming Chen, Bowen Duan, Salman Khan et al.

ICCV 2025
2
citations
#1297

AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction

Bin Rao, Haicheng Liao, Yanchen Guan et al.

ICCV 2025 • arXiv:2507.01801
2
citations
#1298

Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention

Jiawei Gu, Ziyue Qiao, Zechao Li

ICCV 2025 • arXiv:2507.01417
2
citations
#1299

Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance

Jiaqi Jin, Siwei Wang, Zhibin Dong et al.

ICCV 2025 • arXiv:2503.11017
2
citations
#1300

ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning

Zhengzhuo Xu, Sinan Du, Yiyan Qi et al.

ICCV 2025 • arXiv:2512.00305
2
citations
#1301

Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval

Ziwei Wang, Sameera Ramasinghe, Chenchen Xu et al.

ICCV 2025 • arXiv:2411.17490
2
citations
#1302

Improving Noise Efficiency in Privacy-preserving Dataset Distillation

Runkai Zheng, Vishnu Dasu, Yinong Wang et al.

ICCV 2025 • arXiv:2508.01749
2
citations
#1303

D2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

Wenjie Pei, Qizhong Tan, Guangming Lu et al.

ICCV 2025
2
citations
#1304

DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing

Yang JingYi, Xun Lin, Zitong Yu et al.

ICCV 2025 • arXiv:2503.00429
2
citations
#1305

Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization

Bingqing Zhang, Zhuo Cao, Heming Du et al.

ICCV 2025 • arXiv:2507.15504
2
citations
#1306

Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment

Kejia Zhang, Juanjuan Weng, Zhiming Luo et al.

ICCV 2025 • arXiv:2408.06079
2
citations
#1307

Federated Domain Generalization with Domain-specific Soft Prompts Generation

Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.

ICCV 2025 • arXiv:2509.20807
2
citations
#1308

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.

ICCV 2025 • arXiv:2508.03742
2
citations
#1309

Timestep-Aware Diffusion Model for Extreme Image Rescaling

Ce Wang, Zhenyu Hu, Wanjie Sun et al.

ICCV 2025 • arXiv:2408.09151
2
citations
#1310

Adversarial Attention Perturbations for Large Object Detection Transformers

Zachary Yahn, Selim Tekin, Fatih Ilhan et al.

ICCV 2025 • arXiv:2508.02987
2
citations
#1311

Global Regulation and Excitation via Attention Tuning for Stereo Matching

Jiahao Li, Xinhong Chen, Zhengmin Jiang et al.

ICCV 2025 • arXiv:2509.15891
2
citations
#1312

Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis

Byung Hyun Lee, Wongi Jeong, Woojae Han et al.

ICCV 2025 • arXiv:2507.02395
2
citations
#1313

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics

Bowei Guo, Shengkun Tang, Cong Zeng et al.

ICCV 2025 • arXiv:2510.11962
2
citations
#1314

Cross-Architecture Distillation Made Simple with Redundancy Suppression

Weijia Zhang, Yuehao Liu, Wu Ran et al.

ICCV 2025 • highlight • arXiv:2507.21844
2
citations
#1315

Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting

Hengyu Meng, Duotun Wang, Zhijing Shao et al.

ICCV 2025 • arXiv:2502.20045
2
citations
#1316

Diffusion Image Prior

Hamadi Chihaoui, Paolo Favaro

ICCV 2025 • arXiv:2503.21410
2
citations
#1317

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao et al.

ICCV 2025 • arXiv:2503.16856
2
citations
#1318

BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting

Zipei Ma, Junzhe Jiang, Yurui Chen et al.

ICCV 2025 • arXiv:2506.22099
2
citations
#1319

Backdoor Attacks on Neural Networks via One-Bit Flip

Xiang Li, Lannan Luo, Qiang Zeng

ICCV 2025
2
citations
#1320

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.

ICCV 2025 • arXiv:2503.22349
2
citations
#1321

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin, Ruohan Gao

ICCV 2025 • arXiv:2504.21847
2
citations
#1322

FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation

Wenzhuang Wang, Yifan Zhao, Mingcan Ma et al.

ICCV 2025 • arXiv:2509.01107
2
citations
#1323

Trade-offs in Image Generation: How Do Different Dimensions Interact?

Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.

ICCV 2025 • arXiv:2507.22100
2
citations
#1324

HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes

Mai Su, Zhongtao Wang, Huishan Au et al.

ICCV 2025 • arXiv:2504.16606
2
citations
#1325

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.

ICCV 2025 • arXiv:2502.14156
2
citations
#1326

LOTA: Bit-Planes Guided AI-Generated Image Detection

Renxi Cheng, Hongsong Wang, Yang Zhang et al.

ICCV 2025 • arXiv:2510.14230
2
citations
#1327

PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening

Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk et al.

ICCV 2025 • arXiv:2505.23367
2
citations
#1328

GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations

Yunqi Liu, Xiaohui Cui, Ouyang Xue

ICCV 2025
2
citations
#1329

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces

Aniruddha Mahapatra, Long Mai, David Bourgin et al.

ICCV 2025 • arXiv:2501.05442
2
citations
#1330

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models

Hyungjin Kim, Seokho Ahn, Young-Duk Seo

ICCV 2025 • arXiv:2508.03481
2
citations
#1331

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025 • arXiv:2506.16058
2
citations
#1332

HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding

Jiahe Zhao, RuiBing Hou, Zejie Tian et al.

ICCV 2025 • arXiv:2503.12955
2
citations
#1333

Holistic Tokenizer for Autoregressive Image Generation

Anlin Zheng, Haochen Wang, Yucheng Zhao et al.

ICCV 2025 • arXiv:2507.02358
2
citations
#1334

PLA: Prompt Learning Attack against Text-to-Image Generative Models

Xinqi Lyu, Yihao Liu, Yanjie Li et al.

ICCV 2025 • arXiv:2508.03696
2
citations
#1335

HouseTour: A Virtual Real Estate A(I)gent

Ata Çelen, Iro Armeni, Daniel Barath et al.

ICCV 2025 • arXiv:2510.18054
2
citations
#1336

Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

ICCV 2025 • arXiv:2503.07235
2
citations
#1337

ViLU: Learning Vision-Language Uncertainties for Failure Prediction

Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.

ICCV 2025 • arXiv:2507.07620
2
citations
#1338

Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion

Haoyang Chen, Dongfang Sun, Caoyuan Ma et al.

ICCV 2025 • arXiv:2506.23711
2
citations
#1339

Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining

Qi Fan, Kaiqi Liu, Nian Liu et al.

ICCV 2025 • arXiv:2504.21414
2
citations
#1340

FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models

Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.

ICCV 2025 • arXiv:2504.20860
2
citations
#1341

Demeter: A Parametric Model of Crop Plant Morphology from the Real World

Tianhang Cheng, Albert Zhai, Evan Chen et al.

ICCV 2025 • arXiv:2510.16377
2
citations
#1342

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Revant Teotia, Candace Ross, Karen Ullrich et al.

ICCV 2025 • arXiv:2506.05108
2
citations
#1343

MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction

Zijian Dong, Longteng Duan, Jie Song et al.

ICCV 2025 • highlight • arXiv:2507.23597
2
citations
#1344

Improving Rectified Flow with Boundary Conditions

Xixi Hu, Runlong Liao, Bo Liu et al.

ICCV 2025 • arXiv:2506.15864
2
citations
#1345

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Tao Han, Wanghan Xu, Junchao Gong et al.

ICCV 2025 • arXiv:2509.10441
2
citations
#1346

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.

ICCV 2025 • arXiv:2505.20405
2
citations
#1347

Stable Score Distillation

Haiming Zhu, Yangyang Xu, Chenshu Xu et al.

ICCV 2025 • arXiv:2507.09168
2
citations
#1348

Aligning Constraint Generation with Design Intent in Parametric CAD

Evan Casey, Tianyu Zhang, Shu Ishida et al.

ICCV 2025 • arXiv:2504.13178
2
citations
#1349

Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts

Viet Nguyen, Anh Nguyen, Trung Dao et al.

ICCV 2025 • arXiv:2412.02687
2
citations
#1350

MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation

Yanchen Liu, Yanan Sun, Zhening Xing et al.

ICCV 2025 • arXiv:2507.16310
2
citations
#1351

Denoising Token Prediction in Masked Autoregressive Models

Ting Yao, Yehao Li, Yingwei Pan et al.

ICCV 2025
2
citations
#1352

Balanced Sharpness-Aware Minimization for Imbalanced Regression

Yahao Liu, Qin Wang, Lixin Duan et al.

ICCV 2025 • arXiv:2508.16973
2
citations
#1353

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Xiangdong Zhang, Shaofeng Zhang, Junchi Yan

ICCV 2025 • arXiv:2509.01250
2
citations
#1354

DOGR: Towards Versatile Visual Document Grounding and Referring

Yinan Zhou, Yuxin Chen, Haokun Lin et al.

ICCV 2025 • arXiv:2411.17125
2
citations
#1355

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.

ICCV 2025 • arXiv:2502.06593
2
citations
#1356

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur

ICCV 2025 • arXiv:2504.03948
2
citations
#1357

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

Fei Peng, Junqiang Wu, Yan Li et al.

ICCV 2025 • arXiv:2508.14440
2
citations
#1358

StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance

Jaeseok Jeong, Junho Kim, Youngjung Uh et al.

ICCV 2025 • arXiv:2510.06827
2
citations
#1359

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

Zeyi Sun, Tong Wu, Pan Zhang et al.

ICCV 2025 • arXiv:2406.00093
2
citations
#1360

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions

Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.

ICCV 2025 • arXiv:2506.03448
2
citations
#1361

SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning

Xin Hu, Ke Qin, Guiduo Duan et al.

ICCV 2025 • arXiv:2507.05798
2
citations
#1362

CAFA: a Controllable Automatic Foley Artist

Roi Benita, Michael Finkelson, Tavi Halperin et al.

ICCV 2025 • arXiv:2504.06778
2
citations
#1363

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion

Enyu Liu, En Yu, Sijia Chen et al.

ICCV 2025 • arXiv:2507.08555
2
citations
#1364

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression

Wenjie Huang, Qi Yang, Shuting Xia et al.

ICCV 2025 • arXiv:2507.15686
2
citations
#1365

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar et al.

ICCV 2025 • arXiv:2508.14039
2
citations
#1366

Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation

Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber et al.

ICCV 2025 • highlight • arXiv:2504.04841
2
citations
#1367

Trust but Verify: Programmatic VLM Evaluation in the Wild

Viraj Prabhu, Senthil Purushwalkam, An Yan et al.

ICCV 2025 • arXiv:2410.13121
2
citations
#1368

SDMatte: Grafting Diffusion Models for Interactive Matting

Longfei Huang, Yu Liang, Hao Zhang et al.

ICCV 2025 • arXiv:2508.00443
2
citations
#1369

Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

In Cho, Youngbeom Yoo, Subin Jeon et al.

ICCV 2025 • arXiv:2503.08737
2
citations
#1370

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025 • arXiv:2508.00169
2
citations
#1371

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding

Zijun Lin, Shuting He, Cheston Tan et al.

ICCV 2025 • arXiv:2506.21188
2
citations
#1372

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025 • arXiv:2503.22668
2
citations
#1373

AnyPortal: Zero-Shot Consistent Video Background Replacement

Wenshuo Gao, Xicheng Lan, Shuai Yang

ICCV 2025 • arXiv:2509.07472
2
citations
#1374

MatchDiffusion: Training-free Generation of Match-Cuts

Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.

ICCV 2025 • arXiv:2411.18677
2
citations
#1375

IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features

Anand Kumar, Jiteng Mu, Nuno Vasconcelos

ICCV 2025 • arXiv:2412.14432
2
citations
#1376

FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift

Yong Zhang, Feng Liang, Guanghu Yuan et al.

ICCV 2025 • arXiv:2507.04781
2
citations
#1377

SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation

Shiqi Huang, Shuting He, Huaiyuan Qin et al.

ICCV 2025 • highlight • arXiv:2507.12857
2
citations
#1378

PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior

Seunggwan Lee, Hwanhee Jung, ByoungSoo Koh et al.

ICCV 2025 • arXiv:2503.12834
2
citations
#1379

LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables

Xunpeng Yi, Yibing Zhang, Xinyu Xiang et al.

ICCV 2025 • arXiv:2509.00346
2
citations
#1380

An Inversion-based Measure of Memorization for Diffusion Models

Zhe Ma, Qingming Li, Xuhong Zhang et al.

ICCV 2025 • arXiv:2405.05846
2
citations
#1381

Learning to See in the Extremely Dark

Hai Jiang, Binhao Guan, Zhen Liu et al.

ICCV 2025 • arXiv:2506.21132
1
citation
#1382

Principles of Visual Tokens for Efficient Video Understanding

Xinyue Hao, Li, Shreyank Gowda et al.

ICCV 2025 • arXiv:2411.13626
1
citation
#1383

Aligning Moments in Time using Video Queries

Yogesh Kumar, Uday Agarwal, Manish Gupta et al.

ICCV 2025 • arXiv:2508.15439
1
citation
#1384

ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints

Debasmit Das, Hyoungwoo Park, Munawar Hayat et al.

ICCV 2025 • arXiv:2507.08044
1
citation
#1385

Correspondence-Free Fast and Robust Spherical Point Pattern Registration

Anik Sarker, Alan Asbeck

ICCV 2025 • arXiv:2508.02339
1
citation
#1386

Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning

Junjie Shan, Ziqi Zhao, Jialin Lu et al.

ICCV 2025 • arXiv:2411.14937
1
citation
#1387

TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

Yan Xia, Yunxiang Lu, Rui Song et al.

ICCV 2025 • arXiv:2412.10308
1
citation
#1388

ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction

Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa et al.

ICCV 2025 • arXiv:2509.03775
1
citation
#1389

Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment

Zhenbang Du, Yonggan Fu, Lifu Wang et al.

ICCV 2025 • arXiv:2508.06160
1
citation
#1390

Diff2I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior

Juncheng Mu, Chengwei Ren, Weixiang Zhang et al.

ICCV 2025
1
citation
#1391

Removing Cost Volumes from Optical Flow Estimators

Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer

ICCV 2025 • arXiv:2510.13317
1
citation
#1392

CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection

Zhixin Cheng, Jiacheng Deng, Xinjun Li et al.

ICCV 2025 • arXiv:2506.21364
1
citation
#1393

D-Attn: Decomposed Attention for Large Vision-and-Language Model

Chia-Wen Kuo, Sijie Zhu, Fan Chen et al.

ICCV 2025 • arXiv:2502.01906
1
citation
#1394

ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling

Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng et al.

ICCV 2025 • arXiv:2510.10793
1
citation
#1395

VRM: Knowledge Distillation via Virtual Relation Matching

Weijia Zhang, Fei Xie, Weidong Cai et al.

ICCV 2025 • highlight • arXiv:2502.20760
1
citation
#1396

Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images

Jinsol Song, Jiamu Wang, Anh Nguyen et al.

ICCV 2025 • arXiv:2508.15256
1
citation
#1397

Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning

Zhengxuan Wei, Jiajin Tang, Sibei Yang

ICCV 2025 • arXiv:2510.19622
1
citation
#1398

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Zedong Wang, Siyuan Li, Dan Xu

ICCV 2025 • highlight • arXiv:2507.21049
1
citation
#1399

Activation Subspaces for Out-of-Distribution Detection

Barış Zöngür, Robin Hesse, Stefan Roth

ICCV 2025 • arXiv:2508.21695
1
citation
#1400

IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark

Zhe Cao, Jin Zhang, Ruiheng Zhang

ICCV 2025 • arXiv:2507.14449
1
citation