Most Cited ICCV "autoregressive multimodal model" Papers

2,701 papers found • Page 7 of 14

Filters:Most Cited ICCV autoregressive multimodal model Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#1201

UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale

Yuhao Wang, Wei Xi

ICCV 2025arXiv:2508.09000

citations

#1202

Supercharging Floorplan Localization with Semantic Rays

Yuval Grader, Hadar Averbuch-Elor

ICCV 2025arXiv:2507.09291

citations

#1203

Consistency Trajectory Matching for One-Step Generative Super-Resolution

Weiyi You, Mingyang Zhang, Leheng Zhang et al.

ICCV 2025arXiv:2503.20349

citations

#1204

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?

Shouwei Ruan, Hanqing Liu, Yao Huang et al.

ICCV 2025highlightarXiv:2412.03002

citations

#1205

CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization

Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.

ICCV 2025arXiv:2411.17845

citations

#1206

GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects

Yidi Shao, Mu Huang, Chen Change Loy et al.

ICCV 2025arXiv:2412.17804

citations

#1207

Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge

Xiaogang Xu, Jiafei Wu, Qingsen Yan et al.

ICCV 2025

citations

#1208

MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions

Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.

ICCV 2025arXiv:2503.05182

citations

#1209

CVPT: Cross Visual Prompt Tuning

Lingyun Huang, Jianxu Mao, Junfei YI et al.

ICCV 2025arXiv:2408.14961

citations

#1210

Addressing Text Embedding Leakage in Diffusion-based Image Editing

Sunung Mun, Jinhwan Nam, Sunghyun Cho et al.

ICCV 2025arXiv:2412.04715

citations

#1211

Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing

Chengxu Liu, Lu Qi, Jinshan Pan et al.

ICCV 2025arXiv:2507.01275

citations

#1212

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models

Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.

ICCV 2025arXiv:2412.08619

citations

#1213

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Tianli Liao, Chenyang Zhao, Lei Li et al.

ICCV 2025arXiv:2311.18564

citations

#1214

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

Xiao Liang, Di Wang, Zhicheng Jiao et al.

ICCV 2025arXiv:2507.09209

citations

#1215

Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions

Yuanhong Zheng, Ruixuan Yu, Jian Sun

ICCV 2025arXiv:2507.09446

citations

#1216

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.

ICCV 2025arXiv:2504.20830

citations

#1217

Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation

Bozhong Zheng, Jinye Gan, Xiaohao Xu et al.

ICCV 2025arXiv:2505.24431

citations

#1218

Simultaneous Motion And Noise Estimation with Event Cameras

Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego

ICCV 2025arXiv:2504.04029

citations

#1219

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

Junho Kim, Hyungjin Chung, Byung-Hoon Kim

ICCV 2025arXiv:2411.06869

citations

#1220

Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation

Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.

ICCV 2025arXiv:2503.11633

citations

#1221

Hybrid-grained Feature Aggregation with Coare-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Wenyao Zhang, Hongsi Liu, Bohan Li et al.

ICCV 2025

citations

#1222

Guiding Diffusion-Based Articulated Object Generation by Partial Point Cloud Alignment and Physical Plausibility Constraints

Jens U. Kreber, Joerg Stueckler

ICCV 2025highlightarXiv:2508.00558

citations

#1223

Multi-Modal Few-Shot Temporal Action Segmentation

Zijia Lu, Ehsan Elhamifar

ICCV 2025

citations

#1224

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

Liang Xu, Chengqun Yang, Zili Lin et al.

ICCV 2025arXiv:2508.04681

citations

#1225

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

En Ci, Shanyan Guan, Yanhao Ge et al.

ICCV 2025

citations

#1226

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection

Yehao Lu, Minghe Weng, Zekang Xiao et al.

ICCV 2025arXiv:2507.17436

citations

#1227

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

Jijun Xiang, Xuan Zhu, Xianqi Wang et al.

ICCV 2025arXiv:2504.01596

citations

#1228

Online Generic Event Boundary Detection

Hyung Rok Jung, Daneul Kim, Seunggyun Lim et al.

ICCV 2025arXiv:2510.06855

citations

#1229

ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung, Hyoungseob Park, Patrick Rim et al.

ICCV 2025arXiv:2508.05989

citations

#1230

SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting

Zihui Gao, Jia-Wang Bian, Guosheng Lin et al.

ICCV 2025arXiv:2507.15602

citations

#1231

HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image

Junyi Guo, Jingxuan Zhang, Fangyu Wu et al.

ICCV 2025arXiv:2505.23186

citations

#1232

Enhancing Transformers Through Conditioned Embedded Tokens

Hemanth Saratchandran, Simon Lucey

ICCV 2025arXiv:2505.12789

citations

#1233

LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders

Ilan Naiman, Emanuel Baruch Baruch, Oron Anschel et al.

ICCV 2025arXiv:2504.03501

citations

#1234

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

Jiannan Ge, Lingxi Xie, Hongtao Xie et al.

ICCV 2025

citations

#1235

Generative Modeling of Shape-Dependent Self-Contact Human Poses

Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.

ICCV 2025arXiv:2509.23393

citations

#1236

PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai et al.

ICCV 2025arXiv:2506.23440

citations

#1237

Towards Open-World Generation of Stereo Images and Unsupervised Matching

Feng Qiao, Zhexiao Xiong, Eric Xing et al.

ICCV 2025arXiv:2503.12720

citations

#1238

CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images

Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.

ICCV 2025arXiv:2503.05332

citations

#1239

Joint Asymmetric Loss for Learning with Noisy Labels

Jialiang Wang, Xianming Liu, Xiong Zhou et al.

ICCV 2025arXiv:2507.17692

citations

#1240

JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models

Xiaolong Jin, Zixuan Weng, Hanxi Guo et al.

ICCV 2025

citations

#1241

SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers

Bhavna Gopal, Huanrui Yang, Mark Horton et al.

ICCV 2025arXiv:2501.01529

citations

#1242

SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark

Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.

ICCV 2025arXiv:2506.21549

citations

#1243

SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

Jessica Bader, Leander Girrbach, Stephan Alaniz et al.

ICCV 2025arXiv:2507.23784

citations

#1244

Refer to Any Segmentation Mask Group With Vision-Language Prompts

Shengcao Cao, Zijun Wei, Jason Kuen et al.

ICCV 2025arXiv:2506.05342

citations

#1245

Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising

Xiangbin Wei, Yuanfeng Wang, Ao XU et al.

ICCV 2025arXiv:2503.09283

citations

#1246

Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process

Yuanze Li, Shihao Yuan, Haolin Wang et al.

ICCV 2025

citations

#1247

Multimodal Prompt Alignment for Facial Expression Recognition

Fuyan Ma, Yiran He, Bin Sun et al.

ICCV 2025arXiv:2506.21017

citations

#1248

RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction

Johannes Künzel, Anna Hilsmann, Peter Eisert

ICCV 2025arXiv:2507.04839

citations

#1249

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes

Zesong Yang, Bangbang Yang, Wenqi Dong et al.

ICCV 2025arXiv:2507.08416

citations

#1250

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.

ICCV 2025arXiv:2510.14672

citations

#1251

Everything is a Video: Unifying Modalities through Next-Frame Prediction

G Thomas Hudson, Dean Slack, Thomas Winterbottom et al.

ICCV 2025arXiv:2411.10503

citations

#1252

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025arXiv:2503.06698

citations

#1253

LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion

Yisu Zhang, Chenjie Cao, Chaohui Yu et al.

ICCV 2025arXiv:2507.05678

citations

#1254

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling

Trong-Thang Pham, AKASH AWASTHI, Saba Khan et al.

ICCV 2025highlightarXiv:2507.12591

citations

#1255

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen, Nhat Le, Baoru Huang et al.

ICCV 2025arXiv:2508.10522

citations

#1256

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

Ji Du, Xin WANG, Fangwei Hao et al.

ICCV 2025arXiv:2510.18437

citations

#1257

Advancing Visual Large Language Model for Multi-granular Versatile Perception

Wentao Xiang, Haoxian Tan, Cong Wei et al.

ICCV 2025arXiv:2507.16213

citations

#1258

Kaputt: A Large-Scale Dataset for Visual Defect Detection

Sebastian Höfer, Dorian Henning, Artemij Amiranashvili et al.

ICCV 2025arXiv:2510.05903

citations

#1259

Free-running vs Synchronous: Single-Photon Lidar for High-flux 3D Imaging

Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp et al.

ICCV 2025arXiv:2507.09386

citations

#1260

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

Han Yu, Kehan Li, Dongbai Li et al.

ICCV 2025arXiv:2510.27263

citations

#1261

DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering

Rongjia Zheng, Qing Zhang, Chengjiang Long et al.

ICCV 2025arXiv:2507.03924

citations

#1262

Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition

Zhengyuan Peng, Jianqing Xu, Yuge Huang et al.

ICCV 2025

citations

#1263

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.

ICCV 2025highlightarXiv:2506.22753

citations

#1264

Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation

Shuchang Ye, Usman Naseem, Mingyuan Meng et al.

ICCV 2025arXiv:2507.11055

citations

#1265

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025arXiv:2501.06031

citations

#1266

Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion

Yidi Liu, Dong Li, Yuxin Ma et al.

ICCV 2025arXiv:2503.12764

citations

#1267

LookOut: Real-World Humanoid Egocentric Navigation

Boxiao Pan, Adam Harley, Francis Engelmann et al.

ICCV 2025arXiv:2508.14466

citations

#1268

PseudoMapTrainer: Learning Online Mapping without HD Maps

Christian Löwens, Thorben Funke, Jingchao Xie et al.

ICCV 2025arXiv:2508.18788

citations

#1269

DIP: Unsupervised Dense In-Context Post-training of Visual Representations

Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky et al.

ICCV 2025arXiv:2506.18463

citations

#1270

MAVias: Mitigate any Visual Bias

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos et al.

ICCV 2025arXiv:2412.06632

citations

#1271

Evading Data Provenance in Deep Neural Networks

Hongyu Zhu, Sichu Liang, Wenwen Wang et al.

ICCV 2025highlightarXiv:2508.01074

citations

#1272

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning

Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion et al.

ICCV 2025arXiv:2506.08694

citations

#1273

MMOne: Representing Multiple Modalities in One Scene

Zhifeng Gu, Bing WANG

ICCV 2025arXiv:2507.11129

citations

#1274

DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup

Zhen Qu, Xian Tao, Xinyi Gong et al.

ICCV 2025arXiv:2508.13560

citations

#1275

SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting

Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.

ICCV 2025arXiv:2506.03594

citations

#1276

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025arXiv:2503.13652

citations

#1277

Fast Globally Optimal and Geometrically Consistent 3D Shape Matching

Paul Roetzer, Florian Bernard

ICCV 2025highlightarXiv:2504.06385

citations

#1278

DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data

Junjie Wu, Jiangtao Xie, Zhaolin Zhang et al.

ICCV 2025arXiv:2504.01386

citations

#1279

MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective

Weitian Wang, Shubham rai, Cecilia De la Parra et al.

ICCV 2025arXiv:2507.19131

citations

#1280

Consensus-Driven Active Model Selection

Justin Kay, Grant Horn, Subhransu Maji et al.

ICCV 2025highlightarXiv:2507.23771

citations

#1281

SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation

Hao Ban, Gokul Ram Subramani, Kaiyi Ji

ICCV 2025arXiv:2507.07883

citations

#1282

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Zongheng Tang, Yi Liu, Yifan Sun et al.

ICCV 2025highlightarXiv:2508.00359

citations

#1283

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Xiao Lin, Yun Peng, Liuyi Wang et al.

ICCV 2025arXiv:2502.01312

citations

#1284

OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization

Saihui Hou, Panjian Huang, Zengbin Wang et al.

ICCV 2025arXiv:2410.00204

citations

#1285

LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering

Xiaohang Zhan, Dingming Liu

ICCV 2025arXiv:2508.07647

citations

#1286

Generalized Few-Shot Point Cloud Segmentation via LLM-Assisted Hyper-Relation Matching

Zhaoyang Li, Yuan Wang, Guoxin Xiong et al.

ICCV 2025

citations

#1287

SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting

Haiyang Ying, Matthias Zwicker

ICCV 2025arXiv:2503.14786

citations

#1288

SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion

Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.

ICCV 2025arXiv:2506.23606

citations

#1289

PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection

Mahdiyar Molahasani, Azadeh Motamedi, Michael Greenspan et al.

ICCV 2025arXiv:2507.08979

citations

#1290

Sim-DETR: Unlock DETR for Temporal Sentence Grounding

Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.

ICCV 2025arXiv:2509.23867

citations

#1291

ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis

Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.

ICCV 2025arXiv:2505.04963

citations

#1292

Selective Contrastive Learning for Weakly Supervised Affordance Grounding

WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

ICCV 2025arXiv:2508.07877

citations

#1293

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting

Lei Tian, Xiaomin Li, Liqian Ma et al.

ICCV 2025arXiv:2505.20469

citations

#1294

Generative Adversarial Diffusion

U-Chae Jun, Jaeeun Ko, Jiwoo Kang

ICCV 2025

citations

#1295

Monocular Facial Appearance Capture in the Wild

Yingyan Xu, Kate Gadola, Prashanth Chandran et al.

ICCV 2025arXiv:2412.12765

citations

#1296

Intrepretable Zero-Shot Learning with Locally-Aligned Vision-Language Model

Shiming Chen, Bowen Duan, Salman Khan et al.

ICCV 2025

citations

#1297

AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction

Bin Rao, Haicheng Liao, Yanchen Guan et al.

ICCV 2025arXiv:2507.01801

citations

#1298

Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention

Jiawei Gu, Ziyue Qiao, Zechao Li

ICCV 2025arXiv:2507.01417

citations

#1299

Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance

Jiaqi Jin, Siwei Wang, Zhibin Dong et al.

ICCV 2025arXiv:2503.11017

citations

#1300

ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning

Zhengzhuo Xu, Sinan Du, Yiyan Qi et al.

ICCV 2025arXiv:2512.00305

citations

#1301

Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval

Ziwei Wang, Sameera Ramasinghe, Chenchen Xu et al.

ICCV 2025arXiv:2411.17490

citations

#1302

Improving Noise Efficiency in Privacy-preserving Dataset Distillation

Runkai Zheng, Vishnu Dasu, Yinong Wang et al.

ICCV 2025arXiv:2508.01749

citations

#1303

D2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

Wenjie Pei, Qizhong Tan, Guangming Lu et al.

ICCV 2025

citations

#1304

DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing

Yang JingYi, Xun Lin, Zitong YU et al.

ICCV 2025arXiv:2503.00429

citations

#1305

Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization

Bingqing Zhang, Zhuo Cao, Heming Du et al.

ICCV 2025arXiv:2507.15504

citations

#1306

Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment

Kejia Zhang, Juanjuan Weng, Zhiming Luo et al.

ICCV 2025arXiv:2408.06079

citations

#1307

Federated Domain Generalization with Domain-specific Soft Prompts Generation

Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.

ICCV 2025arXiv:2509.20807

citations

#1308

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.

ICCV 2025arXiv:2508.03742

citations

#1309

Timestep-Aware Diffusion Model for Extreme Image Rescaling

Ce Wang, Zhenyu Hu, Wanjie Sun et al.

ICCV 2025arXiv:2408.09151

citations

#1310

Adversarial Attention Perturbations for Large Object Detection Transformers

Zachary Yahn, Selim Tekin, Fatih Ilhan et al.

ICCV 2025arXiv:2508.02987

citations

#1311

Global Regulation and Excitation via Attention Tuning for Stereo Matching

Jiahao LI, Xinhong Chen, Zhengmin JIANG et al.

ICCV 2025arXiv:2509.15891

citations

#1312

Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis

Byung Hyun Lee, Wongi Jeong, Woojae Han et al.

ICCV 2025arXiv:2507.02395

citations

#1313

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics

Bowei Guo, Shengkun Tang, Cong Zeng et al.

ICCV 2025arXiv:2510.11962

citations

#1314

Cross-Architecture Distillation Made Simple with Redundancy Suppression

Weijia Zhang, Yuehao Liu, Wu Ran et al.

ICCV 2025highlightarXiv:2507.21844

citations

#1315

Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting

Hengyu Meng, Duotun Wang, Zhijing Shao et al.

ICCV 2025arXiv:2502.20045

citations

#1316

Diffusion Image Prior

Hamadi Chihaoui, Paolo Favaro

ICCV 2025arXiv:2503.21410

citations

#1317

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao et al.

ICCV 2025arXiv:2503.16856

citations

#1318

BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting

Zipei Ma, Junzhe Jiang, Yurui Chen et al.

ICCV 2025arXiv:2506.22099

citations

#1319

Backdoor Attacks on Neural Networks via One-Bit Flip

Xiang Li, Lannan Luo, Qiang Zeng

ICCV 2025

citations

#1320

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.

ICCV 2025arXiv:2503.22349

citations

#1321

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin, Ruohan Gao

ICCV 2025arXiv:2504.21847

citations

#1322

FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation

Wenzhuang Wang, Yifan Zhao, Mingcan Ma et al.

ICCV 2025arXiv:2509.01107

citations

#1323

Trade-offs in Image Generation: How Do Different Dimensions Interact?

Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.

ICCV 2025arXiv:2507.22100

citations

#1324

HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes

Mai Su, Zhongtao Wang, Huishan Au et al.

ICCV 2025arXiv:2504.16606

citations

#1325

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.

ICCV 2025arXiv:2502.14156

citations

#1326

LOTA: Bit-Planes Guided AI-Generated Image Detection

Renxi Cheng, Hongsong Wang, Yang Zhang et al.

ICCV 2025arXiv:2510.14230

citations

#1327

PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening

Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk et al.

ICCV 2025arXiv:2505.23367

citations

#1328

GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations

Yunqi Liu, Xiaohui Cui, Ouyang Xue

ICCV 2025

citations

#1329

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces

Aniruddha Mahapatra, Long Mai, David Bourgin et al.

ICCV 2025arXiv:2501.05442

citations

#1330

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models

Hyungjin Kim, Seokho Ahn, Young-Duk Seo

ICCV 2025arXiv:2508.03481

citations

#1331

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025arXiv:2506.16058

citations

#1332

HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding

JIAHE ZHAO, RuiBing Hou, zejie tian et al.

ICCV 2025arXiv:2503.12955

citations

#1333

Holistic Tokenizer for Autoregressive Image Generation

Anlin Zheng, Haochen Wang, Yucheng Zhao et al.

ICCV 2025arXiv:2507.02358

citations

#1334

PLA: Prompt Learning Attack against Text-to-Image Generative Models

XINQI LYU, Yihao LIU, Yanjie Li et al.

ICCV 2025arXiv:2508.03696

citations

#1335

HouseTour: A Virtual Real Estate A(I)gent

Ata Çelen, Iro Armeni, Daniel Barath et al.

ICCV 2025arXiv:2510.18054

citations

#1336

Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

ICCV 2025arXiv:2503.07235

citations

#1337

ViLU: Learning Vision-Language Uncertainties for Failure Prediction

Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.

ICCV 2025arXiv:2507.07620

citations

#1338

Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion

Haoyang Chen, Dongfang Sun, Caoyuan Ma et al.

ICCV 2025arXiv:2506.23711

citations

#1339

Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining

Qi Fan, Kaiqi Liu, Nian Liu et al.

ICCV 2025arXiv:2504.21414

citations

#1340

FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models

Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.

ICCV 2025arXiv:2504.20860

citations

#1341

Demeter: A Parametric Model of Crop Plant Morphology from the Real World

Tianhang Cheng, Albert Zhai, Evan Chen et al.

ICCV 2025arXiv:2510.16377

citations

#1342

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Revant Teotia, Candace Ross, Karen Ullrich et al.

ICCV 2025arXiv:2506.05108

citations

#1343

MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction

Zijian Dong, Longteng Duan, Jie Song et al.

ICCV 2025highlightarXiv:2507.23597

citations

#1344

Improving Rectified Flow with Boundary Conditions

Xixi Hu, Runlong Liao, Bo Liu et al.

ICCV 2025arXiv:2506.15864

citations

#1345

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Tao Han, Wanghan Xu, Junchao Gong et al.

ICCV 2025arXiv:2509.10441

citations

#1346

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.

ICCV 2025arXiv:2505.20405

citations

#1347

Stable Score Distillation

Haiming Zhu, Yangyang Xu, Chenshu Xu et al.

ICCV 2025arXiv:2507.09168

citations

#1348

Aligning Constraint Generation with Design Intent in Parametric CAD

Evan Casey, Tianyu Zhang, Shu Ishida et al.

ICCV 2025arXiv:2504.13178

citations

#1349

Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts

Viet Nguyen, Anh Nguyen, Trung Dao et al.

ICCV 2025arXiv:2412.02687

citations

#1350

MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation

Yanchen Liu, Yanan SUN, Zhening Xing et al.

ICCV 2025arXiv:2507.16310

citations

#1351

Denoising Token Prediction in Masked Autoregressive Models

Ting Yao, Yehao Li, Yingwei Pan et al.

ICCV 2025

citations

#1352

Balanced Sharpness-Aware Minimization for Imbalanced Regression

Yahao Liu, Qin Wang, Lixin Duan et al.

ICCV 2025arXiv:2508.16973

citations

#1353

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Xiangdong Zhang, Shaofeng Zhang, Junchi Yan

ICCV 2025arXiv:2509.01250

citations

#1354

DOGR: Towards Versatile Visual Document Grounding and Referring

Yinan Zhou, Yuxin Chen, Haokun Lin et al.

ICCV 2025arXiv:2411.17125

citations

#1355

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.

ICCV 2025arXiv:2502.06593

citations

#1356

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur

ICCV 2025arXiv:2504.03948

citations

#1357

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

Fei Peng, Junqiang Wu, Yan Li et al.

ICCV 2025arXiv:2508.14440

citations

#1358

StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance

Jaeseok Jeong, Junho Kim, Youngjung Uh et al.

ICCV 2025arXiv:2510.06827

citations

#1359

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

Zeyi Sun, Tong Wu, Pan Zhang et al.

ICCV 2025arXiv:2406.00093

citations

#1360

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions

Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.

ICCV 2025arXiv:2506.03448

citations

#1361

SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning

XIN Hu, Ke Qin, Guiduo Duan et al.

ICCV 2025arXiv:2507.05798

citations

#1362

CAFA: a Controllable Automatic Foley Artist

Roi Benita, Michael Finkelson, Tavi Halperin et al.

ICCV 2025arXiv:2504.06778

citations

#1363

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion

Enyu Liu, En Yu, Sijia Chen et al.

ICCV 2025arXiv:2507.08555

citations

#1364

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression

Wenjie Huang, Qi Yang, Shuting Xia et al.

ICCV 2025arXiv:2507.15686

citations

#1365

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar et al.

ICCV 2025arXiv:2508.14039

citations

#1366

Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation

Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber et al.

ICCV 2025highlightarXiv:2504.04841

citations

#1367

Trust but Verify: Programmatic VLM Evaluation in the Wild

Viraj Prabhu, Senthil Purushwalkam, An Yan et al.

ICCV 2025arXiv:2410.13121

citations

#1368

SDMatte: Grafting Diffusion Models for Interactive Matting

Longfei Huang, Yu Liang, Hao Zhang et al.

ICCV 2025arXiv:2508.00443

citations

#1369

Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

In Cho, Youngbeom Yoo, Subin Jeon et al.

ICCV 2025arXiv:2503.08737

citations

#1370

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025arXiv:2508.00169

citations

#1371

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding

Zijun Lin, Shuting He, Cheston Tan et al.

ICCV 2025arXiv:2506.21188

citations

#1372

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025arXiv:2503.22668

citations

#1373

AnyPortal: Zero-Shot Consistent Video Background Replacement

Wenshuo Gao, Xicheng Lan, Shuai Yang

ICCV 2025arXiv:2509.07472

citations

#1374

MatchDiffusion: Training-free Generation of Match-Cuts

Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.

ICCV 2025arXiv:2411.18677

citations

#1375

IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features

Anand Kumar, Jiteng Mu, Nuno Vasconcelos

ICCV 2025arXiv:2412.14432

citations

#1376

FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift

yong zhang, Feng Liang, Guanghu Yuan et al.

ICCV 2025arXiv:2507.04781

citations

#1377

SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation

Shiqi Huang, Shuting He, Huaiyuan Qin et al.

ICCV 2025highlightarXiv:2507.12857

citations

#1378

PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior

Seunggwan Lee, Hwanhee Jung, ByoungSoo Koh et al.

ICCV 2025arXiv:2503.12834

citations

#1379

LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables

Xunpeng Yi, yibing zhang, Xinyu Xiang et al.

ICCV 2025arXiv:2509.00346

citations

#1380

An Inversion-based Measure of Memorization for Diffusion Models

Zhe Ma, Qingming Li, Xuhong Zhang et al.

ICCV 2025arXiv:2405.05846

citations

#1381

Learning to See in the Extremely Dark

Hai Jiang, Binhao Guan, Zhen Liu et al.

ICCV 2025arXiv:2506.21132

citations

#1382

Principles of Visual Tokens for Efficient Video Understanding

Xinyue Hao, Li, Shreyank Gowda et al.

ICCV 2025arXiv:2411.13626

citations

#1383

Aligning Moments in Time using Video Queries

Yogesh Kumar, Uday Agarwal, Manish Gupta et al.

ICCV 2025arXiv:2508.15439

citations

#1384

ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints

Debasmit Das, Hyoungwoo Park, Munawar Hayat et al.

ICCV 2025arXiv:2507.08044

citations

#1385

Correspondence-Free Fast and Robust Spherical Point Pattern Registration

Anik Sarker, Alan Asbeck

ICCV 2025arXiv:2508.02339

citations

#1386

Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning

Junjie Shan, Ziqi Zhao, Jialin Lu et al.

ICCV 2025arXiv:2411.14937

citations

#1387

TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

Yan Xia, Yunxiang Lu, Rui Song et al.

ICCV 2025arXiv:2412.10308

citations

#1388

ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction

Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa et al.

ICCV 2025arXiv:2509.03775

citations

#1389

Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment

Zhenbang Du, Yonggan Fu, Lifu Wang et al.

ICCV 2025arXiv:2508.06160

citations

#1390

Diff2I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior

Juncheng Mu, Chengwei REN, Weixiang Zhang et al.

ICCV 2025

citations

#1391

Removing Cost Volumes from Optical Flow Estimators

Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer

ICCV 2025arXiv:2510.13317

citations

#1392

CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection

Zhixin Cheng, Jiacheng Deng, Xinjun Li et al.

ICCV 2025arXiv:2506.21364

citations

#1393

D-Attn: Decomposed Attention for Large Vision-and-Language Model

Chia-Wen Kuo, Sijie Zhu, Fan Chen et al.

ICCV 2025arXiv:2502.01906

citations

#1394

ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling

Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng et al.

ICCV 2025arXiv:2510.10793

citations

#1395

VRM: Knowledge Distillation via Virtual Relation Matching

Weijia Zhang, Fei Xie, Weidong Cai et al.

ICCV 2025highlightarXiv:2502.20760

citations

#1396

Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images

Jinsol Song, Jiamu Wang, Anh Nguyen et al.

ICCV 2025arXiv:2508.15256

citations

#1397

Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning

Zhengxuan Wei, Jiajin Tang, Sibei Yang

ICCV 2025arXiv:2510.19622

citations

#1398

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Zedong Wang, Siyuan Li, Dan Xu

ICCV 2025highlightarXiv:2507.21049

citations

#1399

Activation Subspaces for Out-of-Distribution Detection

Barış Zöngür, Robin Hesse, Stefan Roth

ICCV 2025arXiv:2508.21695

citations

#1400

IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark

Zhe Cao, Jin Zhang, Ruiheng Zhang

ICCV 2025arXiv:2507.14449

citations

← Previous

1...5 6 7 8 9...14