Most Cited ICCV "canonicalization" Papers

2,701 papers found • Page 3 of 14

#401

Adversarial Robust Memory-Based Continual Learner

Xiaoyue Mi, Fan Tang, Zonghan Yang et al.

ICCV 2025 • poster • arXiv:2311.17608

#402

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition

Jinghan You, Shanglin Li, Yuanrui Sun et al.

ICCV 2025 • highlight • arXiv:2501.13420

#403

MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance

Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.

ICCV 2025 • poster • arXiv:2412.15058

#404

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

Boqian Li, Zeyu Cai, Michael Black et al.

ICCV 2025 • highlight • arXiv:2503.10624

#405

Multi-turn Consistent Image Editing

Zijun Zhou, Yingying Deng, Xiangyu He et al.

ICCV 2025 • poster • arXiv:2505.04320

#406

OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations

Peng-Hao Hsu, Ke Zhang, Fu-En Wang et al.

ICCV 2025 • poster • arXiv:2508.20063

#407

Latent Diffusion Models with Masked AutoEncoders

Junho Lee, Jeongwoo Shin, Hyungwook Choi et al.

ICCV 2025 • poster • arXiv:2507.09984

#408

FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation

Cui Miao, Tao Chang, Meihan Wu et al.

ICCV 2025 • poster • arXiv:2508.02190

#409

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Tuna Meral, Enis Simsar, Federico Tombari et al.

ICCV 2025 • highlight • arXiv:2403.19776

#410

DLF: Extreme Image Compression with Dual-generative Latent Fusion

Naifu Xue, Zhaoyang Jia, Jiahao Li et al.

ICCV 2025 • highlight • arXiv:2503.01428

#411

PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning

Yan Zhang, Yao Feng, Alpár Cseke et al.

ICCV 2025 • poster • arXiv:2503.17544

#412

DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model

Junjia Huang, Pengxiang Yan, Jinhang Cai et al.

ICCV 2025 • highlight

#413

Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation

Yujie Zhang, Bingyang Cui, Qi Yang et al.

ICCV 2025 • poster • arXiv:2412.11170

#414

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

Byeongjun Park, Hyojun Go, Hyelin Nam et al.

ICCV 2025 • poster • arXiv:2503.12024

#415

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Yukang Cao, Chenyang Si, Jinghao Wang et al.

ICCV 2025 • poster • arXiv:2507.01953

#416

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

Divyansh Srivastava, Xiang Zhang, He Wen et al.

ICCV 2025 • poster • arXiv:2505.04718

#417

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models

Hongji Yang, Wencheng Han, Yucheng Zhou et al.

ICCV 2025 • poster • arXiv:2502.14779

#418

DreamFuse: Adaptive Image Fusion with Diffusion Transformer

Junjia Huang, Pengxiang Yan, Jiyang Liu et al.

ICCV 2025 • poster • arXiv:2504.08291

#419

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Jian Ma, Qirong Peng, Xu Guo et al.

ICCV 2025 • poster • arXiv:2503.06134

#420

LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.

ICCV 2025 • poster • arXiv:2510.15868

#421

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Wenchuan Wang, Mengqi Huang, Yijing Tu et al.

ICCV 2025 • poster • arXiv:2505.02192

#422

Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration

Junyuan Deng, Wei Yin, Xiaoyang Guo et al.

ICCV 2025 • poster • arXiv:2411.17240

#423

Learning Interpretable Queries for Explainable Image Classification with Information Pursuit

Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan et al.

ICCV 2025 • poster • arXiv:2312.11548

#424

NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion

Zihao Xu, Yuzhi Tang, Bowen Xu et al.

ICCV 2025

#425

Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context

Ge Zheng, Jiaye Qian, Jiajin Tang et al.

ICCV 2025 • poster • arXiv:2510.20229

#426

Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?

Yuru Jia, Valerio Marsocci, Ziyang Gong et al.

ICCV 2025 • poster • arXiv:2503.07890

#427

CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.

ICCV 2025 • poster

#428

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.

ICCV 2025 • poster • arXiv:2506.07986

#429

Multi-identity Human Image Animation with Structural Video Diffusion

Zhenzhi Wang, Yixuan Li, Yanhong Zeng et al.

ICCV 2025 • poster • arXiv:2504.04126

#430

SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding

Tianci Wen, Zhiang Liu, Yongchun Fang

ICCV 2025 • poster • arXiv:2501.05242

#431

PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model

Jinhua Zhang, Hualian Sheng, Sijia Cai et al.

ICCV 2025 • poster • arXiv:2407.06109

#432

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

Zhentao Tan, Ben Xue, Jian Jia et al.

ICCV 2025 • poster • arXiv:2412.10443

#433

MOSCATO: Predicting Multiple Object State Change Through Actions

Parnian Zameni, Yuhan Shen, Ehsan Elhamifar

ICCV 2025 • poster

#434

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou, Yicong Hong, Zun Wang et al.

ICCV 2025 • poster • arXiv:2412.05552

#435

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Wenqi Zhang, Hang Zhang, Xin Li et al.

ICCV 2025 • highlight • arXiv:2501.00958

#436

BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.

ICCV 2025 • highlight • arXiv:2503.07940

#437

egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks

Björn Braun, Rayan Armani, Manuel Meier et al.

ICCV 2025 • poster • arXiv:2502.20879

#438

Rectifying Magnitude Neglect in Linear Attention

Qihang Fan, Huaibo Huang, Yuang Ai et al.

ICCV 2025 • highlight • arXiv:2507.00698

#439

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Zhuoyan Luo, Yinghao Wu, Tianheng Cheng et al.

ICCV 2025 • poster • arXiv:2405.15658

#440

Fine-grained Spatiotemporal Grounding on Egocentric Videos

Shuo Liang, Yiwu Zhong, Zi-Yuan Hu et al.

ICCV 2025 • poster • arXiv:2508.00518

#441

Low-Light Image Enhancement using Event-Based Illumination Estimation

Lei Sun, Yuhan Bao, Jiajun Zhai et al.

ICCV 2025 • poster • arXiv:2504.09379

#442

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Chenting Wang, Kunchang Li, Tianxiang Jiang et al.

ICCV 2025 • poster • arXiv:2503.14237

#443

Hierarchical Cross-modal Prompt Learning for Vision-Language Models

Hao Zheng, Shunzhi Yang, Zhuoxin He et al.

ICCV 2025 • poster • arXiv:2507.14976

#444

Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation

Ao Ma, Jiasong Feng, Ke Cao et al.

ICCV 2025 • poster • arXiv:2508.08949

#445

Φ-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

Xidan Zhang, Yihan Zhuang, Qian Guo et al.

ICCV 2025 • poster

#446

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Junhao Cheng, Yuying Ge, Yixiao Ge et al.

ICCV 2025 • poster • arXiv:2504.01014

#447

ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery

Yanzhe Lyu, Kai Cheng, Kang Xin et al.

ICCV 2025 • poster • arXiv:2412.07494

#448

LightSwitch: Multi-view Relighting with Material-guided Diffusion

Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani

ICCV 2025 • poster • arXiv:2508.06494

#449

SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies

Liang Han, Xu Zhang, Haichuan Song et al.

ICCV 2025 • poster • arXiv:2508.00366

#450

Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives

Ziyu Zhang, Binbin Huang, Hanqing Jiang et al.

ICCV 2025 • poster • arXiv:2411.16392

#451

SP2T: Sparse Proxy Attention for Dual-stream Point Transformer

Jiaxu Wan, Hong Zhang, Ziqi He et al.

ICCV 2025 • poster

#452

QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization

Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner et al.

ICCV 2025 • poster • arXiv:2505.05591

#453

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes

Han-Hung Lee, Qinghong Han, Angel Chang

ICCV 2025 • poster • arXiv:2503.16375

#454

MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network

Jianfei Jiang, Qiankun Liu, Haochen Yu et al.

ICCV 2025 • poster • arXiv:2507.11333

#455

StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting

Shakiba Kheradmand, Delio Vicini, George Kopanas et al.

ICCV 2025 • poster • arXiv:2503.24366

#456

EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen et al.

ICCV 2025 • poster • arXiv:2506.22246

#457

TurboReg: TurboClique for Robust and Efficient Point Cloud Registration

Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao et al.

ICCV 2025 • poster • arXiv:2507.01439

#458

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Kangan Qian, Jinyu Miao, Xinyu Jiao et al.

ICCV 2025 • poster

#459

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Liuyi Wang, Xinyuan Xia, Hui Zhao et al.

ICCV 2025 • poster • arXiv:2507.13019

#460

ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models

Shadi Hamdan, Chonghao Sima, Zetong Yang et al.

ICCV 2025 • poster • arXiv:2506.07725

#461

7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICCV 2025 • poster • arXiv:2503.07946

#462

GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors

Kang Du, Zhihao Liang, Yulin Shen et al.

ICCV 2025 • poster • arXiv:2408.08524

#463

Improving Multimodal Learning via Imbalanced Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025 • poster • arXiv:2507.10203

#464

Auto-Vocabulary Semantic Segmentation

Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.

ICCV 2025 • poster • arXiv:2312.04539

#465

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi

ICCV 2025 • poster • arXiv:2504.06908

#466

Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations

Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.

ICCV 2025 • poster • arXiv:2412.03215

#467

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.

ICCV 2025 • poster • arXiv:2505.00482

#468

Neural Shell Texture Splatting: More Details and Fewer Primitives

Xin Zhang, Anpei Chen, Jincheng Xiong et al.

ICCV 2025 • poster • arXiv:2507.20200

#469

UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation

Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.

ICCV 2025 • poster • arXiv:2508.01126

#470

DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model

Rui Yu, Xianghang Zhang, Runkai Zhao et al.

ICCV 2025 • poster • arXiv:2508.05402

#471

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Andreas Engelhardt, Mark Boss, Vikram Voleti et al.

ICCV 2025 • poster • arXiv:2510.08271

#472

SAM4D: Segment Anything in Camera and LiDAR Streams

Jianyun Xu, Song Wang, Ziqian Ni et al.

ICCV 2025 • poster • arXiv:2506.21547

#473

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Nan Chen, Mengqi Huang, Yihao Meng et al.

ICCV 2025 • poster • arXiv:2507.01945

#474

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation

Youwei Zheng, Yuxi Ren, Xin Xia et al.

ICCV 2025 • poster • arXiv:2510.09094

#475

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

Sijie Li, Chen Chen, Jungong Han

ICCV 2025 • poster • arXiv:2507.19264

#476

Balanced Image Stylization with Style Matching Score

Yuxin Jiang, Liming Jiang, Shuai Yang et al.

ICCV 2025 • poster • arXiv:2503.07601

#477

GAP: Gaussianize Any Point Clouds with Text Guidance

Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.

ICCV 2025 • poster • arXiv:2508.05631

#478

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Zhen Zeng, Leijiang Gu, Xun Yang et al.

ICCV 2025 • poster • arXiv:2411.12790

#479

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

Timo Teufel, Xilong Zhou, Umar Iqbal et al.

ICCV 2025 • poster • arXiv:2508.09137

#480

OuroMamba: A Data-Free Quantization Framework for Vision Mamba

Akshat Ramachandran, Mingyu Lee, Huan Xu et al.

ICCV 2025 • poster • arXiv:2503.10959

#481

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.

ICCV 2025 • poster • arXiv:2510.16641

#482

Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection

Lei Fan, Junjie Huang, Donglin Di et al.

ICCV 2025 • poster • arXiv:2412.04769

#483

Edit360: 2D Image Edits to 3D Assets from Any Angle

Junchao Huang, Xinting Hu, Shaoshuai Shi et al.

ICCV 2025 • highlight • arXiv:2506.10507

#484

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

Aniket Rege, Zinnia Nie, Unmesh Raskar et al.

ICCV 2025 • poster • arXiv:2506.08071

#485

iManip: Skill-Incremental Learning for Robotic Manipulation

Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu et al.

ICCV 2025 • poster • arXiv:2503.07087

#486

Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing

Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.

ICCV 2025 • poster • arXiv:2508.07519

#487

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

Donglin Di, He Feng, Wenzhang Sun et al.

ICCV 2025 • poster • arXiv:2410.07151

#488

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Chenwei Lin, Hanjia Lyu, Xian Xu et al.

ICCV 2025 • poster • arXiv:2406.09105

#489

CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation

Leon Sick, Dominik Engel, Sebastian Hartwig et al.

ICCV 2025 • poster • arXiv:2411.16319

#490

I2V3D: Controllable Image-to-video Generation with 3D Guidance

Zhiyuan Zhang, Dongdong Chen, Jing Liao

ICCV 2025 • poster • arXiv:2503.09733

#491

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

Sébastien Herbreteau, Michael Unser

ICCV 2025 • poster • arXiv:2407.17399

#492

DuCos: Duality Constrained Depth Super-Resolution via Foundation Model

Zhiqiang Yan, Zhengxue Wang, Haoye Dong et al.

ICCV 2025 • poster • arXiv:2503.04171

#493

MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion

Zihan Wang, Jeff Tan, Tarasha Khurana et al.

ICCV 2025 • poster • arXiv:2507.23782

#494

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.

ICCV 2025 • highlight • arXiv:2504.01009

#495

Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

Li, Yang Xiao, Jie Ji et al.

ICCV 2025 • poster • arXiv:2504.09039

#496

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

Beier Zhu, Ruoyu Wang, Tong Zhao et al.

ICCV 2025 • poster • arXiv:2507.14797

#497

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

Yating Yu, Congqi Cao, Yifan Zhang et al.

ICCV 2025 • highlight • arXiv:2502.20158

#498

Precise Action-to-Video Generation Through Visual Action Prompts

Yuang Wang, Chao Wen, Haoyu Guo et al.

ICCV 2025 • poster • arXiv:2508.13104

#499

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels

Yujia Tong, Yuze Wang, Jingling Yuan et al.

ICCV 2025 • poster • arXiv:2503.13917

#500

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025 • poster • arXiv:2511.06256

#501

CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee, Jiwan Seo, Kiljoon Han et al.

ICCV 2025 • poster • arXiv:2407.03010

#502

VertexRegen: Mesh Generation with Continuous Level of Detail

Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.

ICCV 2025 • poster • arXiv:2508.09062

#503

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Wenwen Yu, Zhibo Yang, Yuliang Liu et al.

ICCV 2025 • poster • arXiv:2508.08589

#504

Dynamic Multimodal Prototype Learning in Vision-Language Models

Xingyu Zhu, Shuo Wang, Beier Zhu et al.

ICCV 2025 • poster • arXiv:2507.03657

#505

Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos

Changwoon Choi, Jeongjun Kim, Geonho Cha et al.

ICCV 2025 • poster • arXiv:2412.19089

#506

EgoM2P: Egocentric Multimodal Multitask Pretraining

Gen Li, Yutong Chen, Yiqian Wu et al.

ICCV 2025 • poster • arXiv:2506.07886

#507

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.

ICCV 2025 • poster • arXiv:2501.01425

#508

Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving

Hao Zhou, Zhanning Gao, Zhili Chen et al.

ICCV 2025 • poster • arXiv:2411.13076

#509

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

Ruijie Zhu, Mulin Yu, Linning Xu et al.

ICCV 2025 • poster • arXiv:2507.15454

#510

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Jie Xu, Na Zhao, Gang Niu et al.

ICCV 2025 • poster • arXiv:2503.04151

#511

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Gaurav Patel, Qiang Qiu

ICCV 2025 • poster • arXiv:2503.06339

#512

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Tiange Xiang, Kai Li, Chengjiang Long et al.

ICCV 2025 • poster • arXiv:2503.15877

#513

Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity

Sung Ju Lee, Nam Ik Cho

ICCV 2025 • poster • arXiv:2509.07647

#514

Generative Zoo

Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.

ICCV 2025 • poster • arXiv:2412.08101

#515

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

Yecheng Wu, Han Cai, Junyu Chen et al.

ICCV 2025 • poster • arXiv:2507.04947

#516

Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo

ICCV 2025 • poster • arXiv:2503.19914

#517

Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning

Tianyi Zhao, Boyang Liu, Yanglei Gao et al.

ICCV 2025 • poster • arXiv:2503.11780

#518

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Qi Wang, Zhipeng Zhang, Baao Xie et al.

ICCV 2025 • poster • arXiv:2503.08751

#519

TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction

Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.

ICCV 2025 • poster • arXiv:2508.04682

#520

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Gaoyang Zhang, Bingtao Fu, Qingnan Fan et al.

ICCV 2025 • poster • arXiv:2412.13195

#521

A Token-level Text Image Foundation Model for Document Understanding

Tongkun Guan, Zining Wang, Pei Fu et al.

ICCV 2025 • poster • arXiv:2503.02304

#522

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

ICCV 2025 • highlight • arXiv:2411.15867

#523

VGGSounder: Audio-Visual Evaluations for Foundation Models

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.

ICCV 2025 • poster • arXiv:2508.08237

#524

Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features

Liying Yang, Chen Liu, Zhenwei Zhu et al.

ICCV 2025 • highlight • arXiv:2502.08377

#525

Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product

Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.

ICCV 2025 • poster • arXiv:2508.00230

#526

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

Weili Zeng, Ziyuan Huang, Kaixiang Ji et al.

ICCV 2025 • poster • arXiv:2503.21817

#527

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

Kelin Yu, Sheng Zhang, Harshit Soora et al.

ICCV 2025 • poster • arXiv:2508.11049

#528

TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models

Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.

ICCV 2025 • poster • arXiv:2503.15283

#529

CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation

Xiangyang Luo, Ye Zhu, Yunfei Liu et al.

ICCV 2025 • poster • arXiv:2507.02691

#530

Synergistic Prompting for Robust Visual Recognition with Missing Modalities

Zhihui Zhang, Luanyuan Dai, Qika Lin et al.

ICCV 2025 • poster • arXiv:2507.07802

#531

Motion Synthesis with Sparse and Flexible Keyjoint Control

Inwoo Hwang, Jinseok Bae, Donggeun Lim et al.

ICCV 2025 • poster • arXiv:2503.15557

#532

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fating Hong et al.

ICCV 2025 • highlight • arXiv:2503.20218

#533

Knowledge Distillation with Refined Logits

Wujie Sun, Defang Chen, Siwei Lyu et al.

ICCV 2025 • poster • arXiv:2408.07703

#534

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Shiyong Liu, Xiao Tang, Zhihao Li et al.

ICCV 2025 • poster • arXiv:2503.16177

#535

MikuDance: Animating Character Art with Mixed Motion Dynamics

Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.

ICCV 2025 • poster • arXiv:2411.08656

#536

TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images

Tu Bui, Shruti Agarwal, John Collomosse

ICCV 2025 • poster

#537

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Jun Li, Jinpeng Wang, Chaolei Tan et al.

ICCV 2025 • poster • arXiv:2507.17402

#538

Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections

Youwei Zhou, Tianyang Xu, Cong Wu et al.

ICCV 2025 • poster • arXiv:2411.14796

#539

Importance-Based Token Merging for Efficient Image and Video Generation

Haoyu Wu, Jingyi Xu, Hieu Le et al.

ICCV 2025 • poster • arXiv:2411.16720

#540

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation

Wenjia Wang, Liang Pan, Zhiyang Dou et al.

ICCV 2025 • poster • arXiv:2411.19921

#541

A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision

Chensheng Peng, Ido Sobol, Masayoshi Tomizuka et al.

ICCV 2025 • poster • arXiv:2412.00623

#542

Constraint-Aware Feature Learning for Parametric Point Cloud

Xi Cheng, Ruiqi Lei, Di Huang et al.

ICCV 2025 • poster • arXiv:2411.07747

#543

PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination

Ming Dai, Wenxuan Cheng, Jiedong Zhuang et al.

ICCV 2025 • poster • arXiv:2509.04833

#544

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

Zhehui Wu, Yong Chen, Naoto Yokoya et al.

ICCV 2025 • poster • arXiv:2503.09131

#545

Faster and Better 3D Splatting via Group Training

Chengbo Wang, Guozheng Ma, Yizhen Lao et al.

ICCV 2025 • poster • arXiv:2412.07608

#546

Video Individual Counting for Moving Drones

Yaowu Fan, Jia Wan, Tao Han et al.

ICCV 2025 • highlight • arXiv:2503.10701

#547

Monocular Semantic Scene Completion via Masked Recurrent Networks

Xuzhi Wang, Xinran Wu, Song Wang et al.

ICCV 2025 • poster • arXiv:2507.17661

#548

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Baihui Xiao, Chengjian Feng, Zhijian Huang et al.

ICCV 2025 • poster • arXiv:2508.04642

#549

Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation

Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.

ICCV 2025 • poster • arXiv:2411.09572

#550

CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations

Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.

ICCV 2025 • poster • arXiv:2510.12795

#551

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025 • poster • arXiv:2508.06125

#552

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

Jiahui Geng, Qing Li

ICCV 2025 • poster • arXiv:2503.14530

#553

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

Chen Chen, Zhirui Wang, Taowei Sheng et al.

ICCV 2025 • poster • arXiv:2503.16399

#554

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models

Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.

ICCV 2025 • poster • arXiv:2509.12145

#555

NeRF Is a Valuable Assistant for 3D Gaussian Splatting

Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.

ICCV 2025 • poster • arXiv:2507.23374

#556

Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics

Muleilan Pei, Shaoshuai Shi, Xuesong Chen et al.

ICCV 2025 • poster • arXiv:2507.12083

#557

Disentangled Clothed Avatar Generation with Layered Representation

Weitian Zhang, Yichao Yan, Sijing Wu et al.

ICCV 2025 • highlight • arXiv:2501.04631

#558

GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections

Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.

ICCV 2025 • poster • arXiv:2507.20512

#559

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang, Runsen Xu, Chenhang Cui et al.

ICCV 2025 • poster • arXiv:2508.05211

#560

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting

Baijun Ye, Minghui Qin, Saining Zhang et al.

ICCV 2025 • poster • arXiv:2507.19451

#561

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

Xiao Li, Yiming Zhu, Yifan Huang et al.

ICCV 2025 • poster • arXiv:2506.23581

#562

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly

Chang Liu, Yunfan Ye, Fan Zhang et al.

ICCV 2025 • poster • arXiv:2507.19924

#563

4D Gaussian Splatting SLAM

Yanyan Li, Youxu Fang, Zunjie Zhu et al.

ICCV 2025 • poster • arXiv:2503.16710

#564

I Am Big, You Are Little; I Am Right, You Are Wrong

David A Kelly, Akchunya Chanchal, Nathan Blake

ICCV 2025 • poster • arXiv:2507.23509

#565

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions

Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.

ICCV 2025 • poster • arXiv:2412.18810

#566

Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

Sanghun Jung, Jingjing Zheng, Ke Zhang et al.

ICCV 2025 • poster • arXiv:2507.23134

#567

X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Weihao Yu, Yuanhao Cai, Ruyi Zha et al.

ICCV 2025 • poster

#568

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

Ruyang Liu, Shangkun Sun, Haoran Tang et al.

ICCV 2025 • poster • arXiv:2510.05836

#569

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Dubing Chen, Huan Zheng, Yucheng Zhou et al.

ICCV 2025 • poster • arXiv:2509.08388

#570

Stable Diffusion Models are Secretly Good at Visual In-Context Learning

Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara et al.

ICCV 2025 • poster • arXiv:2508.09949

#571

EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.

ICCV 2025 • poster • arXiv:2311.13621

#572

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jiahui Wang, Zuyan Liu, Yongming Rao et al.

ICCV 2025 • poster • arXiv:2506.05344

#573

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

Congyi Fan, Jian Guan, Xuanjia Zhao et al.

ICCV 2025 • poster • arXiv:2503.17340

#574

Progressive Test Time Energy Adaptation for Medical Image Segmentation

Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park et al.

ICCV 2025 • highlight • arXiv:2503.16616

#575

Breaking the Encoder Barrier for Seamless Video-Language Understanding

Handong Li, Yiyuan Zhang, Longteng Guo et al.

ICCV 2025 • poster • arXiv:2503.18422

#576

Generalizable Object Re-Identification via Visual In-Context Prompting

Zhizhong Huang, Xiaoming Liu

ICCV 2025 • poster • arXiv:2508.21222

#577

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.

ICCV 2025 • poster • arXiv:2503.16726

#578

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion

Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.

ICCV 2025 • poster • arXiv:2503.21775

#579

4D Visual Pre-training for Robot Learning

Chengkai Hou, Yanjie Ze, Yankai Fu et al.

ICCV 2025 • poster • arXiv:2508.17230

#580

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

Chen Zhao, Xuan Wang, Tong Zhang et al.

ICCV 2025 • poster • arXiv:2411.00144

#581

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

ICCV 2025 • poster • arXiv:2503.16867

#582

End-to-End Multi-Modal Diffusion Mamba

Chunhao Lu, Qiang Lu, Meichen Dong et al.

ICCV 2025 • poster • arXiv:2510.13253

#583

How Can Objects Help Video-Language Understanding?

Zitian Tang, Shijie Wang, Junho Cho et al.

ICCV 2025 • poster • arXiv:2504.07454

#584

FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images

Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira et al.

ICCV 2025 • poster • arXiv:2507.19993

#585

From Panels to Prose: Generating Literary Narratives from Comics

Ragav Sachdeva, Andrew Zisserman

ICCV 2025 • poster • arXiv:2503.23344

#586

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Fangyikang Wang, Hubery Yin, Lei Qian et al.

ICCV 2025 • poster • arXiv:2505.24222

#587

GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Wentao Hu, Shunkai Li, Ziqiao Peng et al.

ICCV 2025 • highlight • arXiv:2506.21513

#588

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.

ICCV 2025 • poster • arXiv:2503.21851

#589

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text

Jiaben Chen, Xin Yan, Yihang Chen et al.

ICCV 2025 • poster • arXiv:2405.20336

#590

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025 • poster • arXiv:2506.07371

#591

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 • poster • arXiv:2504.02747

#592

Heavy Labels Out! Dataset Distillation with Label Space Lightening

Ruonan Yu, Songhua Liu, Zigeng Chen et al.

ICCV 2025 • poster • arXiv:2408.08201

#593

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Dongming Wu, Yanping Fu, Saike Huang et al.

ICCV 2025 • poster • arXiv:2507.23734

#594

Grouped Speculative Decoding for Autoregressive Image Generation

Junhyuk So, Juncheol Shin, Hyunho Kook et al.

ICCV 2025 • poster • arXiv:2508.07747

#595

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

ICCV 2025 • highlight • arXiv:2507.02859

#596

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.

ICCV 2025 • poster • arXiv:2507.23567

#597

GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation

Junyu Shi, Lijiang Liu, Yong Sun et al.

ICCV 2025 • poster

#598

MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

Shibo Wang, Haonan He, Maria Parelli et al.

ICCV 2025 • poster • arXiv:2508.05506

#599

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025 • poster • arXiv:2412.01250

#600

Sparse Fine-Tuning of Transformers for Generative Tasks

Wei Chen, Jingxi Yu, Zichen Miao et al.

ICCV 2025 • poster • arXiv:2507.10855
