Most Cited ICCV "implicit neural networks" Papers

2,701 papers found • Page 3 of 14

#401

CompCap: Improving Multimodal Large Language Models with Composite Captions

Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.

ICCV 2025posterarXiv:2412.05243
6
citations
#402

StableCodec: Taming One-Step Diffusion for Extreme Image Compression

Tianyu Zhang, Xin Luo, Li Li et al.

ICCV 2025posterarXiv:2506.21977
6
citations
#403

Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction

Luyao Tang, Kunze Huang, Yuxuan Yuan et al.

ICCV 2025highlightarXiv:2508.10731
6
citations
#404

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

Jan Ackermann, Jonas Kulhanek, Shengqu Cai et al.

ICCV 2025posterarXiv:2506.21117
6
citations
#405

MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices

HAILONG YAN, Ao Li, Xiangtao Zhang et al.

ICCV 2025posterarXiv:2507.01838
6
citations
#406

From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision

Chuang Yu, Jinmiao Zhao, Yunpeng Liu et al.

ICCV 2025posterarXiv:2412.11154
6
citations
#407

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.

ICCV 2025posterarXiv:2503.23219
6
citations
#408

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Ziming Yu, Pan Zhou, Sike Wang et al.

ICCV 2025posterarXiv:2410.08989
6
citations
#409

Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection

Ruiyang Zhang, Hu Zhang, Zhedong Zheng

ICCV 2025posterarXiv:2408.00619
6
citations
#410

Event-based Tiny Object Detection: A Benchmark Dataset and Baselines

Nuo Chen, Chao Xiao, Yimian Dai et al.

ICCV 2025posterarXiv:2506.23575
6
citations
#411

AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration

Javier Tirado-Garín, Javier Civera

ICCV 2025posterarXiv:2503.12701
6
citations
#412

FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation

Cui Miao, Tao Chang, meihan wu et al.

ICCV 2025posterarXiv:2508.02190
5
citations
#413

Importance-Based Token Merging for Efficient Image and Video Generation

Haoyu Wu, Jingyi Xu, Hieu Le et al.

ICCV 2025posterarXiv:2411.16720
5
citations
#414

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.

ICCV 2025posterarXiv:2503.06273
5
citations
#415

MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance

Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.

ICCV 2025posterarXiv:2412.15058
5
citations
#416

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

Jonas Belouadi, Eddy Ilg, Margret Keuper et al.

ICCV 2025highlightarXiv:2503.11509
5
citations
#417

LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.

ICCV 2025posterarXiv:2510.15868
5
citations
#418

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Junhao Cheng, Yuying Ge, Yixiao Ge et al.

ICCV 2025posterarXiv:2504.01014
5
citations
#419

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

Beier Zhu, Ruoyu Wang, Tong Zhao et al.

ICCV 2025posterarXiv:2507.14797
5
citations
#420

MOSCATO: Predicting Multiple Object State Change Through Actions

Parnian Zameni, Yuhan Shen, Ehsan Elhamifar

ICCV 2025poster
5
citations
#421

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

Weili Zeng, Ziyuan Huang, Kaixiang Ji et al.

ICCV 2025posterarXiv:2503.21817
5
citations
#422

Learning Interpretable Queries for Explainable Image Classification with Information Pursuit

Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan et al.

ICCV 2025posterarXiv:2312.11548
5
citations
#423

PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling

Hao Zhang, Haolan Xu, Chun Feng et al.

ICCV 2025posterarXiv:2506.20936
5
citations
#424

DLF: Extreme Image Compression with Dual-generative Latent Fusion

Naifu Xue, Zhaoyang Jia, Jiahao Li et al.

ICCV 2025highlightarXiv:2503.01428
5
citations
#425

OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images

Ziyue Huang, Yongchao Feng, Ziqi Liu et al.

ICCV 2025posterarXiv:2503.06146
5
citations
#426

Fine-grained Spatiotemporal Grounding on Egocentric Videos

Shuo LIANG, Yiwu Zhong, Zi-Yuan Hu et al.

ICCV 2025posterarXiv:2508.00518
5
citations
#427

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

Yu Cheng, Fajie Yuan

ICCV 2025posterarXiv:2503.14325
5
citations
#428

Latent Diffusion Models with Masked AutoEncoders

Junho Lee, Jeongwoo Shin, Hyungwook Choi et al.

ICCV 2025posterarXiv:2507.09984
5
citations
#429

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.

ICCV 2025posterarXiv:2506.07986
5
citations
#430

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation

Youwei Zheng, Yuxi Ren, Xin Xia et al.

ICCV 2025posterarXiv:2510.09094
5
citations
#431

BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.

ICCV 2025highlightarXiv:2503.07940
5
citations
#432

Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation

Yujie Zhang, Bingyang Cui, Qi Yang et al.

ICCV 2025posterarXiv:2412.11170
5
citations
#433

egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks

Björn Braun, Rayan Armani, Manuel Meier et al.

ICCV 2025posterarXiv:2502.20879
5
citations
#434

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Yukang Cao, Chenyang Si, Jinghao Wang et al.

ICCV 2025posterarXiv:2507.01953
5
citations
#435

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Tuna Meral, Enis Simsar, Federico Tombari et al.

ICCV 2025highlightarXiv:2403.19776
5
citations
#436

GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments

Lin Zeng, Boming Zhao, Jiarui Hu et al.

ICCV 2025posterarXiv:2508.08867
5
citations
#437

DreamFuse: Adaptive Image Fusion with Diffusion Transformer

Junjia Huang, Pengxiang Yan, Jiyang Liu et al.

ICCV 2025posterarXiv:2504.08291
5
citations
#438

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

Byeongjun Park, Hyojun Go, Hyelin Nam et al.

ICCV 2025posterarXiv:2503.12024
5
citations
#439

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Qi Wang, Zhipeng Zhang, Baao Xie et al.

ICCV 2025posterarXiv:2503.08751
5
citations
#440

Hierarchical Cross-modal Prompt Learning for Vision-Language Models

Hao Zheng, Shunzhi Yang, Zhuoxin He et al.

ICCV 2025posterarXiv:2507.14976
5
citations
#441

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

Divyansh Srivastava, Xiang Zhang, He Wen et al.

ICCV 2025posterarXiv:2505.04718
5
citations
#442

Edit360: 2D Image Edits to 3D Assets from Any Angle

Junchao Huang, Xinting Hu, Shaoshuai Shi et al.

ICCV 2025highlightarXiv:2506.10507
5
citations
#443

Rectifying Magnitude Neglect in Linear Attention

Qihang Fan, Huaibo Huang, Yuang Ai et al.

ICCV 2025highlightarXiv:2507.00698
5
citations
#444

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

Kaining Ying, Henghui Ding, Guangquan Jie et al.

ICCV 2025posterarXiv:2507.22886
5
citations
#445

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

Zhentao Tan, Ben Xue, Jian Jia et al.

ICCV 2025posterarXiv:2412.10443
5
citations
#446

SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding

Tianci Wen, Zhiang Liu, Yongchun Fang

ICCV 2025posterarXiv:2501.05242
5
citations
#447

Multi-turn Consistent Image Editing

Zijun Zhou, Yingying Deng, Xiangyu He et al.

ICCV 2025posterarXiv:2505.04320
5
citations
#448

Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration

Junyuan Deng, Wei Yin, Xiaoyang Guo et al.

ICCV 2025posterarXiv:2411.17240
5
citations
#449

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.

ICCV 2025posterarXiv:2503.10596
5
citations
#450

Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation

Ao Ma, Jiasong Feng, Ke Cao et al.

ICCV 2025posterarXiv:2508.08949
5
citations
#451

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models

hongji yang, Wencheng Han, Yucheng Zhou et al.

ICCV 2025posterarXiv:2502.14779
5
citations
#452

TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

Yinda Chen, Haoyuan Shi, Xiaoyu Liu et al.

ICCV 2025posterarXiv:2405.16847
5
citations
#453

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

jian ma, Qirong Peng, Xu Guo et al.

ICCV 2025posterarXiv:2503.06134
5
citations
#454

WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection

Haodong Zhu, Wenhao Dong, Linlin Yang et al.

ICCV 2025posterarXiv:2507.18173
5
citations
#455

Adversarial Robust Memory-Based Continual Learner

Xiaoyue Mi, Fan Tang, Zonghan Yang et al.

ICCV 2025posterarXiv:2311.17608
5
citations
#456

Improving Multimodal Learning via Imbalanced Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025posterarXiv:2507.10203
5
citations
#457

PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model

Jinhua Zhang, Hualian Sheng, Sijia Cai et al.

ICCV 2025posterarXiv:2407.06109
5
citations
#458

Gaussian Splatting with Discretized SDF for Relightable Assets

Zuo-Liang Zhu, jian Yang, Beibei Wang

ICCV 2025posterarXiv:2507.15629
5
citations
#459

PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning

Yan Zhang, Yao Feng, Alpár Cseke et al.

ICCV 2025posterarXiv:2503.17544
5
citations
#460

HERO: Human Reaction Generation from Videos

Chengjun Yu, Wei Zhai, Yuhang Yang et al.

ICCV 2025posterarXiv:2503.08270
5
citations
#461

Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections

Youwei Zhou, Tianyang Xu, Cong Wu et al.

ICCV 2025posterarXiv:2411.14796
5
citations
#462

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

Sijie Li, Chen Chen, Jungong Han

ICCV 2025posterarXiv:2507.19264
5
citations
#463

Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?

Yuru Jia, Valerio Marsocci, Ziyang Gong et al.

ICCV 2025posterarXiv:2503.07890
5
citations
#464

CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.

ICCV 2025poster
5
citations
#465

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

Boqian Li, Zeyu Cai, Michael Black et al.

ICCV 2025highlightarXiv:2503.10624
5
citations
#466

GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR

Christophe Bolduc, Yannick Hold-Geoffroy, Jean-Francois Lalonde

ICCV 2025posterarXiv:2504.10809
5
citations
#467

DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model

Junjia Huang, Pengxiang Yan, Jinhang Cai et al.

ICCV 2025highlight
5
citations
#468

Φ-GAN:Physics-Inspired GAN for Generating SAR Images Under Limited Data

Xidan Zhang, Yihan Zhuang, Qian Guo et al.

ICCV 2025poster
5
citations
#469

Multi-identity Human Image Animation with Structural Video Diffusion

Zhenzhi Wang, Yixuan Li, yanhong zeng et al.

ICCV 2025posterarXiv:2504.04126
5
citations
#470

Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis

Kaiyang Ji, Ye Shi, Zichen Jin et al.

ICCV 2025highlightarXiv:2508.02106
5
citations
#471

Frequency-Dynamic Attention Modulation For Dense Prediction

Linwei Chen, Lin Gu, Ying Fu

ICCV 2025posterarXiv:2507.12006
5
citations
#472

Low-Light Image Enhancement using Event-Based Illumination Estimation

Lei Sun, Yuhan Bao, Jiajun Zhai et al.

ICCV 2025posterarXiv:2504.09379
5
citations
#473

SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image

Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi et al.

ICCV 2025posterarXiv:2409.16178
5
citations
#474

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Zhuoyan Luo, Yinghao Wu, Tianheng Cheng et al.

ICCV 2025posterarXiv:2405.15658
5
citations
#475

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou, Yicong Hong, Zun Wang et al.

ICCV 2025posterarXiv:2412.05552
5
citations
#476

Learning Normal Flow Directly From Events

Dehao Yuan, Levi Burner, Jiayi Wu et al.

ICCV 2025posterarXiv:2412.11284
5
citations
#477

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Wenchuan Wang, Mengqi Huang, Yijing Tu et al.

ICCV 2025posterarXiv:2505.02192
5
citations
#478

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Chenting Wang, Kunchang Li, Tianxiang Jiang et al.

ICCV 2025posterarXiv:2503.14237
5
citations
#479

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Wenqi Zhang, Hang Zhang, Xin Li et al.

ICCV 2025highlightarXiv:2501.00958
5
citations
#480

Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures

Tim Seizinger, Florin-Alexandru Vasluianu, Marcos Conde et al.

ICCV 2025highlightarXiv:2503.16067
5
citations
#481

OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations

Peng-Hao Hsu, Ke Zhang, Fu-En Wang et al.

ICCV 2025posterarXiv:2508.20063
5
citations
#482

GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts

Minwen Liao, Hao Dong, Xinyi Wang et al.

ICCV 2025posterarXiv:2503.07417
5
citations
#483

NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion

Zihao Xu, Yuzhi Tang, Bowen Xu et al.

ICCV 2025
5
citations
#484

Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection

Lei Fan, Junjie Huang, Donglin Di et al.

ICCV 2025posterarXiv:2412.04769
5
citations
#485

CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee, Jiwan Seo, Kiljoon Han et al.

ICCV 2025posterarXiv:2407.03010
4
citations
#486

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

Donglin Di, He Feng, Wenzhang SUN et al.

ICCV 2025posterarXiv:2410.07151
4
citations
#487

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Shaofeng Yin, Ting Lei, Yang Liu

ICCV 2025posterarXiv:2508.03284
4
citations
#488

Generative Zoo

Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.

ICCV 2025posterarXiv:2412.08101
4
citations
#489

Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos

Changwoon Choi, Jeongjun Kim, Geonho Cha et al.

ICCV 2025posterarXiv:2412.19089
4
citations
#490

TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction

Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.

ICCV 2025posterarXiv:2508.04682
4
citations
#491

OuroMamba: A Data-Free Quantization Framework for Vision Mamba

Akshat Ramachandran, Mingyu Lee, Huan Xu et al.

ICCV 2025posterarXiv:2503.10959
4
citations
#492

CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation

Leon Sick, Dominik Engel, Sebastian Hartwig et al.

ICCV 2025posterarXiv:2411.16319
4
citations
#493

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Dongming Wu, Yanping Fu, Saike Huang et al.

ICCV 2025posterarXiv:2507.23734
4
citations
#494

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Shiyong Liu, Xiao Tang, Zhihao Li et al.

ICCV 2025posterarXiv:2503.16177
4
citations
#495

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

Yating Yu, Congqi Cao, Yifan Zhang et al.

ICCV 2025highlightarXiv:2502.20158
4
citations
#496

Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning

Tianyi Zhao, Boyang Liu, Yanglei Gao et al.

ICCV 2025posterarXiv:2503.11780
4
citations
#497

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Jie Xu, Na Zhao, Gang Niu et al.

ICCV 2025posterarXiv:2503.04151
4
citations
#498

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Gaurav Patel, Qiang Qiu

ICCV 2025posterarXiv:2503.06339
4
citations
#499

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Chenwei Lin, Hanjia Lyu, Xian Xu et al.

ICCV 2025posterarXiv:2406.09105
4
citations
#500

DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.

ICCV 2025posterarXiv:2503.13176
4
citations
#501

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.

ICCV 2025highlightarXiv:2504.01009
4
citations
#502

SAM4D: Segment Anything in Camera and LiDAR Streams

Jianyun Xu, Song Wang, Ziqian Ni et al.

ICCV 2025posterarXiv:2506.21547
4
citations
#503

Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing

Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.

ICCV 2025posterarXiv:2508.07519
4
citations
#504

Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving

Hao Zhou, Zhanning Gao, Zhili Chen et al.

ICCV 2025posterarXiv:2411.13076
4
citations
#505

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025posterarXiv:2409.01071
4
citations
#506

Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

Li, Yang Xiao, Jie Ji et al.

ICCV 2025posterarXiv:2504.09039
4
citations
#507

MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

SHIBO WANG, Haonan He, Maria Parelli et al.

ICCV 2025posterarXiv:2508.05506
4
citations
#508

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025posterarXiv:2511.06256
4
citations
#509

A Token-level Text Image Foundation Model for Document Understanding

Tongkun Guan, Zining Wang, Pei Fu et al.

ICCV 2025posterarXiv:2503.02304
4
citations
#510

Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity

Sung Ju Lee, Nam Ik Cho

ICCV 2025posterarXiv:2509.07647
4
citations
#511

QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization

Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner et al.

ICCV 2025posterarXiv:2505.05591
4
citations
#512

SP2T: Sparse Proxy Attention for Dual-stream Point Transformer

Jiaxu Wan, Hong Zhang, Ziqi He et al.

ICCV 2025poster
4
citations
#513

LightSwitch: Multi-view Relighting with Material-guided Diffusion

Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani

ICCV 2025posterarXiv:2508.06494
4
citations
#514

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Andreas Engelhardt, Mark Boss, Vikram Voleti et al.

ICCV 2025posterarXiv:2510.08271
4
citations
#515

TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images

Tu Bui, Shruti Agarwal, John Collomosse

ICCV 2025poster
4
citations
#516

Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo

ICCV 2025posterarXiv:2503.19914
4
citations
#517

Balanced Image Stylization with Style Matching Score

Yuxin Jiang, Liming Jiang, Shuai Yang et al.

ICCV 2025posterarXiv:2503.07601
4
citations
#518

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

ICCV 2025highlightarXiv:2411.15867
4
citations
#519

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes

Han-Hung Lee, Qinghong Han, Angel Chang

ICCV 2025posterarXiv:2503.16375
4
citations
#520

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Kangan Qian, Jinyu Miao, Xinyu Jiao et al.

ICCV 2025poster
4
citations
#521

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

Ruijie Zhu, Mulin Yu, Linning Xu et al.

ICCV 2025posterarXiv:2507.15454
4
citations
#522

DuoLoRA : Cycle-consistent and Rank-disentangled Content-Style Personalization

Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.

ICCV 2025posterarXiv:2504.13206
4
citations
#523

SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies

Liang Han, Xu Zhang, Haichuan Song et al.

ICCV 2025posterarXiv:2508.00366
4
citations
#524

MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network

Jianfei Jiang, Qiankun Liu, Haochen Yu et al.

ICCV 2025posterarXiv:2507.11333
4
citations
#525

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Nan Chen, Mengqi Huang, Yihao Meng et al.

ICCV 2025posterarXiv:2507.01945
4
citations
#526

ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models

Shadi Hamdan, Chonghao Sima, Zetong Yang et al.

ICCV 2025posterarXiv:2506.07725
4
citations
#527

ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery

Yanzhe Lyu, Kai Cheng, Kang Xin et al.

ICCV 2025posterarXiv:2412.07494
4
citations
#528

Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product

Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.

ICCV 2025posterarXiv:2508.00230
4
citations
#529

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Tiange Xiang, Kai Li, Chengjiang Long et al.

ICCV 2025posterarXiv:2503.15877
4
citations
#530

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

Aniket Rege, Zinnia Nie, Unmesh Raskar et al.

ICCV 2025posterarXiv:2506.08071
4
citations
#531

StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting

Shakiba Kheradmand, Delio Vicini, George Kopanas et al.

ICCV 2025posterarXiv:2503.24366
4
citations
#532

CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation

Xiangyang Luo, Ye Zhu, Yunfei Liu et al.

ICCV 2025posterarXiv:2507.02691
4
citations
#533

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Gaoyang Zhang, Bingtao Fu, Qingnan Fan et al.

ICCV 2025posterarXiv:2412.13195
4
citations
#534

Knowledge Distillation with Refined Logits

Wujie Sun, Defang Chen, Siwei Lyu et al.

ICCV 2025posterarXiv:2408.07703
4
citations
#535

DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model

Rui Yu, Xianghang Zhang, Runkai Zhao et al.

ICCV 2025posterarXiv:2508.05402
4
citations
#536

VGGSounder: Audio-Visual Evaluations for Foundation Models

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.

ICCV 2025posterarXiv:2508.08237
4
citations
#537

GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors

Kang DU, Zhihao Liang, Yulin Shen et al.

ICCV 2025posterarXiv:2408.08524
4
citations
#538

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Wenwen Yu, Zhibo Yang, Yuliang Liu et al.

ICCV 2025posterarXiv:2508.08589
4
citations
#539

Auto-Regressively Generating Multi-View Consistent Images

JiaKui Hu, Yuxiao Yang, Jialun Liu et al.

ICCV 2025posterarXiv:2506.18527
4
citations
#540

GAP: Gaussianize Any Point Clouds with Text Guidance

Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.

ICCV 2025posterarXiv:2508.05631
4
citations
#541

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Zhen Zeng, Leijiang Gu, Xun Yang et al.

ICCV 2025posterarXiv:2411.12790
4
citations
#542

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.

ICCV 2025posterarXiv:2510.16641
4
citations
#543

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.

ICCV 2025highlightarXiv:2508.01242
4
citations
#544

iManip: Skill-Incremental Learning for Robotic Manipulation

Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu et al.

ICCV 2025posterarXiv:2503.07087
4
citations
#545

7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICCV 2025posterarXiv:2503.07946
4
citations
#546

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation

Wenjia Wang, Liang Pan, Zhiyang Dou et al.

ICCV 2025posterarXiv:2411.19921
4
citations
#547

Motion Synthesis with Sparse and Flexible Keyjoint Control

Inwoo Hwang, Jinseok Bae, Donggeun Lim et al.

ICCV 2025posterarXiv:2503.15557
4
citations
#548

I2V3D: Controllable Image-to-video Generation with 3D Guidance

Zhiyuan Zhang, Dongdong Chen, Jing Liao

ICCV 2025posterarXiv:2503.09733
4
citations
#549

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

Timo Teufel, xilong zhou, Umar Iqbal et al.

ICCV 2025posterarXiv:2508.09137
4
citations
#550

Dynamic Multimodal Prototype Learning in Vision-Language Models

Xingyu Zhu, Shuo Wang, Beier Zhu et al.

ICCV 2025posterarXiv:2507.03657
4
citations
#551

HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.

ICCV 2025posterarXiv:2508.16433
4
citations
#552

Precise Action-to-Video Generation Through Visual Action Prompts

Yuang Wang, Chao Wen, Haoyu Guo et al.

ICCV 2025posterarXiv:2508.13104
4
citations
#553

VertexRegen: Mesh Generation with Continuous Level of Detail

Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.

ICCV 2025posterarXiv:2508.09062
4
citations
#554

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.

ICCV 2025posterarXiv:2501.01425
4
citations
#555

Auto-Vocabulary Semantic Segmentation

Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.

ICCV 2025posterarXiv:2312.04539
4
citations
#556

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

Yecheng Wu, Han Cai, Junyu Chen et al.

ICCV 2025posterarXiv:2507.04947
4
citations
#557

DuCos: Duality Constrained Depth Super-Resolution via Foundation Model

Zhiqiang Yan, Zhengxue Wang, Haoye Dong et al.

ICCV 2025posterarXiv:2503.04171
4
citations
#558

Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features

Liying Yang, Chen Liu, Zhenwei Zhu et al.

ICCV 2025highlightarXiv:2502.08377
4
citations
#559

Synergistic Prompting for Robust Visual Recognition with Missing Modalities

Zhihui Zhang, Luanyuan Dai, Qika Lin et al.

ICCV 2025posterarXiv:2507.07802
4
citations
#560

EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen et al.

ICCV 2025posterarXiv:2506.22246
4
citations
#561

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang, Runsen Xu, Chenhang Cui et al.

ICCV 2025posterarXiv:2508.05211
4
citations
#562

Neural Shell Texture Splatting: More Details and Fewer Primitives

Xin Zhang, Anpei Chen, Jincheng Xiong et al.

ICCV 2025posterarXiv:2507.20200
4
citations
#563

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fating Hong et al.

ICCV 2025highlightarXiv:2503.20218
4
citations
#564

TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models

Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.

ICCV 2025posterarXiv:2503.15283
4
citations
#565

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

Kelin Yu, Sheng Zhang, Harshit Soora et al.

ICCV 2025posterarXiv:2508.11049
4
citations
#566

Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking

Qiangqiang Wu, Yi Yu, Chenqi Kong et al.

ICCV 2025posterarXiv:2507.07483
4
citations
#567

EgoM2P: Egocentric Multimodal Multitask Pretraining

Gen Li, Yutong Chen, Yiqian Wu et al.

ICCV 2025posterarXiv:2506.07886
4
citations
#568

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi

ICCV 2025posterarXiv:2504.06908
4
citations
#569

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICCV 2025posterarXiv:2411.13949
4
citations
#570

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

Sébastien Herbreteau, Michael Unser

ICCV 2025posterarXiv:2407.17399
4
citations
#571

UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation

Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.

ICCV 2025posterarXiv:2508.01126
4
citations
#572

Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations

Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.

ICCV 2025posterarXiv:2412.03215
4
citations
#573

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Jun Li, Jinpeng Wang, Chaolei Tan et al.

ICCV 2025posterarXiv:2507.17402
4
citations
#574

C4D: 4D Made from 3D through Dual Correspondences

Shizun Wang, Zhenxiang Jiang, Xingyi Yang et al.

ICCV 2025posterarXiv:2510.14960
4
citations
#575

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

ICCV 2025poster
4
citations
#576

MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion

Zihan Wang, Jeff Tan, Tarasha Khurana et al.

ICCV 2025posterarXiv:2507.23782
4
citations
#577

MikuDance: Animating Character Art with Mixed Motion Dynamics

Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.

ICCV 2025posterarXiv:2411.08656
4
citations
#578

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.

ICCV 2025posterarXiv:2505.00482
4
citations
#579

How Can Objects Help Video-Language Understanding?

Zitian Tang, Shijie Wang, Junho Cho et al.

ICCV 2025posterarXiv:2504.07454
3
citations
#580

HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars

Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.

ICCV 2025posterarXiv:2507.19481
3
citations
#581

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025posterarXiv:2503.17044
3
citations
#582

Exploiting Diffusion Prior for Task-driven Image Restoration

Jaeha Kim, Junghun Oh, Kyoung Mu Lee

ICCV 2025posterarXiv:2507.22459
3
citations
#583

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

Chen Zhao, Xuan Wang, Tong Zhang et al.

ICCV 2025posterarXiv:2411.00144
3
citations
#584

Jigsaw++: Imagining Complete Shape Priors for Object Reassembly

Jiaxin Lu, Gang Hua, Qixing Huang

ICCV 2025posterarXiv:2410.11816
3
citations
#585

Heavy Labels Out! Dataset Distillation with Label Space Lightening

Ruonan Yu, Songhua Liu, Zigeng Chen et al.

ICCV 2025posterarXiv:2408.08201
3
citations
#586

MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation

Fu Rong, Meng Lan, Qian Zhang et al.

ICCV 2025posterarXiv:2501.13667
3
citations
#587

Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding

Mingxuan Wu, Huang Huang, Justin Kerr et al.

ICCV 2025posterarXiv:2504.17441
3
citations
#588

GT-Loc: Unifying When and Where in Images through a Joint Embedding Space

David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.

ICCV 2025posterarXiv:2507.10473
3
citations
#589

On the Generalization of Representation Uncertainty in Earth Observation

Spyros Kondylatos, Nikolaos Ioannis Bountos, Dimitrios Michail et al.

ICCV 2025posterarXiv:2503.07082
3
citations
#590

Joint Diffusion Models in Continual Learning

Paweł Skierś, Kamil Deja

ICCV 2025posterarXiv:2411.08224
3
citations
#591

ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling

Jinhyung Park, Javier Romero, Shunsuke Saito et al.

ICCV 2025posterarXiv:2508.15767
3
citations
#592

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

Chancharik Mitra, Brandon Huang, Tianning Chai et al.

ICCV 2025posterarXiv:2412.00142
3
citations
#593

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Fangfu Liu, Hao Li, Jiawei Chi et al.

ICCV 2025posterarXiv:2507.02813
3
citations
#594

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025posterarXiv:2506.22799
3
citations
#595

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

ICCV 2025posterarXiv:2503.16867
3
citations
#596

RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Yufeng Zhong, Chengjian Feng, Feng yan et al.

ICCV 2025posterarXiv:2503.18525
3
citations
#597

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025posterarXiv:2503.10225
3
citations
#598

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

ICCV 2025posterarXiv:2503.16832
3
citations
#599

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025posterarXiv:2506.07371
3
citations
#600

ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers

Nicholas DiBrita, Jason Han, Tirthak Patel

ICCV 2025posterarXiv:2506.21537
3
citations