Most Cited ICCV "exploration rate" Papers

2,701 papers found • Page 2 of 14

#201

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing

Tianyi Wei, Yifan Zhou, Dongdong Chen et al.

ICCV 2025arXiv:2503.16153
12
citations
#202

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu HU, Na Zhao et al.

ICCV 2025arXiv:2502.02358
12
citations
#203

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.

ICCV 2025arXiv:2507.07990
12
citations
#204

LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

Juliette Marrie, Romain Menegaux, Michael Arbel et al.

ICCV 2025arXiv:2410.14462
12
citations
#205

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs

Xinyu Fang, Zhijian Chen, Kai Lan et al.

ICCV 2025arXiv:2503.14478
12
citations
#206

Mobile Video Diffusion

Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas et al.

ICCV 2025arXiv:2412.07583
12
citations
#207

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving

Yue Li, Meng Tian, Zhenyu Lin et al.

ICCV 2025arXiv:2503.21505
12
citations
#208

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models

Yijing Lin, Mengqi Huang, Shuhan Zhuang et al.

ICCV 2025arXiv:2503.10406
12
citations
#209

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Cong Wei, Yujie Zhong, yingsen zeng et al.

ICCV 2025arXiv:2412.14006
12
citations
#210

Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection

Jiawen Zhu, YEW-SOON ONG, Chunhua Shen et al.

ICCV 2025arXiv:2410.10289
12
citations
#211

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

Yun Wang, Longguang Wang, Chenghao Zhang et al.

ICCV 2025highlightarXiv:2507.04631
11
citations
#212

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Haoji Zhang, Yiqin Wang, Yansong Tang et al.

ICCV 2025arXiv:2506.23825
11
citations
#213

EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model

Shengqi Dang, Yi He, Long Ling et al.

ICCV 2025arXiv:2501.05710
11
citations
#214

SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning

Zhewei Dai, Shilei Zeng, Haotian Liu et al.

ICCV 2025arXiv:2410.14987
11
citations
#215

Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Yihong Luo, Tianyang Hu, Jiacheng Sun et al.

ICCV 2025arXiv:2503.06674
11
citations
#216

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Fating Hong, Zunnan Xu, Zixiang Zhou et al.

ICCV 2025arXiv:2504.02542
11
citations
#217

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

Tianyuan Qu, Longxiang Tang, Bohao PENG et al.

ICCV 2025arXiv:2503.12496
11
citations
#218

Synthetic Video Enhances Physical Fidelity in Video Synthesis

Qi Zhao, Xingyu Ni, Ziyu Wang et al.

ICCV 2025arXiv:2503.20822
11
citations
#219

Effective Training Data Synthesis for Improving MLLM Chart Understanding

Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.

ICCV 2025arXiv:2508.06492
11
citations
#220

XTrack: Multimodal Training Boosts RGB-X Video Object Trackers

Yuedong Tan, Zongwei Wu, Yuqian Fu et al.

ICCV 2025arXiv:2405.17773
11
citations
#221

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin, Li Kang, Xiufeng Song et al.

ICCV 2025arXiv:2503.16408
11
citations
#222

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

Hyeonho Jeong, Suhyeon Lee, Jong Ye

ICCV 2025arXiv:2503.09151
11
citations
#223

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding

Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.

ICCV 2025arXiv:2411.14401
11
citations
#224

UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving

Rui Chen, Zehuan Wu, Yichen Liu et al.

ICCV 2025arXiv:2412.04842
11
citations
#225

LBM: Latent Bridge Matching for Fast Image-to-Image Translation

Clément Chadebec, Onur Tasar, Sanjeev Sreetharan et al.

ICCV 2025highlightarXiv:2503.07535
11
citations
#226

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang, Rui Shao, Gongwei Chen et al.

ICCV 2025arXiv:2501.16297
11
citations
#227

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild

Xi Fang, Jiankun Wang, Xiaochen Cai et al.

ICCV 2025arXiv:2411.11098
11
citations
#228

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang, Chengjian Feng, Baihui Xiao et al.

ICCV 2025arXiv:2412.07689
11
citations
#229

ViSpeak: Visual Instruction Feedback in Streaming Videos

Shenghao Fu, Qize Yang, Yuan-Ming Li et al.

ICCV 2025arXiv:2503.12769
11
citations
#230

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

Yue Fan, Xiaojian Ma, Rongpeng Su et al.

ICCV 2025highlightarXiv:2501.00358
11
citations
#231

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025arXiv:2412.18450
11
citations
#232

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Hao He, Ceyuan Yang, Shanchuan Lin et al.

ICCV 2025arXiv:2503.10592
10
citations
#233

Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution

Du Chen, Liyi Chen, Zhengqiang ZHANG et al.

ICCV 2025arXiv:2501.06838
10
citations
#234

WildSAT: Learning Satellite Image Representations from Wildlife Observations

Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.

ICCV 2025arXiv:2412.14428
10
citations
#235

WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image

Yuci Liang, Xinheng Lyu, Meidan Ding et al.

ICCV 2025arXiv:2412.02141
10
citations
#236

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation

zijie wu, Chaohui Yu, Fan Wang et al.

ICCV 2025arXiv:2506.09982
10
citations
#237

3D Mesh Editing using Masked LRMs

William Gao, Dilin Wang, Yuchen Fan et al.

ICCV 2025arXiv:2412.08641
10
citations
#238

RANKCLIP: Ranking-Consistent Language-Image Pretraining

Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.

ICCV 2025arXiv:2404.09387
10
citations
#239

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

Ruijie Lu, Yixin Chen, Yu Liu et al.

ICCV 2025arXiv:2503.12049
10
citations
#240

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Yikang Zhou, Tao Zhang, Shilin Xu et al.

ICCV 2025arXiv:2501.04670
10
citations
#241

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling

Fengxiang Wang, Hongzhen Wang, Di Wang et al.

ICCV 2025arXiv:2406.11933
10
citations
#242

MINERVA: Evaluating Complex Video Reasoning

Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.

ICCV 2025arXiv:2505.00681
10
citations
#243

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

ICCV 2025arXiv:2501.02135
10
citations
#244

Spectral Image Tokenizer

Carlos Esteves, Mohammed Suhail, Ameesh Makadia

ICCV 2025arXiv:2412.09607
10
citations
#245

X-Dancer: Expressive Music to Human Dance Video Generation

Zeyuan Chen, Hongyi Xu, Guoxian Song et al.

ICCV 2025highlightarXiv:2502.17414
10
citations
#246

SuperDec: 3D Scene Decomposition with Superquadrics Primitives

Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.

ICCV 2025arXiv:2504.00992
10
citations
#247

RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis

yifei feng, Mx Yang, Shuhui Yang et al.

ICCV 2025arXiv:2503.19011
10
citations
#248

A Recipe for Generating 3D Worlds from a Single Image

Katja Schwarz, Denis Rozumny, Samuel Rota Bulò et al.

ICCV 2025arXiv:2503.16611
10
citations
#249

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.

ICCV 2025arXiv:2504.14032
10
citations
#250

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers

Hanling Zhang, Rundong Su, Zhihang Yuan et al.

ICCV 2025arXiv:2503.22796
10
citations
#251

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.

ICCV 2025arXiv:2501.14317
10
citations
#252

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search

Shuyu Yang, Yaxiong Wang, Li Zhu et al.

ICCV 2025highlightarXiv:2411.17776
10
citations
#253

DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation

Jiangran Lyu, Ziming Li, Xuesong Shi et al.

ICCV 2025arXiv:2503.16806
10
citations
#254

MagicColor: Multi-instance Sketch Colorization

yinhan Zhang, Yue Ma, Bingyuan Wang et al.

ICCV 2025
10
citations
#255

Visual Test-time Scaling for GUI Agent Grounding

Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.

ICCV 2025highlightarXiv:2505.00684
10
citations
#256

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

Minghe Gao, Xuqi Liu, Zhongqi Yue et al.

ICCV 2025arXiv:2504.06606
10
citations
#257

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence

Jie Feng, Shengyuan Wang, Tianhui Liu et al.

ICCV 2025arXiv:2506.23219
10
citations
#258

RoMo: Robust Motion Segmentation Improves Structure from Motion

Lily Goli, Sara Sabour, Mark Matthews et al.

ICCV 2025arXiv:2411.18650
10
citations
#259

ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Anurag Ghosh, Shen Zheng, Robert Tamburo et al.

ICCV 2025arXiv:2406.07661
10
citations
#260

Di[M]O: Distilling Masked Diffusion Models into One-step Generator

Yuanzhi Zhu, Xi WANG, Stéphane Lathuilière et al.

ICCV 2025
10
citations
#261

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Yatai Ji, Jiacheng Zhang, Jie Wu et al.

ICCV 2025arXiv:2412.15156
10
citations
#262

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

Xin Ding, Hao Wu, Yifan Yang et al.

ICCV 2025arXiv:2503.06220
9
citations
#263

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025arXiv:2506.10857
9
citations
#264

GAS: Generative Avatar Synthesis from a Single Image

Yixing Lu, Junting Dong, YoungJoong Kwon et al.

ICCV 2025arXiv:2502.06957
9
citations
#265

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

Tao Wang, Changxu Cheng, Lingfeng Wang et al.

ICCV 2025arXiv:2503.13026
9
citations
#266

SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates

Yijia Hong, Yuan-Chen Guo, Ran Yi et al.

ICCV 2025arXiv:2411.17515
9
citations
#267

UAVScenes: A Multi-Modal Dataset for UAVs

Sijie Wang, Siqi Li, Yawei Zhang et al.

ICCV 2025arXiv:2507.22412
9
citations
#268

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Shijie Ma, Yuying Ge, Teng Wang et al.

ICCV 2025arXiv:2503.19480
9
citations
#269

Perspective-Invariant 3D Object Detection

Alan Liang, Lingdong Kong, Dongyue Lu et al.

ICCV 2025arXiv:2507.17665
9
citations
#270

Make Me Happier: Evoking Emotions Through Image Diffusion Models

Qing Lin, Jingfeng Zhang, YEW-SOON ONG et al.

ICCV 2025arXiv:2403.08255
9
citations
#271

SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images

Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.

ICCV 2025arXiv:2506.19585
9
citations
#272

MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning

Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.

ICCV 2025arXiv:2504.06740
9
citations
#273

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Ruofan Wang, Juncheng Li, Yixu Wang et al.

ICCV 2025arXiv:2411.00827
9
citations
#274

Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Shengqi Liu, Yuhao Cheng, Zhuo Chen et al.

ICCV 2025arXiv:2412.14453
9
citations
#275

LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization

Alessio Spagnoletti, Jean Prost, Andres Almansa et al.

ICCV 2025arXiv:2503.12615
9
citations
#276

Fine-Tuning Visual Autogressive Models for Subject-Driven Generation

Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.

ICCV 2025
9
citations
#277

AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting

Xiaoyu Zhou, Jingqi Wang, Yongtao Wang et al.

ICCV 2025highlightarXiv:2502.04981
9
citations
#278

Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives

ziyu zhang, Binbin Huang, Hanqing Jiang et al.

ICCV 2025arXiv:2411.16392
9
citations
#279

SILO: Solving Inverse Problems with Latent Operators

Ron Raphaeli, Sean Man, Michael Elad

ICCV 2025arXiv:2501.11746
9
citations
#280

FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases

Shuai Tan, Bill Gong, Bin Ji et al.

ICCV 2025arXiv:2507.01390
9
citations
#281

Multimodal LLM Guided Exploration and Active Mapping using Fisher Information

Wen Jiang, BOSHU LEI, Katrina Ashton et al.

ICCV 2025arXiv:2410.17422
9
citations
#282

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025arXiv:2503.08525
8
citations
#283

ModSkill: Physical Character Skill Modularization

Yiming Huang, Zhiyang Dou, Lingjie Liu

ICCV 2025arXiv:2502.14140
8
citations
#284

Generating Physically Stable and Buildable Brick Structures from Text

Ava Pun, Kangle Deng, Ruixuan Liu et al.

ICCV 2025arXiv:2505.05469
8
citations
#285

ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models

Ke Niu, Haiyang Yu, Mengyang Zhao et al.

ICCV 2025arXiv:2502.19958
8
citations
#286

SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

Wenkun He, Yun Liu, Ruitao Liu et al.

ICCV 2025arXiv:2412.20104
8
citations
#287

NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning

Zhixi Cai, Fucai Ke, Simindokht Jahangard et al.

ICCV 2025arXiv:2502.00372
8
citations
#288

ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds

Binbin Xiang, Maciej Wielgosz, Stefano Puliti et al.

ICCV 2025arXiv:2506.16991
8
citations
#289

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

Xirui Hu, Jiahao Wang, Hao chen et al.

ICCV 2025arXiv:2503.06505
8
citations
#290

DAViD: Modeling Dynamic Affordance of 3D Objects Using Pre-trained Video Diffusion Models

Hyeonwoo Kim, Sangwon Baik, Hanbyul Joo

ICCV 2025arXiv:2501.08333
8
citations
#291

VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.

ICCV 2025arXiv:2411.16156
8
citations
#292

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets

Yuhang Yang, Fengqi Liu, Yixing Lu et al.

ICCV 2025
8
citations
#293

Deeply Supervised Flow-Based Generative Models

Inkyu Shin, Chenglin Yang, Liang-Chieh Chen

ICCV 2025arXiv:2503.14494
8
citations
#294

From One to More: Contextual Part Latents for 3D Generation

Shaocong Dong, Lihe Ding, Xiao Chen et al.

ICCV 2025arXiv:2507.08772
8
citations
#295

PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining

Ciyu Ruan, Ruishan Guo, Zihang GONG et al.

ICCV 2025arXiv:2505.05307
8
citations
#296

GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering

Kai Ye, Chong Gao, Guanbin Li et al.

ICCV 2025arXiv:2410.24204
8
citations
#297

BANet: Bilateral Aggregation Network for Mobile Stereo Matching

Gangwei Xu, Jiaxin Liu, Xianqi Wang et al.

ICCV 2025arXiv:2503.03259
8
citations
#298

Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing

Hongyu Shen, Junfeng Ni, Weishuo Li et al.

ICCV 2025arXiv:2508.03227
8
citations
#299

ZIM: Zero-Shot Image Matting for Anything

Beomyoung Kim, Chanyong Shin, Joonhyun Jeong et al.

ICCV 2025highlightarXiv:2411.00626
8
citations
#300

GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis

Bo Liu, Ke Zou, Li-Ming Zhan et al.

ICCV 2025arXiv:2411.16778
8
citations
#301

NeuralSVG: An Implicit Representation for Text-to-Vector Generation

Sagi Polaczek, Yuval Alaluf, Elad Richardson et al.

ICCV 2025arXiv:2501.03992
8
citations
#302

PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting

Haoyu Zhao, Hao Wang, Xingyue Zhao et al.

ICCV 2025
8
citations
#303

Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation

Akshay Krishnan, Xinchen Yan, Vincent Casser et al.

ICCV 2025arXiv:2501.13087
8
citations
#304

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping

Tianyi Wang, Shuaicheng Niu, Harry Cheng et al.

ICCV 2025arXiv:2503.18678
8
citations
#305

X-Fusion: Introducing New Modality to Frozen Large Language Models

Sicheng Mo, Thao Nguyen, Xun Huang et al.

ICCV 2025arXiv:2504.20996
8
citations
#306

SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions

Mengwei Xie, Shuang Zeng, Xinyuan Chang et al.

ICCV 2025arXiv:2507.04822
8
citations
#307

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models

Taesung Kwon, Jong Ye

ICCV 2025arXiv:2412.00156
8
citations
#308

MUNBa: Machine Unlearning via Nash Bargaining

Jing Wu, Mehrtash Harandi

ICCV 2025arXiv:2411.15537
8
citations
#309

CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception

Jiaru Zhong, Jiahao Wang, Jiahui Xu et al.

ICCV 2025highlightarXiv:2507.19239
8
citations
#310

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing

Yingying Zhang, Lixiang Ru, Kang Wu et al.

ICCV 2025arXiv:2507.13812
7
citations
#311

ARIG: Autoregressive Interactive Head Generation for Real-time Conversations

Ying Guo, Xi Liu, Cheng Zhen et al.

ICCV 2025arXiv:2507.00472
7
citations
#312

Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing

Yudong Liu, Jingwei Sun, Yueqian Lin et al.

ICCV 2025arXiv:2503.10742
7
citations
#313

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

Mengchen Zhang, Tong Wu, Jing Tan et al.

ICCV 2025arXiv:2504.07083
7
citations
#314

Unified Multimodal Understanding via Byte-Pair Visual Encoding

Wanpeng Zhang, Yicheng Feng, Hao Luo et al.

ICCV 2025highlightarXiv:2506.23639
7
citations
#315

Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Xiang Xu, Lingdong Kong, Song Wang et al.

ICCV 2025arXiv:2507.05260
7
citations
#316

Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis

Jingjing Ren, Wenbo Li, Zhongdao Wang et al.

ICCV 2025arXiv:2504.14470
7
citations
#317

TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning

Siqi Luo, Haoran Yang, Yi Xin et al.

ICCV 2025arXiv:2507.22872
7
citations
#318

PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models

Runze He, bo cheng, Yuhang Ma et al.

ICCV 2025arXiv:2503.10127
7
citations
#319

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

Hyojun Go, Byeongjun Park, Hyelin Nam et al.

ICCV 2025arXiv:2503.15855
7
citations
#320

RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors

Avinash Paliwal, xilong zhou, Wei Ye et al.

ICCV 2025arXiv:2503.10860
7
citations
#321

Vision-Language Models Can't See the Obvious

YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.

ICCV 2025arXiv:2507.04741
7
citations
#322

HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID

Yiyang Su, Yunping Shi, Feng Liu et al.

ICCV 2025arXiv:2508.05038
7
citations
#323

Training-Free Text-Guided Image Editing with Visual Autoregressive Model

Yufei Wang, Lanqing Guo, Zhihao Li et al.

ICCV 2025arXiv:2503.23897
7
citations
#324

TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention

Jinhao Duan, Fei Kong, Hao Cheng et al.

ICCV 2025
7
citations
#325

Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos

Rundong Luo, Matthew Wallingford, Ali Farhadi et al.

ICCV 2025arXiv:2504.07940
7
citations
#326

HUMOTO: A 4D Dataset of Mocap Human Object Interactions

Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.

ICCV 2025arXiv:2504.10414
7
citations
#327

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Yiren Song, Xiaokang Liu, Mike Zheng Shou

ICCV 2025arXiv:2412.14580
7
citations
#328

Federated Continual Instruction Tuning

Haiyang Guo, Fanhu Zeng, Fei Zhu et al.

ICCV 2025arXiv:2503.12897
7
citations
#329

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

Hyojin Bahng, Caroline Chan, Fredo Durand et al.

ICCV 2025arXiv:2506.02095
7
citations
#330

SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data

Xilin He, Cheng Luo, Xiaole Xian et al.

ICCV 2025arXiv:2410.09865
7
citations
#331

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior

Zichen Liu, Yihao Meng, Hao Ouyang et al.

ICCV 2025arXiv:2404.11614
7
citations
#332

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

Saemi Moon, Minjong Lee, Sangdon Park et al.

ICCV 2025arXiv:2410.05664
7
citations
#333

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Donald Shenaj, Ondrej Bohdal, Mete Ozay et al.

ICCV 2025arXiv:2412.05148
7
citations
#334

DIVE: Taming DINO for Subject-Driven Video Editing

Yi Huang, Wei Xiong, He Zhang et al.

ICCV 2025arXiv:2412.03347
7
citations
#335

StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Yang LI, Jinglu Wang, Lei Chu et al.

ICCV 2025arXiv:2503.06235
7
citations
#336

Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training

Qiaosi Yi, Shuai Li, Rongyuan Wu et al.

ICCV 2025highlightarXiv:2507.20291
7
citations
#337

Extrapolated Urban View Synthesis Benchmark

Xiangyu Han, Zhen Jia, Boyi Li et al.

ICCV 2025arXiv:2412.05256
7
citations
#338

KinMo: Kinematic-aware Human Motion Understanding and Generation

Pengfei Zhang, Pinxin Liu, Pablo Garrido et al.

ICCV 2025arXiv:2411.15472
7
citations
#339

Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text

Bingchao Wang, Zhiwei Ning, Jianyu Ding et al.

ICCV 2025arXiv:2507.10095
7
citations
#340

MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent

Xinyao Liao, Xianfang Zeng, Liao Wang et al.

ICCV 2025arXiv:2502.03207
7
citations
#341

Language Driven Occupancy Prediction

Zhu Yu, Bowen Pang, Lizhe Liu et al.

ICCV 2025arXiv:2411.16072
7
citations
#342

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs

Xinli Xu, Wenhang Ge, Dicong Qiu et al.

ICCV 2025arXiv:2412.11258
7
citations
#343

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Jungbin Cho, Junwan Kim, Jisoo Kim et al.

ICCV 2025highlightarXiv:2411.19527
7
citations
#344

GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting

Yusen XIE, Zhenmin Huang, Jin Wu et al.

ICCV 2025arXiv:2410.17084
7
citations
#345

AgroBench: Vision-Language Model Benchmark in Agriculture

Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka et al.

ICCV 2025arXiv:2507.20519
7
citations
#346

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

Huanjin Yao, Jiaxing Huang, Yawen Qiu et al.

ICCV 2025arXiv:2506.23563
7
citations
#347

Learned Image Compression with Hierarchical Progressive Context Modeling

Yuqi Li, Haotian Zhang, Li Li et al.

ICCV 2025arXiv:2507.19125
7
citations
#348

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

Jingjing Jiang, Chao Ma, Xurui Song et al.

ICCV 2025highlightarXiv:2507.07424
7
citations
#349

Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning

Muhammad Aqeel, Shakiba Sharifi, Marco Cristani et al.

ICCV 2025arXiv:2508.02293
7
citations
#350

FlowR: Flowing from Sparse to Dense 3D Reconstructions

Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang et al.

ICCV 2025highlightarXiv:2504.01647
7
citations
#351

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.

ICCV 2025arXiv:2507.21391
7
citations
#352

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

Rohit Gandikota, Zongze Wu, Richard Zhang et al.

ICCV 2025arXiv:2502.01639
7
citations
#353

GaussRender: Learning 3D Occupancy with Gaussian Rendering

Loick Chambon, Eloi Zablocki, Alexandre Boulch et al.

ICCV 2025arXiv:2502.05040
7
citations
#354

TurboReg: TurboClique for Robust and Efficient Point Cloud Registration

Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao et al.

ICCV 2025arXiv:2507.01439
6
citations
#355

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.

ICCV 2025arXiv:2503.19263
6
citations
#356

SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images

Yu Sheng, Jiajun Deng, Xinran Zhang et al.

ICCV 2025arXiv:2505.23044
6
citations
#357

``Principal Components" Enable A New Language of Images

Xin Wen, Bingchen Zhao, Ismail Elezi et al.

ICCV 2025
6
citations
#358

SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning

Zhi Chen, Zecheng Zhao, Jingcai Guo et al.

ICCV 2025arXiv:2503.10252
6
citations
#359

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels

Yujia Tong, Yuze Wang, Jingling Yuan et al.

ICCV 2025arXiv:2503.13917
6
citations
#360

GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting

Xiaobao Wei, Peng Chen, Guangyu Li et al.

ICCV 2025highlightarXiv:2411.12981
6
citations
#361

Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context

Ge Zheng, Jiaye Qian, Jiajin Tang et al.

ICCV 2025arXiv:2510.20229
6
citations
#362

Event-based Tiny Object Detection: A Benchmark Dataset and Baselines

Nuo Chen, Chao Xiao, Yimian Dai et al.

ICCV 2025arXiv:2506.23575
6
citations
#363

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Ziming Yu, Pan Zhou, Sike Wang et al.

ICCV 2025arXiv:2410.08989
6
citations
#364

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Xianhang Li, Yanqing Liu, Haoqin Tu et al.

ICCV 2025arXiv:2505.04601
6
citations
#365

Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework

Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He et al.

ICCV 2025arXiv:2503.09186
6
citations
#366

Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis

Letian Zhang, Quan Cui, Bingchen Zhao et al.

ICCV 2025arXiv:2503.08741
6
citations
#367

StableCodec: Taming One-Step Diffusion for Extreme Image Compression

Tianyu Zhang, Xin Luo, Li Li et al.

ICCV 2025arXiv:2506.21977
6
citations
#368

GIViC: Generative Implicit Video Compression

Ge Gao, Siyue Teng, Tianhao Peng et al.

ICCV 2025arXiv:2503.19604
6
citations
#369

Textured 3D Regenerative Morphing with 3D Diffusion Prior

Songlin Yang, Yushi LAN, Honghua Chen et al.

ICCV 2025arXiv:2502.14316
6
citations
#370

MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices

HAILONG YAN, Ao Li, Xiangtao Zhang et al.

ICCV 2025arXiv:2507.01838
6
citations
#371

Emulating Self-attention with Convolution for Efficient Image Super-Resolution

Dongheon Lee, Seokju Yun, Youngmin Ro

ICCV 2025highlightarXiv:2503.06671
6
citations
#372

Snakes and Ladders: Two Steps Up for VideoMamba

Hui Lu, Albert Ali Salah, Ronald Poppe

ICCV 2025arXiv:2406.19006
6
citations
#373

ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving

Yuhang Lu, Jiadong Tu, Yuexin Ma et al.

ICCV 2025arXiv:2507.12499
6
citations
#374

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition

Jinghan You, Shanglin Li, Yuanrui Sun et al.

ICCV 2025highlightarXiv:2501.13420
6
citations
#375

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Xuran Ma, Yexin Liu, Yaofu LIU et al.

ICCV 2025arXiv:2504.03140
6
citations
#376

Bringing RNNs Back to Efficient Open-Ended Video Understanding

Weili Xu, Enxin Song, Wenhao Chai et al.

ICCV 2025arXiv:2507.02591
6
citations
#377

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer

Qingyu Shi, Jianzong Wu, Jinbin Bai et al.

ICCV 2025arXiv:2503.17350
6
citations
#378

CompCap: Improving Multimodal Large Language Models with Composite Captions

Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.

ICCV 2025arXiv:2412.05243
6
citations
#379

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng, Xiang Zhang, Zirui Wang et al.

ICCV 2025arXiv:2508.00728
6
citations
#380

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Yunheng Li, Yuxuan Li, Quan-Sheng Zeng et al.

ICCV 2025arXiv:2412.06244
6
citations
#381

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.

ICCV 2025arXiv:2503.23219
6
citations
#382

WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images

Yansong Guo, Jie Hu, Yansong Qu et al.

ICCV 2025arXiv:2503.08407
6
citations
#383

DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy

Ming Dai, Wenxuan Cheng, Jiang-Jiang Liu et al.

ICCV 2025arXiv:2507.01738
6
citations
#384

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Yudong Jin, Sida Peng, Xuan Wang et al.

ICCV 2025arXiv:2507.13344
6
citations
#385

MIEB: Massive Image Embedding Benchmark

Chenghao Xiao, Isaac Chung, Imene Kerboua et al.

ICCV 2025arXiv:2504.10471
6
citations
#386

OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models

Huanpeng Chu, Wei Wu, Guanyu Feng et al.

ICCV 2025arXiv:2508.16212
6
citations
#387

REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents

Rui Tian, Qi Dai, Jianmin Bao et al.

ICCV 2025arXiv:2411.13552
6
citations
#388

VALLR: Visual ASR Language Model for Lip Reading

Marshall Thomas, Edward Fish, Richard Bowden

ICCV 2025arXiv:2503.21408
6
citations
#389

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask

Jeongho Kim, Hoiyeong Jin, Sunghyun Park et al.

ICCV 2025arXiv:2412.16978
6
citations
#390

Open-Vocabulary Octree-Graph for 3D Scene Understanding

Zhigang Wang, Yifei Su, Chenhui Li et al.

ICCV 2025arXiv:2411.16253
6
citations
#391

R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception

Jonas Mirlach, Lei Wan, Andreas Wiedholz et al.

ICCV 2025arXiv:2503.17122
6
citations
#392

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

Gene Chou, Wenqi Xian, Guandao Yang et al.

ICCV 2025highlightarXiv:2504.07093
6
citations
#393

Dual-Process Image Generation

Grace Luo, Jonathan Granskog, Aleksander Holynski et al.

ICCV 2025arXiv:2506.01955
6
citations
#394

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

Han Wang, Shengyang Li, Jian Yang et al.

ICCV 2025arXiv:2506.22027
6
citations
#395

AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration

Javier Tirado-Garín, Javier Civera

ICCV 2025arXiv:2503.12701
6
citations
#396

Where, What, Why: Towards Explainable Driver Attention Prediction

Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.

ICCV 2025highlightarXiv:2506.23088
6
citations
#397

ZeroStereo: Zero-shot Stereo Matching from Single Images

Xianqi Wang, Hao Yang, Gangwei Xu et al.

ICCV 2025arXiv:2501.08654
6
citations
#398

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

Jan Ackermann, Jonas Kulhanek, Shengqu Cai et al.

ICCV 2025arXiv:2506.21117
6
citations
#399

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

Runze Zhang, Guoguang Du, Xiaochuan Li et al.

ICCV 2025highlightarXiv:2503.06053
6
citations
#400

Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment

ying ba, Tianyu Zhang, Yalong Bai et al.

ICCV 2025arXiv:2507.19002
6
citations