Most Cited 2025 "hardware robotic control" Papers

22,274 papers found • Page 81 of 112

#16001

Authentic 4D Driving Simulation with a Video Generation Model

Lening Wang, Wenzhao Zheng, Dalong Du et al.

ICCV 2025
#16002

Generative Model Inversion Through the Lens of the Manifold Hypothesis

Xiong Peng, Bo Han, Fengfei Yu et al.

NEURIPS 2025 • arXiv:2509.20177
#16003

Lidar Waveforms are Worth 40x128x33 Words

Dominik Scheuble, Hanno Holzhüter, Steven Peters et al.

ICCV 2025 • highlight
#16004

Spherical Epipolar Rectification for Deep Two-View Absolute Depth Estimation

Pierre-André Brousseau, Sébastien Roy

ICCV 2025
#16005

InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning

Haotian Chi, Zeyu Feng, Yueming LYU et al.

NEURIPS 2025 • oral
#16006

From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos

Chenjian Gao, Lihe Ding, Rui Han et al.

ICCV 2025 • arXiv:2507.20331
#16007

High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach

Yuchong Chen, Jian Yu, Shaoyan Gai et al.

ICCV 2025
#16008

Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation

Soumyadipta Banerjee, Jiaul Paik, Debashis Sen

ICCV 2025
#16009

Polarimetric Neural Field via Unified Complex-Valued Wave Representation

Chu Zhou, Yixin Yang, Junda Liao et al.

ICCV 2025
#16010

Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering

Siddharth Tourani, Jayaram Reddy, Akash Kumbar et al.

ICCV 2025
#16011

Semantic-guided Camera Ray Regression for Visual Localization

Yesheng Zhang, Xu Zhao

ICCV 2025
#16012

SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection

Maximilian Pittner, Joel Janai, Mario Faigle et al.

ICCV 2025 • arXiv:2601.04968
#16013

Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes

Mengkun She, Felix Seegräber, David Nakath et al.

ICCV 2025 • arXiv:2504.10024
#16014

Super Resolved Imaging with Adaptive Optics

Robin Swanson, Esther Y. H. Lin, Masen Lamb et al.

ICCV 2025 • highlight • arXiv:2508.04648
#16015

HVPUNet: Hybrid-Voxel Point-cloud Upsampling Network

Juhyung Ha, Vibhas Vats, Alimoor Reza et al.

ICCV 2025
#16016

Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment

Qingqian Yang, Peishen Yan, Xiaoyu Wu et al.

ICCV 2025
#16017

RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians

Shenxing Wei, Jinxi Li, Yafei YANG et al.

ICCV 2025 • highlight • arXiv:2508.09830
#16018

Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling

Wenting Luan, Siqi Lu, Yongbin Zheng et al.

ICCV 2025
#16019

NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction

Soham Dasgupta, Shanthika Naik, Preet Savalia et al.

ICCV 2025 • arXiv:2508.17712
#16020

Knowledge Distillation for Learned Image Compression

Yunuo Chen, Zezheng Lyu, Bing He et al.

ICCV 2025
#16021

EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device

Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.

ICCV 2025 • arXiv:2509.17430
#16022

LLM Interpretability with Identifiable Temporal-Instantaneous Representation

Xiangchen Song, Jiaqi Sun, Zijian Li et al.

NEURIPS 2025 • oral • arXiv:2509.23323
#16023

NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals

Jiro Abe, Gaku Nakano, Kazumine Ogura

ICCV 2025
#16024

InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation

Zhuoran Yang, Xi Guo, Chenjing Ding et al.

ICCV 2025
#16025

RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model

Huiyang Hu, Peijin Wang, Hanbo Bi et al.

ICCV 2025 • arXiv:2411.17984
#16026

Towards Safer and Understandable Driver Intention Prediction

Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai et al.

ICCV 2025 • arXiv:2510.09200
#16027

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

Haonan Han, Rui Yang, Huan Liao et al.

ICCV 2025 • arXiv:2405.18525
#16028

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Tomer Garber, Tom Tirer

CVPR 2025 • arXiv:2412.20596
#16029

GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views

Hang Yang, Le Hui, Jianjun Qian et al.

ICCV 2025
#16030

Teeth Reconstruction and Performance Capture Using a Phone Camera

Weixi Zheng, Jingwang Ling, Zhibo Wang et al.

ICCV 2025
#16031

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Jianhong Bai, Menghan Xia, Xiao Fu et al.

ICCV 2025 • arXiv:2503.11647
#16032

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

Xianglong He, Zi-Xin Zou, Chia Hao Chen et al.

ICCV 2025 • arXiv:2503.21732
#16033

RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion

Geonho Bang, Minjae Seong, Jisong Kim et al.

ICCV 2025 • arXiv:2509.17712
#16034

Diving into the Fusion of Monocular Priors for Generalized Stereo Matching

Chengtang Yao, Lidong Yu, Zhidan Liu et al.

ICCV 2025 • arXiv:2505.14414
#16035

ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions

Yue Huang, Zhengzhe Jiang, Xiaonan Luo et al.

NEURIPS 2025
#16036

DAA*: Deep Angular A Star for Image-based Path Planning

Zhiwei Xu

ICCV 2025 • arXiv:2507.09305
#16037

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

Sangwon Jang, June Suk Choi, Jaehyeong Jo et al.

CVPR 2025 • arXiv:2503.09669
#16038

Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

Haochen Zhang, Zhong Zheng, Lingzhou Xue

NEURIPS 2025 • arXiv:2506.04626
#16039

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

Jixuan Fan, Wanhua Li, Yifei Han et al.

ICCV 2025 • arXiv:2412.04887
#16040

ROAR: Reducing Inversion Error in Generative Image Watermarking

Hanyi Wang, Han Fang, Shi-Lin Wang et al.

ICCV 2025
#16041

Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution

Peng Du, Hui Li, Han Xu et al.

ICCV 2025 • arXiv:2511.01175
#16042

Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability

Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.

ICCV 2025 • arXiv:2508.12082
#16043

LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing

Federico Girella, Davide Talon, Ziyue Liu et al.

ICCV 2025 • arXiv:2507.22627
#16044

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas et al.

ICCV 2025 • arXiv:2412.08629
#16045

Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction

Tuo Feng, Wenguan Wang, Yi Yang

ICCV 2025
#16046

Spatially-Varying Autofocus

Yingsi Qin, Aswin Sankaranarayanan, Matthew O'Toole

ICCV 2025
#16047

Event-based Visual Vibrometry

Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti et al.

ICCV 2025
#16048

Benchmarking Egocentric Visual-Inertial SLAM at City Scale

Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.

ICCV 2025 • highlight • arXiv:2509.26639
#16049

Pathways on the Image Manifold: Image Editing via Video Generation

Noam Rotstein, Gal Yona, Daniel Silver et al.

CVPR 2025 • arXiv:2411.16819
#16050

SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration

Jongsuk Kim, Jae Young Lee, Gyojin Han et al.

ICCV 2025 • arXiv:2510.24052
#16051

Correlation Dimension of Autoregressive Large Language Models

Xin Du, Kumiko Tanaka-Ishii

NEURIPS 2025
#16052

Large Scene Generation with Cube-Absorb Discrete Diffusion

Qianjiang Hu, Wei Hu

ICCV 2025
#16053

M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization

Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee

ICCV 2025 • highlight • arXiv:2506.20922
#16054

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description

Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.

ICCV 2025 • arXiv:2412.01398
#16055

MMGeo: Multimodal Compositional Geo-Localization for UAVs

Yuxiang Ji, Boyong He, Zhuoyue Tan et al.

ICCV 2025
#16056

RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather

Yuran Wang, Yingping Liang, Yutao Hu et al.

ICCV 2025 • arXiv:2507.01653
#16057

FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction

Donghyun Lee, Dawoon Jeong, Jae W. Lee et al.

ICCV 2025 • arXiv:2507.23480
#16058

ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives

Yuqian Fu, Runze Wang, Bin Ren et al.

ICCV 2025 • highlight • arXiv:2411.19083
#16059

Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction

Luoxi Zhang, Pragyan Shrestha, Yu Zhou et al.

ICCV 2025
#16060

Scene-agnostic Pose Regression for Visual Localization

Junwei Zheng, Ruiping Liu, Yufan Chen et al.

CVPR 2025 • arXiv:2503.19543
#16061

Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging

Chongjie Ye, Yushuang Wu, Ziteng Lu et al.

ICCV 2025 • arXiv:2503.22236
#16062

PRM: Photometric Stereo based Large Reconstruction Model

Wenhang Ge, Jiantao Lin, Guibao SHEN et al.

ICCV 2025 • highlight • arXiv:2412.07371
#16063

Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images

Qi Xun Yeo, Yanyan Li, Gim Hee Lee

ICCV 2025 • arXiv:2508.06546
#16064

SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views under Unconstrained Illuminations

Qi Zhang, Chi Huang, Qian Zhang et al.

ICCV 2025
#16065

Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning

Melanie Götz, Torsten Krauß, Alexandra Dmitrienko

ICCV 2025
#16066

Gradient Extrapolation for Debiased Representation Learning

Ihab Asaad, Maha Shadaydeh, Joachim Denzler

ICCV 2025 • arXiv:2503.13236
#16067

PointGAC: Geometric-Aware Codebook for Masked Point Modeling

Abiao Li, Chenlei Lv, Guofeng Mei et al.

ICCV 2025
#16068

Angular Constraint Embedding via SpherePair Loss for Constrained Clustering

Shaojie Zhang, Ke Chen

NEURIPS 2025 • arXiv:2510.06907
#16069

World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model

Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.

ICCV 2025 • arXiv:2507.00603
#16070

SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion

Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.

ICCV 2025 • arXiv:2506.23606
#16071

Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data

Nithin Gopalakrishnan Nair, Srinivas Kaza, Xuan Luo et al.

ICCV 2025
#16072

GLVD: Guided Learned Vertex Descent

Pol Caselles Rico, Francesc Moreno-Noguer

NEURIPS 2025 • arXiv:2510.06046
#16073

Customizing Domain Adapters for Domain Generalization

Yuyang Ji, Zeyi Huang, Haohan Wang et al.

ICCV 2025
#16074

RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters

Xiaolin Liu, Tianyi Zhou, Hongbo Kang et al.

ICCV 2025 • highlight • arXiv:2507.20117
#16075

Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging

Ying Xue, Jiaxi Jiang, Rayan Armani et al.

ICCV 2025 • arXiv:2510.21654
#16076

U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration

Xiaofan Li, Zhihao Xu, Chenming Wu et al.

ICCV 2025 • arXiv:2507.04503
#16077

Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

Jingfeng Wu, Pierre Marion, Peter Bartlett

NEURIPS 2025 • arXiv:2506.02336
#16078

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving

Chen Shi, Shaoshuai Shi, Kehua Sheng et al.

ICCV 2025 • arXiv:2505.19239
#16079

MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model

Yaoye Zhu, Zhe Wang, Yan Wang

ICCV 2025 • arXiv:2507.23595
#16080

Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning

Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King et al.

ICCV 2025 • arXiv:2508.01251
#16081

PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function

Qikui Zhu

ICCV 2025
#16082

Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image

Jerred Chen, Ronald Clark

ICCV 2025 • arXiv:2503.17358
#16083

Axis-level Symmetry Detection with Group-Equivariant Representation

Wongyun Yu, Ahyun Seo, Minsu Cho

ICCV 2025 • arXiv:2508.10740
#16084

PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image

Hyeongjin Nam, Donghwan Kim, Gyeongsik Moon et al.

ICCV 2025 • arXiv:2507.17332
#16085

AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving

Jiawei Xu, Kai Deng, Zexin Fan et al.

ICCV 2025 • arXiv:2507.12137
#16086

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

Qihan Huang, Weilong Dai, Jinlong Liu et al.

ICCV 2025 • arXiv:2503.23905
#16087

Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts

Zixuan Hu, Dongxiao Li, Xinzhu Ma et al.

ICCV 2025 • highlight • arXiv:2508.20488
#16088

HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration

Xiyu Zhang, Jiayi Ma, Jianwei Guo et al.

ICCV 2025 • arXiv:2503.02195
#16089

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference

Kai Huang, Hao Zou, Bochen Wang et al.

ICCV 2025 • arXiv:2503.23956
#16090

PhysAnimator: Physics-Guided Generative Cartoon Animation

Tianyi Xie, Yiwei Zhao, Ying Jiang et al.

CVPR 2025 • arXiv:2501.16550
#16091

Inverse Image-Based Rendering for Light Field Generation from Single Images

Hyunjun Jung, Hae-Gon Jeon

ICCV 2025 • highlight • arXiv:2510.20132
#16092

FlowStyler: Artistic Video Stylization via Transformation Fields Transports

YuNing Gong, Jiaming Chen, Xiaohua Ren et al.

ICCV 2025
#16093

ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer

Jin Hu, Mingjia Li, Xiaojie Guo

ICCV 2025 • arXiv:2412.02545
#16094

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code

WU Sitong, Haoru Tan, Yukang Chen et al.

ICCV 2025
#16095

Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective

Hoang Phan, Tung Lam Tran, Quyen Tran et al.

ICCV 2025 • highlight • arXiv:2211.13723
#16096

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

Zonglin Di, Jing Shi, Yifei Fan et al.

ICCV 2025
#16097

FastJSMA: Accelerating Jacobian-based Saliency Map Attacks through Gradient Decoupling

Zhenghao Gao, Shengjie Xu, Zijing Li et al.

ICCV 2025
#16098

Toward Fair and Accurate Cross-Domain Medical Image Segmentation: A VLM-Driven Active Domain Adaptation Paradigm

Hongqiu Wang, Wu Chen, Xiangde Luo et al.

ICCV 2025
#16099

Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion

Yidi Liu, Dong Li, Yuxin Ma et al.

ICCV 2025 • arXiv:2503.12764
#16100

Federated Continuous Category Discovery and Learning

Lixu Wang, Chenxi Liu, Junfeng Guo et al.

ICCV 2025
#16101

DM-EFS: Dynamically Multiplexed Expanded Features Set Form for Robust and Efficient Small Object Detection

Aashish Sharma

ICCV 2025
#16102

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens

Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.

ICCV 2025 • arXiv:2412.09919
#16103

BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration

Hanyuan Liu, Chengze Li, Minshan Xie et al.

ICCV 2025
#16104

Rethinking Key-frame-based Micro-expression Recognition: A Robust and Accurate Framework Against Key-frame Errors

Zheyuan Zhang, Weihao Tang, Hong Chen

ICCV 2025 • highlight • arXiv:2508.06640
#16105

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025 • arXiv:2503.09509
#16106

Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception

Hongwei Lin, Dongyu Pan, Qiming Xia et al.

ICCV 2025
#16107

What we need is explicit controllability: Training 3D gaze estimator using only facial images

Tingwei Li, Jun Bao, Zhenzhong Kuang et al.

ICCV 2025
#16108

SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance

Wenjin Zhang, Xinyu Li, Chenyang Gao et al.

ICCV 2025
#16109

Unbiased Missing-modality Multimodal Learning

Ruiting Dai, Chenxi Li, Yandong Yan et al.

ICCV 2025
#16110

Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection

Xuehan Chen, Guangyu Ren, Tianhong Dai et al.

ICCV 2025
#16111

Hypergraph Clustering Network with Partial Attribute Imputation

Qianqian Wang, Bowen Zhao, Zhengming Ding et al.

ICCV 2025
#16112

Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking

Hyunseop Kim, Juheon Jeong, Hanul Kim et al.

NEURIPS 2025 • oral
#16113

SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition

Jing Wang, Rui Zhao, Ruiqin Xiong et al.

ICCV 2025
#16114

Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity

Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.

ICCV 2025 • arXiv:2507.15775
#16115

Object-centric Video Question Answering with Visual Grounding and Referring

Haochen Wang, Qirui Chen, Cilin Yan et al.

ICCV 2025 • arXiv:2507.19599
#16116

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.

ICCV 2025 • highlight • arXiv:2503.22677
#16117

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Lu Chen, Yizhou Wang, Shixiang Tang et al.

ICCV 2025 • arXiv:2502.05857
#16118

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval

Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.

ICCV 2025 • arXiv:2509.04773
#16119

MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition

Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.

ICCV 2025
#16120

LIRA: Reasoning Reconstruction via Multimodal Large Language Models

Zhen Zhou, Tong Wang, Yunkai Ma et al.

ICCV 2025
#16121

MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation

Bin Xie, Hao Tang, Bin Duan et al.

ICCV 2025
#16122

Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation

Junhao Xiao, Yang Wei, Jingyu Wang et al.

ICCV 2025
#16123

Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training

Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.

ICCV 2025 • arXiv:2507.07778
#16124

Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning

Zeyu Xi, Haoying Sun, Yaofei Wu et al.

ICCV 2025 • arXiv:2507.20163
#16125

Learning an Implicit Physics Model for Image-based Fluid Simulation

Emily Jia, Jiageng Mao, Zhiyuan Gao et al.

ICCV 2025 • arXiv:2508.08254
#16126

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.

ICCV 2025 • highlight • arXiv:2510.17023
#16127

Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition

Meiqi Cao, Xiangbo Shu, Xin Jiang et al.

ICCV 2025
#16128

Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding

Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.

ICCV 2025
#16129

First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training

Gyudong Kim, Hyukju Na, Jin Kim et al.

NEURIPS 2025
#16130

Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal

Yitong Jiang, Jinwei Gu, Tianfan Xue et al.

ICCV 2025 • highlight
#16131

Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior

Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.

ICCV 2025 • arXiv:2403.18878
#16132

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach

Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.

ICCV 2025 • arXiv:2406.19568
#16133

SIC: Similarity-Based Interpretable Image Classification with Neural Networks

Tom Nuno Wolf, Emre Kavak, Fabian Bongratz et al.

ICCV 2025 • arXiv:2501.17328
#16134

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

Qi Chen, Xinze Zhou, Chen Liu et al.

ICCV 2025 • arXiv:2510.14831
#16135

Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration

Ting Lei, Shaofeng Yin, Qingchao Chen et al.

ICCV 2025 • arXiv:2508.03207
#16136

LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation

Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.

ICCV 2025 • arXiv:2508.01152
#16137

What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses

Federico D'Agostino, Lisa Schwetlick, Matthias Bethge et al.

NEURIPS 2025 • oral
#16138

VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization

Xinye Cao, Hongcan Guo, Jiawen Qian et al.

ICCV 2025 • arXiv:2510.06040
#16139

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting

Shuaiting Li, Juncan Deng, Chengxuan Wang et al.

ICCV 2025 • arXiv:2503.08668
#16140

WIPES: Wavelet-based Visual Primitives

Wenhao Zhang, Hao Zhu, Delong Wu et al.

ICCV 2025 • arXiv:2508.12615
#16141

MambaML: Exploring State Space Models for Multi-Label Image Classification

Xuelin Zhu, Jian Liu, Jiuxin Cao et al.

ICCV 2025
#16142

Vision-Language Neural Graph Featurization for Extracting Retinal Lesions

Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.

ICCV 2025
#16143

Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow

Yingfan MA, Bohan An, Ao Shen et al.

ICCV 2025
#16144

Towards Robustness of Person Search against Corruptions

Woojung Son, Yoonki Cho, Guoyuan An et al.

ICCV 2025
#16145

MotionBind: Multi-Modal Human Motion Alignment for Retrieval, Recognition, and Generation

Kaleab Kinfu, Rene Vidal

NEURIPS 2025 • oral
#16146

CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization

Yihang Liu, Ying Wen, Longzhen Yang et al.

ICCV 2025
#16147

VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification

Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.

ICCV 2025
#16148

UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents

Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.

ICCV 2025
#16149

SEAL: Semantic Aware Image Watermarking

Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.

ICCV 2025 • arXiv:2503.12172
#16150

ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios

Jun Yin, Pengyu Zeng, Licheng Shen et al.

ICCV 2025
#16151

How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?

Yujian Lee, Peng Gao, Yongqi Xu et al.

ICCV 2025 • arXiv:2601.08133
#16152

Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement

Mostofa Rafid Uddin, Jana Armouti, Min Xu

ICCV 2025
#16153

Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint

Wentian Cai, Weizhao Weng, Zihao Huang et al.

ICCV 2025
#16154

Splat-based 3D Scene Reconstruction with Extreme Motion-blur

Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.

ICCV 2025
#16155

Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion

Yijun Liang, Shweta Bhardwaj, Tianyi Zhou

ICCV 2025 • arXiv:2410.13674
#16156

VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference

Meiqi Wang, Han Qiu

ICCV 2025
#16157

FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data

Yiting Li, Fayao Liu, Jingyi Liao et al.

ICCV 2025
#16158

Advancing Textual Prompt Learning with Anchored Attributes

Zheng Li, Yibing Song, Ming-Ming Cheng et al.

ICCV 2025 • arXiv:2412.09442
#16159

DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation

Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.

ICCV 2025 • arXiv:2506.23104
#16160

AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction

Xuying Zhang, Yupeng Zhou, Kai Wang et al.

ICCV 2025
#16161

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

Yucheng Suo, Fan Ma, Linchao Zhu et al.

ICCV 2025 • arXiv:2503.20472
#16162

Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection

Qi He, Xiao Wu, Jun-Yan He et al.

ICCV 2025
#16163

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.

ICCV 2025 • arXiv:2410.23287
#16164

OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance

Mingquan Zhou, Chen He, Ruiping Wang et al.

ICCV 2025
#16165

Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning

Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.

ICCV 2025 • arXiv:2503.08101
#16166

Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data

Hancheng Min, Zhihui Zhu, Rene Vidal

NEURIPS 2025 • arXiv:2510.21078
#16167

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

Peng Zheng, Junke Wang, Yi Chang et al.

ICCV 2025 • arXiv:2507.01756
#16168

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model

Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.

ICCV 2025 • arXiv:2503.06472
#16169

CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement

Feixiang Wang, Shuang Yang, Shiguang Shan et al.

ICCV 2025
#16170

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.

ICCV 2025 • arXiv:2412.09501
#16171

Similarity Memory Prior is All You Need for Medical Image Segmentation

Hao Tang, Zhiqing Guo, Liejun Wang et al.

ICCV 2025 • highlight • arXiv:2507.00585
#16172

EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration

Haokai Zhu, Bo Qu, Si-Yuan Cao et al.

ICCV 2025 • arXiv:2509.07662
#16173

Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction

Mang Cao, Sanping Zhou, Yizhe Li et al.

ICCV 2025 • arXiv:2508.20376
#16174

Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension

Juntao Chen, Wen Shen, Zhihua Wei et al.

ICCV 2025
#16175

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

Peiming Li, Ziyi Wang, Yulin Yuan et al.

ICCV 2025 • arXiv:2508.14604
#16176

SITE: towards Spatial Intelligence Thorough Evaluation

Wenqi Wang, Reuben Tan, Pengyue Zhu et al.

ICCV 2025 • arXiv:2505.05456
#16177

SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models

Sudong Wang, Yunjian Zhang, Yao Zhu et al.

ICCV 2025
#16178

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.

ICCV 2025 • arXiv:2311.12084
#16179

Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection

Xingjian Wang, Li Chai, Jiming Chen

ICCV 2025
#16180

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Shurong Zheng, Yousong Zhu et al.

ICCV 2025 • arXiv:2403.09333
#16181

Conformal Prediction for Zero-Shot Models

Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz

CVPR 2025 • arXiv:2505.24693
#16182

Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models

Wei Xu, Kangjie Chen, Jiawei Qiu et al.

ICCV 2025
#16183

Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares

Lachlan MacDonald, Hancheng Min, Leandro Palma et al.

NEURIPS 2025 • arXiv:2510.17506
#16184

Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation

Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.

ICCV 2025 • arXiv:2506.23120
#16185

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

Runhao Zeng, Jiaqi Mao, Minghao Lai et al.

ICCV 2025 • arXiv:2508.11903
#16186

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.

ICCV 2025 • arXiv:2504.18406
#16187

BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting

Zipei Ma, Junzhe Jiang, Yurui Chen et al.

ICCV 2025 • arXiv:2506.22099
#16188

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Buyun Liang, Liangzu Peng, Jinqi Luo et al.

NEURIPS 2025 • arXiv:2510.04398
#16189

CLIPSym: Delving into Symmetry Detection with CLIP

Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh

ICCV 2025 • arXiv:2508.14197
#16190

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.

ICCV 2025 • arXiv:2408.17443
#16191

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Yuzhang Shang, Mu Cai, Bingxin Xu et al.

ICCV 2025 • arXiv:2403.15388
#16192

Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting

Hengyu Meng, Duotun Wang, Zhijing Shao et al.

ICCV 2025 • arXiv:2502.20045
#16193

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Mark Endo, Xiaohan Wang, Serena Yeung-Levy

ICCV 2025 • arXiv:2412.13180
#16194

Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method

Enming Zhang, Yuzhe Li, Yuliang Liu et al.

ICCV 2025
#16195

A Unified Interpretation of Training-Time Out-of-Distribution Detection

Xu Cheng, Xin Jiang, Zechao Li

ICCV 2025 • highlight
#16196

Attention on the Sphere

Boris Bonev, Max Rietmann, Andrea Paris et al.

NEURIPS 2025 • arXiv:2505.11157
#16197

Federated Domain Generalization with Domain-specific Soft Prompts Generation

Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.

ICCV 2025 • arXiv:2509.20807
#16198

Removing Out-of-Focus Reflective Flares via Color Alignment

Fengbo Lan, Chang Wen Chen

ICCV 2025
#16199

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Yingjian Chen, Lei Zhang, Yakun Niu

ICCV 2025 • arXiv:2408.13697
#16200

DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation

Songsong Duan, Xi Yang, Nannan Wang

ICCV 2025