Most Cited 2025 "open-ended benchmarks" Papers

22,274 papers found • Page 110 of 112

#21801

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens

Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.

ICCV 2025posterarXiv:2412.09919
#21802

Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception

Hongwei Lin, Dongyu Pan, Qiming Xia et al.

ICCV 2025poster
#21803

What we need is explicit controllability: Training 3D gaze estimator using only facial images

Tingwei Li, Jun Bao, Zhenzhong Kuang et al.

ICCV 2025poster
#21804

SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance

Wenjin Zhang, Xinyu Li, Chenyang Gao et al.

ICCV 2025poster
#21805

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025posterarXiv:2503.09509
#21806

Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection

Xuehan Chen, Guangyu Ren, Tianhong Dai et al.

ICCV 2025poster
#21807

Hypergraph Clustering Network with Partial Attribute Imputation

Qianqian Wang, Bowen Zhao, Zhengming Ding et al.

ICCV 2025poster
#21808

Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking

Hyunseop Kim, Juheon Jeong, Hanul Kim et al.

NEURIPS 2025oral
#21809

SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition

Jing Wang, Rui Zhao, Ruiqin Xiong et al.

ICCV 2025poster
#21810

Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity

Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.

ICCV 2025posterarXiv:2507.15775
#21811

Object-centric Video Question Answering with Visual Grounding and Referring

Haochen Wang, Qirui Chen, Cilin Yan et al.

ICCV 2025posterarXiv:2507.19599
#21812

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.

ICCV 2025highlightarXiv:2503.22677
#21813

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Lu Chen, Yizhou Wang, SHIXIANG TANG et al.

ICCV 2025posterarXiv:2502.05857
#21814

Unbiased Missing-modality Multimodal Learning

Ruiting Dai, Chenxi Li, Yandong Yan et al.

ICCV 2025poster
#21815

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval

Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.

ICCV 2025posterarXiv:2509.04773
#21816

LIRA: Reasoning Reconstruction via Multimodal Large Language Models

Zhen Zhou, Tong Wang, Yunkai Ma et al.

ICCV 2025poster
#21817

MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition

Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.

ICCV 2025poster
#21818

MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation

Bin Xie, Hao Tang, Bin Duan et al.

ICCV 2025poster
#21819

Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation

Junhao Xiao, Yang Wei, Jingyu Wang et al.

ICCV 2025poster
#21820

Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training

Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.

ICCV 2025posterarXiv:2507.07778
#21821

Learning an Implicit Physics Model for Image-based Fluid Simulation

Emily Jia, Jiageng Mao, Zhiyuan Gao et al.

ICCV 2025posterarXiv:2508.08254
#21822

Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning

Zeyu Xi, Haoying Sun, Yaofei Wu et al.

ICCV 2025posterarXiv:2507.20163
#21823

Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition

Meiqi Cao, Xiangbo Shu, Xin Jiang et al.

ICCV 2025poster
#21824

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.

ICCV 2025highlightarXiv:2510.17023
#21825

First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training

Gyudong Kim, Hyukju Na, Jin Kim et al.

NEURIPS 2025poster
#21826

Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding

Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.

ICCV 2025poster
#21827

Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal

Yitong Jiang, Jinwei Gu, Tianfan Xue et al.

ICCV 2025highlight
#21828

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach

Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.

ICCV 2025posterarXiv:2406.19568
#21829

SIC: Similarity-Based Interpretable Image Classification with Neural Networks

Tom Nuno Wolf, Emre Kavak, Fabian Bongratz et al.

ICCV 2025posterarXiv:2501.17328
#21830

Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior

Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.

ICCV 2025posterarXiv:2403.18878
#21831

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

Qi Chen, Xinze Zhou, Chen Liu et al.

ICCV 2025posterarXiv:2510.14831
#21832

Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration

Ting Lei, Shaofeng Yin, Qingchao Chen et al.

ICCV 2025posterarXiv:2508.03207
#21833

What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses

Federico D'Agostino, Lisa Schwetlick, Matthias Bethge et al.

NEURIPS 2025oral
#21834

LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation

Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.

ICCV 2025posterarXiv:2508.01152
#21835

VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization

Xinye Cao, Hongcan Guo, Jiawen Qian et al.

ICCV 2025posterarXiv:2510.06040
#21836

WIPES: Wavelet-based Visual Primitives

Wenhao Zhang, Hao Zhu, Delong Wu et al.

ICCV 2025posterarXiv:2508.12615
#21837

MambaML: Exploring State Space Models for Multi-Label Image Classification

Xuelin Zhu, Jian liu, Jiuxin Cao et al.

ICCV 2025poster
#21838

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting

Shuaiting Li, Juncan Deng, Chengxuan Wang et al.

ICCV 2025posterarXiv:2503.08668
#21839

Vision-Language Neural Graph Featurization for Extracting Retinal Lesions

Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.

ICCV 2025poster
#21840

Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow

Yingfan MA, Bohan An, Ao Shen et al.

ICCV 2025poster
#21841

MotionBind: Multi-Modal Human Motion Alignment for Retrieval, Recognition, and Generation

Kaleab Kinfu, Rene Vidal

NEURIPS 2025oral
#21842

CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization

Yihang Liu, Ying Wen, Longzhen Yang et al.

ICCV 2025poster
#21843

Towards Robustness of Person Search against Corruptions

Woojung Son, Yoonki Cho, Guoyuan An et al.

ICCV 2025poster
#21844

VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification

Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.

ICCV 2025poster
#21845

SEAL: Semantic Aware Image Watermarking

Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.

ICCV 2025posterarXiv:2503.12172
#21846

ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios

Jun Yin, Pengyu Zeng, Licheng Shen et al.

ICCV 2025poster
#21847

UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents

Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.

ICCV 2025poster
#21848

Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement

Mostofa Rafid Uddin, Jana Armouti, Min Xu

ICCV 2025poster
#21849

How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?

Yujian Lee, Peng Gao, Yongqi Xu et al.

ICCV 2025posterarXiv:2601.08133
#21850

Splat-based 3D Scene Reconstruction with Extreme Motion-blur

Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.

ICCV 2025poster
#21851

Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion

Yijun Liang, Shweta Bhardwaj, Tianyi Zhou

ICCV 2025posterarXiv:2410.13674
#21852

Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint

Wentian Cai, Weizhao Weng, Zihao Huang et al.

ICCV 2025poster
#21853

VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference

Meiqi Wang, Han Qiu

ICCV 2025poster
#21854

Advancing Textual Prompt Learning with Anchored Attributes

Zheng Li, Yibing Song, Ming-Ming Cheng et al.

ICCV 2025posterarXiv:2412.09442
#21855

FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data

YITING LI, Fayao Liu, Jingyi Liao et al.

ICCV 2025poster
#21856

AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction

Xuying Zhang, Yupeng Zhou, Kai Wang et al.

ICCV 2025poster
#21857

DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation

Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.

ICCV 2025posterarXiv:2506.23104
#21858

Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection

Qi He, Xiao Wu, Jun-Yan He et al.

ICCV 2025poster
#21859

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

Yucheng Suo, Fan Ma, Linchao Zhu et al.

ICCV 2025posterarXiv:2503.20472
#21860

OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance

Mingquan Zhou, Chen He, Ruiping Wang et al.

ICCV 2025poster
#21861

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.

ICCV 2025posterarXiv:2410.23287
#21862

Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data

Hancheng Min, Zhihui Zhu, Rene Vidal

NEURIPS 2025posterarXiv:2510.21078
#21863

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

Peng Zheng, Junke Wang, Yi Chang et al.

ICCV 2025posterarXiv:2507.01756
#21864

Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning

Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.

ICCV 2025posterarXiv:2503.08101
#21865

CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement

Feixiang Wang, Shuang Yang, Shiguang Shan et al.

ICCV 2025poster
#21866

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.

ICCV 2025posterarXiv:2412.09501
#21867

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model

Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.

ICCV 2025posterarXiv:2503.06472
#21868

EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration

Haokai Zhu, Bo Qu, Si-Yuan Cao et al.

ICCV 2025posterarXiv:2509.07662
#21869

Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction

Mang Cao, Sanping Zhou, Yizhe Li et al.

ICCV 2025posterarXiv:2508.20376
#21870

Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension

Juntao Chen, Wen Shen, Zhihua Wei et al.

ICCV 2025poster
#21871

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

Peiming Li, Ziyi Wang, Yulin Yuan et al.

ICCV 2025posterarXiv:2508.14604
#21872

SITE: towards Spatial Intelligence Thorough Evaluation

Wenqi Wang, Reuben Tan, Pengyue Zhu et al.

ICCV 2025posterarXiv:2505.05456
#21873

SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models

Sudong Wang, Yunjian Zhang, Yao Zhu et al.

ICCV 2025poster
#21874

Similarity Memory Prior is All You Need for Medical Image Segmentation

Hao Tang, Zhiqing Guo, Liejun Wang et al.

ICCV 2025highlightarXiv:2507.00585
#21875

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.

ICCV 2025posterarXiv:2311.12084
#21876

Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection

Xingjian Wang, Li Chai, Jiming Chen

ICCV 2025
#21877

Conformal Prediction for Zero-Shot Models

Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz

CVPR 2025posterarXiv:2505.24693
#21878

Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models

Wei Xu, Kangjie Chen, Jiawei Qiu et al.

ICCV 2025poster
#21879

Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares

Lachlan MacDonald, Hancheng Min, Leandro Palma et al.

NEURIPS 2025posterarXiv:2510.17506
#21880

Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation

Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.

ICCV 2025posterarXiv:2506.23120
#21881

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

Runhao Zeng, Jiaqi Mao, Minghao Lai et al.

ICCV 2025posterarXiv:2508.11903
#21882

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Shurong Zheng, Yousong Zhu et al.

ICCV 2025posterarXiv:2403.09333
#21883

BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting

Zipei Ma, Junzhe Jiang, Yurui Chen et al.

ICCV 2025posterarXiv:2506.22099
#21884

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Buyun Liang, Liangzu Peng, Jinqi Luo et al.

NEURIPS 2025posterarXiv:2510.04398
#21885

CLIPSym: Delving into Symmetry Detection with CLIP

Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh

ICCV 2025posterarXiv:2508.14197
#21886

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.

ICCV 2025posterarXiv:2504.18406
#21887

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.

ICCV 2025posterarXiv:2408.17443
#21888

Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting

Hengyu Meng, Duotun Wang, Zhijing Shao et al.

ICCV 2025posterarXiv:2502.20045
#21889

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Yuzhang Shang, Mu Cai, Bingxin Xu et al.

ICCV 2025posterarXiv:2403.15388
#21890

Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method

Enming Zhang, Yuzhe Li, Yuliang Liu et al.

ICCV 2025poster
#21891

A Unified Interpretation of Training-Time Out-of-Distribution Detection

Xu Cheng, Xin Jiang, Zechao Li

ICCV 2025highlight
#21892

Attention on the Sphere

Boris Bonev, Max Rietmann, Andrea Paris et al.

NEURIPS 2025posterarXiv:2505.11157
#21893

Federated Domain Generalization with Domain-specific Soft Prompts Generation

Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.

ICCV 2025posterarXiv:2509.20807
#21894

Removing Out-of-Focus Reflective Flares via Color Alignment

Fengbo Lan, Chang Wen Chen

ICCV 2025poster
#21895

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Yingjian Chen, Lei Zhang, Yakun Niu

ICCV 2025posterarXiv:2408.13697
#21896

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Mark Endo, Xiaohan Wang, Serena Yeung-Levy

ICCV 2025posterarXiv:2412.13180
#21897

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

Guoxing Sun, Rishabh Dabral, Heming Zhu et al.

CVPR 2025highlightarXiv:2412.13183
#21898

Mamba-3VL: Taming State Space Model for 3D Vision Language Learning

Yuan Wang, Yuxin Chen, Zhongang Qi et al.

ICCV 2025poster
#21899

Embodied Representation Alignment with Mirror Neurons

Wentao Zhu, Zhining Zhang, Yuwei Ren et al.

ICCV 2025posterarXiv:2509.21136
#21900

DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation

Songsong Duan, Xi Yang, Nannan Wang

ICCV 2025poster
#21901

Selective Contrastive Learning for Weakly Supervised Affordance Grounding

WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

ICCV 2025posterarXiv:2508.07877
#21902

DASH: Detection and Assessment of Systematic Hallucinations of VLMs

Maximilian Augustin, Yannic Neuhaus, Matthias Hein

ICCV 2025posterarXiv:2503.23573
#21903

M2EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking

Yan Li, Yang Xu, Changhao Chen et al.

ICCV 2025poster
#21904

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices

Min Yang, Zihan Jia, Zhilin Dai et al.

ICCV 2025posterarXiv:2508.07312
#21905

scGeneScope: A Treatment-Matched Single Cell Imaging and Transcriptomics Dataset and Benchmark for Treatment Response Modeling

Joel Dapello, Marcel Nassar, Ridvan Eksi et al.

NEURIPS 2025poster
#21906

No More Sibling Rivalry: Debiasing Human-Object Interaction Detection

Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.

ICCV 2025posterarXiv:2509.00760
#21907

Memory-Efficient 4-bit Preconditioned Stochastic Optimization

Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.

ICCV 2025posterarXiv:2412.10663
#21908

Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion

Yuan Bian, Min Liu, Yunqi Yi et al.

ICCV 2025posterarXiv:2502.19697
#21909

EVOLVE: Event-Guided Deformable Feature Transfer and Dual-Memory Refinement for Low-Light Video Object Segmentation

Jong Hyeon Baek, Jiwon oh, Yeong Jun Koh

ICCV 2025poster
#21910

MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking

Han Han, Wei Zhai, Yang Cao et al.

ICCV 2025posterarXiv:2412.01300
#21911

Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset

Ruofei WANG, Peiqi Duan, Boxin Shi et al.

ICCV 2025highlightarXiv:2507.05728
#21912

AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing

Zhaonan Wang, Manyi Li, Changhe Tu

ICCV 2025poster
#21913

Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision

Yuting He, Shuo Li

ICCV 2025posterarXiv:2506.20850
#21914

InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior

Minghao Wen, Shengjie Wu, Kangkan Wang et al.

ICCV 2025posterarXiv:2507.04961
#21915

Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application

Ruiyun Yu, Bingyang Guo, Haoyuan Li

ICCV 2025poster
#21916

Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation

Qin Zhou, Guoyan Liang, Xindi Li et al.

ICCV 2025posterarXiv:2507.07568
#21917

Benchmarking Multimodal Large Language Models Against Image Corruptions

Xinkuan Qiu, Meina Kan, Yongbin Zhou et al.

ICCV 2025poster
#21918

Temporal-aware Query Routing for Real-time Video Instance Segmentation

Zesen Cheng, Kehan Li, Yian Zhao et al.

ICCV 2025poster
#21919

Weak-to-Strong Generalization under Distribution Shifts

Myeongho Jeon, Jan Sobotka, Suhwan Choi et al.

NEURIPS 2025posterarXiv:2510.21332
#21920

RvLLM: LLM Runtime Verification with Domain Knowledge

Yedi Zhang, Sun Emma, Annabelle En et al.

NEURIPS 2025posterarXiv:2505.18585
#21921

UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image

Xingyu Liu, Gu Wang, Ruida Zhang et al.

CVPR 2025posterarXiv:2411.16106
#21922

Dynamic Dictionary Learning for Remote Sensing Image Segmentation

Xuechao Zou, Yue Li, Shun Zhang et al.

ICCV 2025posterarXiv:2503.06683
#21923

Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation

Lujun Li, Cheng Lin, Dezhi Li et al.

ICCV 2025poster
#21924

Dual-level Prototype Learning for Composite Degraded Image Restoration

Zhongze Wang, Haitao Zhao, Lujian Yao et al.

ICCV 2025poster
#21925

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.

ICCV 2025posterarXiv:2507.22431
#21926

Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals

Linda Zeng, Rithwik Gupta, Divij Motwani et al.

NEURIPS 2025posterarXiv:2502.16101
#21927

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.

CVPR 2025posterarXiv:2502.05176
#21928

Deterministic Object Pose Confidence Region Estimation

Jinghao Wang, Zhang Li, Zi Wang et al.

ICCV 2025posterarXiv:2506.22720
#21929

Is CLIP ideal? No. Can we fix it? Yes!

Raphaela Kang, Yue Song, Georgia Gkioxari et al.

ICCV 2025posterarXiv:2503.08723
#21930

Learning Beyond Still Frames: Scaling Vision-Language Models with Video

Yiyuan Zhang, Handong Li, Jing Liu et al.

ICCV 2025poster
#21931

Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation

Shengfang ZHAI, Jiajun Li, Yue Liu et al.

ICCV 2025highlightarXiv:2503.06453
#21932

Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning

Liwei Luo, Shuaitengyuan Li, Dongwei Ren et al.

ICCV 2025posterarXiv:2511.03245
#21933

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment

Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu et al.

ICCV 2025posterarXiv:2508.08811
#21934

ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation

Qizhen Lan, Qing Tian

ICCV 2025posterarXiv:2503.06307
#21935

GReg: Geometry-Aware Region Refinement for Sign Language Video Generation

Tongkai Shi, Lianyu Hu, Fanhua Shang et al.

ICCV 2025poster
#21936

Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding

Haoran Zhou, Gim Hee Lee

NEURIPS 2025oralarXiv:2512.03601
#21937

Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints

Jiahao Xia, Yike Wu, Wenjian Huang et al.

ICCV 2025posterarXiv:2507.11985
#21938

NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction

Chao Liu, Yangbo Jiang, Nenggan Zheng

ICCV 2025poster
#21939

Interpretable point cloud classification using multiple instance learning

Matt De Vries, Reed Naidoo, Olga Fourkioti et al.

ICCV 2025highlight
#21940

MotionCtrl: A Real-time Controllable Vision-Language-Motion Model

Bin Cao, Sipeng Zheng, Ye Wang et al.

ICCV 2025poster
#21941

UIPro: Unleashing Superior Interaction Capability For GUI Agents

Hongxin Li, Jingran Su, Jingfan CHEN et al.

ICCV 2025posterarXiv:2509.17328
#21942

SALAD -- Semantics-Aware Logical Anomaly Detection

Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj

ICCV 2025posterarXiv:2509.02101
#21943

FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing

Bizhu Wu, Jinheng Xie, Meidan Ding et al.

ICCV 2025posterarXiv:2507.19850
#21944

Controllable Latent Space Augmentation for Digital Pathology

Sofiène Boutaj, Marin Scalbert, Pierre Marza et al.

ICCV 2025posterarXiv:2508.14588
#21945

Advancing Visual Large Language Model for Multi-granular Versatile Perception

Wentao Xiang, Haoxian Tan, Cong Wei et al.

ICCV 2025posterarXiv:2507.16213
#21946

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

Ji Du, Xin WANG, Fangwei Hao et al.

ICCV 2025posterarXiv:2510.18437
#21947

Pseudo-Riemannian Graph Transformer

Viet Quan Le, Cuong Viet Ta

NEURIPS 2025poster
#21948

Modeling Saliency Dataset Bias

Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge

ICCV 2025highlightarXiv:2505.10169
#21949

VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving

Fanjie Kong, Yitong Li, Weihuang Chen et al.

ICCV 2025poster
#21950

Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild

Peijun Bao, Chenqi Kong, SIYUAN YANG et al.

ICCV 2025poster
#21951

CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching

Minjoo Ki, Dae Jung Kim, Kisung Kim et al.

ICCV 2025poster
#21952

Knowledge Transfer from Interaction Learning

Yilin Gao, Kangyi Chen, Zhongxing Peng et al.

ICCV 2025posterarXiv:2509.18733
#21953

WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction

Richard Liu, Daniel Fu, Noah Tan et al.

ICCV 2025posterarXiv:2505.04813
#21954

Temperature in Cosine-based Softmax Loss

Takumi Kobayashi

ICCV 2025poster
#21955

Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders

Mengyu Ye, Jun Suzuki, Tatsuro Inaba et al.

NEURIPS 2025posterarXiv:2510.22332
#21956

Multi-modal Segment Anything Model for Camouflaged Scene Segmentation

Guangyu Ren, Hengyan Liu, Michalis Lazarou et al.

ICCV 2025poster
#21957

WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation

Jiajia Li, Huisi Wu, Jing Qin

ICCV 2025highlight
#21958

DisTime: Distribution-based Time Representation for Video Large Language Models

yingsen zeng, Zepeng Huang, Yujie Zhong et al.

ICCV 2025posterarXiv:2505.24329
#21959

Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection

Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.

ICCV 2025highlightarXiv:2507.10225
#21960

Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression

Shiyu Qin, Jinpeng Wang, Yimin Zhou et al.

ICCV 2025poster
#21961

SpectralAR: Spectral Autoregressive Visual Generation

Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.

ICCV 2025posterarXiv:2506.10962
#21962

Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding

Jiaxuan Chen, Yu Qi, Yueming Wang et al.

ICCV 2025poster
#21963

Boosting Adversarial Transferability via Negative Hessian Trace Regularization

Yunfei Long, Zilin Tian, Liguo Zhang et al.

ICCV 2025poster
#21964

AcZeroTS: Active Learning for Zero-shot Tissue Segmentation in Pathology Images

Jiao Tang, Junjie Zhou, Bo Qian et al.

ICCV 2025poster
#21965

OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars

Jinshu Chen, Bingchuan Li, Fan Zhang et al.

ICCV 2025poster
#21966

On the sample complexity of semi-supervised multi-objective learning

Tobias Wegel, Geelon So, Junhyung Park et al.

NEURIPS 2025spotlightarXiv:2508.17152
#21967

Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings

Haoyu Yao, Bin Yang, Wenke Huang et al.

ICCV 2025poster
#21968

Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection

Yongkang Zhang, Dongyu She, Zhong Zhou

ICCV 2025poster
#21969

Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens

Runpeng Yu, Xinyin Ma, Xinchao Wang

ICCV 2025poster
#21970

Can We Achieve Efficient Diffusion Without Self-Attention? Distilling Self-Attention into Convolutions

ZiYi Dong, Chengxing Zhou, Weijian Deng et al.

ICCV 2025posterarXiv:2504.21292
#21971

Ultra-Precision 6DoF Pose Estimation Using 2-D Interpolated Discrete Fourier Transform

Guowei Shi, Zian Mao, Peisen Huang

ICCV 2025poster
#21972

Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval

WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.

ICCV 2025posterarXiv:2504.13035
#21973

Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables

Futoshi Futami, Masahiro Fujisawa

NEURIPS 2025posterarXiv:2505.19470
#21974

A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization

Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.

ICCV 2025posterarXiv:2412.09774
#21975

Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation

I-Hsiang Chen, Hua-En Chang, Wei-Ting Chen et al.

ICCV 2025posterarXiv:2507.21367
#21976

AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation

Haifeng Zhong, Fan Tang, Zhuo Chen et al.

ICCV 2025poster
#21977

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

JIAHE ZHAO, rongkun Zheng, Yi Wang et al.

ICCV 2025posterarXiv:2507.10302
#21978

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics

YeonJi Song, Jaein Kim, Suhyung Choi et al.

ICCV 2025posterarXiv:2404.18423
#21979

Contextual Dynamic Pricing with Heterogeneous Buyers

Thodoris Lykouris, Sloan Nietert, Princewill Okoroafor et al.

NEURIPS 2025posterarXiv:2512.09513
#21980

Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss

Yuxiao Wang, Yu Lei, Zhenao WEI et al.

ICCV 2025posterarXiv:2507.01630
#21981

RA-BUSSeg: Relation-aware Semi-supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-layer Alignment

Wanting ZHANG, Zhenhui Ding, Guilian Chen et al.

ICCV 2025poster
#21982

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

Minghang Zheng, Yuxin Peng, Benyuan Sun et al.

ICCV 2025posterarXiv:2508.04546
#21983

Coupling the Generator with Teacher for Effective Data-Free Knowledge Distillation

Xu Chen, Yang Li, Yahong Han et al.

ICCV 2025poster
#21984

Towards a Universal Image Degradation Model via Content-Degradation Disentanglement

Wenbo Yang, Zhongling Wang, Zhou Wang

ICCV 2025posterarXiv:2505.12860
#21985

Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery

Xinhang Wan, Jiyuan Liu, Qian Qu et al.

ICCV 2025posterarXiv:2507.12029
#21986

HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization

Zimin Ran, Xingyu Ren, Xiang An et al.

ICCV 2025poster
#21987

Few-Shot Pattern Detection via Template Matching and Regression

Eunchan Jo, Dahyun Kang, Sanghyun Kim et al.

ICCV 2025highlightarXiv:2508.17636
#21988

Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation

Joëlle Hanna, Damian Borth

ICCV 2025posterarXiv:2507.06848
#21989

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno et al.

NEURIPS 2025spotlightarXiv:2502.10138
#21990

Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal

wanchang Yu, Qing Zhang, Rongjia Zheng et al.

ICCV 2025posterarXiv:2507.04692
#21991

FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment

Hang Xu, Jie Huang, Linjiang Huang et al.

ICCV 2025posterarXiv:2506.22509
#21992

ProbMED: A Probabilistic Framework for Medical Multimodal Binding

Yuan Gao, Sangwook Kim, Jianzhong You et al.

ICCV 2025posterarXiv:2509.25711
#21993

DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection

Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.

ICCV 2025poster
#21994

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.

ICCV 2025posterarXiv:2405.16418
#21995

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.

CVPR 2025posterarXiv:2503.18933
#21996

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs

Jiawei Mao, Yuhan Wang, Yucheng Tang et al.

ICCV 2025posterarXiv:2504.06897
#21997

FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models

Jiaqi Wu, Simin Chen, Jing Tang et al.

ICCV 2025poster
#21998

STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning

Guilian Chen, Huisi Wu, Jing Qin

ICCV 2025poster
#21999

CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

Duo Wu, Jinghe Wang, Yuan Meng et al.

ICCV 2025posterarXiv:2411.16313
#22000

Dynamic Group Detection using VLM-augmented Temporal Groupness Graph

Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita

ICCV 2025posterarXiv:2509.04758