Most Cited ICCV "global camera token" Papers

2,701 papers found • Page 11 of 14

#2001

A3GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting

Zhiyuan Fang, Rengan Xie, Xuancheng Jin et al.

ICCV 2025
#2002

HouseTour: A Virtual Real Estate A(I)gent

Ata Çelen, Iro Armeni, Daniel Barath et al.

ICCV 2025arXiv:2510.18054
#2003

HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding

Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.

ICCV 2025arXiv:2508.02072
#2004

DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images

Kazuma Nagata, Naoshi Kaneko

ICCV 2025arXiv:2509.14685
#2005

Free2Guide: Training-Free Text-to-Video Alignment using Image LVLM

Jaemin Kim, Bryan Sangwoo Kim, Jong Ye

ICCV 2025
#2006

VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Yazhou Xing, Yang Fei, Yingqing He et al.

ICCV 2025
#2007

DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models

SeungHoo Hong, GeonHo Son, Juhun Lee et al.

ICCV 2025arXiv:2510.00778
#2008

GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing

Tianyang Xue, Lin Lu, Yang Liu et al.

ICCV 2025highlight
#2009

LACONIC: A 3D Layout Adapter for Controllable Image Creation

Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.

ICCV 2025arXiv:2507.03257
#2010

Preserve Anything: Controllable Image Synthesis with Object Preservation

Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.

ICCV 2025arXiv:2506.22531
#2011

Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

Taihang Hu, Linxuan Li, Kai Wang et al.

ICCV 2025arXiv:2504.10434
#2012

Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models

Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.

ICCV 2025arXiv:2503.21943
#2013

EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding

Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng

ICCV 2025
#2014

Accelerating Diffusion Sampling via Exploiting Local Transition Coherence

shangwen zhu, Han Zhang, Zhantao Yang et al.

ICCV 2025arXiv:2503.09675
#2015

SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer

Zerui Gong, Zhonghua Wu, Qingyi Tao et al.

ICCV 2025arXiv:2506.13465
#2016

UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation

Songhua Liu, Ruonan Yu, Xinchao Wang

ICCV 2025
#2017

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang, Cong Han, Yafei Li et al.

ICCV 2025arXiv:2507.00992
#2018

Semantic Discrepancy-aware Detector for Image Forgery Identification

Wang Ziye, Minghang Yu, Chunyan Xu et al.

ICCV 2025arXiv:2508.12341
#2019

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025highlightarXiv:2503.23925
#2020

Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP

Trevor Canham, SaiKiran Tedla, Michael Murdoch et al.

ICCV 2025arXiv:2503.11883
#2021

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

shaojin wu, Mengqi Huang, wenxu wu et al.

ICCV 2025arXiv:2504.02160
#2022

FlexGen: Flexible Multi-View Generation from Text and Image Inputs

Xinli Xu, Wenhang Ge, Jiantao Lin et al.

ICCV 2025arXiv:2410.10745
#2023

LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Achint Soni, Meet Soni, Sirisha Rambhatla

ICCV 2025arXiv:2503.21541
#2024

Teleportraits: Training-Free People Insertion into Any Scene

Jialu Gao, Joseph K J, Fernando De la Torre

ICCV 2025arXiv:2510.05660
#2025

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution

Zheng-Peng Duan, jiawei zhang, Xin Jin et al.

ICCV 2025arXiv:2503.23580
#2026

FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning

Hang Guo, Yawei Li, Taolin Zhang et al.

ICCV 2025arXiv:2503.23367
#2027

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

Tiancheng SHEN, Jun Hao Liew, Zilong Huang et al.

ICCV 2025
#2028

Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation

Gang Dai, Yifan Zhang, Yutao Qin et al.

ICCV 2025arXiv:2508.03256
#2029

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation

Ruotong Wang, Mingli Zhu, Jiarong Ou et al.

ICCV 2025arXiv:2504.16907
#2030

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Rongkun Xue, Jinouwen Zhang, Yazhe Niu et al.

ICCV 2025arXiv:2412.01787
#2031

Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection

Yichen Lu, Siwei Nie, Minlong Lu et al.

ICCV 2025
#2032

Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI

Haodong Jing, Dongyao Jiang, Yongqiang Ma et al.

ICCV 2025
#2033

PixTalk: Controlling Photorealistic Image Processing and Editing with Language

Marcos Conde, Zihao Lu, Radu Timofte

ICCV 2025
#2034

ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement

KA WONG, Jicheng Zhou, Haiwei Wu et al.

ICCV 2025arXiv:2507.16397
#2035

Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification

Wenkui Yang, Jie Cao, Junxian Duan et al.

ICCV 2025highlightarXiv:2509.13922
#2036

A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness

Xiaoyi Feng, Tao Huang, Peng Wang et al.

ICCV 2025
#2037

Generative Video Bi-flow

Chen Liu, Tobias Ritschel

ICCV 2025arXiv:2503.06364
#2038

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.

ICCV 2025arXiv:2412.15191
#2039

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

Chieh-Yun Chen, Min Shi, Gong Zhang et al.

ICCV 2025arXiv:2507.20536
#2040

LayerLock: Non-collapsing Representation Learning with Progressive Freezing

Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.

ICCV 2025arXiv:2509.10156
#2041

JPEG Processing Neural Operator for Backward-Compatible Coding

Woo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.

ICCV 2025arXiv:2507.23521
#2042

All Parts Matter: A Unified Mask-Free Virtual Try-On Framework

Chenghu Du, Shengwu Xiong, Yi Rong

ICCV 2025
#2043

Function-centric Bayesian Network for Zero-Shot Object Goal Navigation

Sixian Zhang, Xinyao Yu, Xinhang Song et al.

ICCV 2025
#2044

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions

Yiting Qu, Ziqing Yang, Yihan Ma et al.

ICCV 2025arXiv:2507.22617
#2045

DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

Junyu Chen, Dongyun Zou, Wenkun He et al.

ICCV 2025arXiv:2508.00413
#2046

MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding

Gao Zong lin, Huu-Tai Phung, Yi-Chen Yao et al.

ICCV 2025arXiv:2510.12479
#2047

On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering

Bo Peng, Jie Lu, Guangquan Zhang et al.

ICCV 2025highlight
#2048

Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation

You Huang, Lichao Chen, Jiayi Ji et al.

ICCV 2025
#2049

HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos

Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta

ICCV 2025arXiv:2505.12911
#2050

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning

Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.

ICCV 2025highlightarXiv:2507.01409
#2051

An Efficient Hybrid Vision Transformer for TinyML Applications

Fanhong Zeng, Huanan LI, Juntao Guan et al.

ICCV 2025
#2052

Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction

Yuntao Shou, Xiangyong Cao, PeiqiangYan PeiqiangYan et al.

ICCV 2025arXiv:2411.14001
#2053

CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts

Olaf Dünkel, Artur Jesslen, Jiahao Xie et al.

ICCV 2025arXiv:2507.17651
#2054

Multi-Schema Proximity Network for Composed Image Retrieval

Jiangming Shi, Xiangbo Yin, yeyunchen yeyunchen et al.

ICCV 2025
#2055

ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations

Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.

ICCV 2025arXiv:2501.14607
#2056

ESCNet:Edge-Semantic Collaborative Network for Camouflaged Object Detection

Sheng Ye, Xin Chen, Yan Zhang et al.

ICCV 2025
#2057

Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates

Kecheng Chen, Xinyu Luo, Tiexin Qin et al.

ICCV 2025highlightarXiv:2504.02008
#2058

Moment Quantization for Video Temporal Grounding

Xiaolong Sun, Le Wang, Sanping Zhou et al.

ICCV 2025arXiv:2504.02286
#2059

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

Yongkun Du, Zhineng Chen, Hongtao Xie et al.

ICCV 2025arXiv:2411.15858
#2060

ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation

Cihang Peng, Qiming HOU, Zhong Ren et al.

ICCV 2025arXiv:2508.01008
#2061

Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation

Shuo Jin, Siyue Yu, Bingfeng Zhang et al.

ICCV 2025highlight
#2062

DiffPS: Leveraging Prior Knowledge of Diffusion Model for Person Search

Giyeol Kim, Sooyoung Yang, Jihyong Oh et al.

ICCV 2025highlight
#2063

Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching

Yuhan Liu, Jingwen Fu, Yang Wu et al.

ICCV 2025arXiv:2507.10318
#2064

Representation Shift: Unifying Token Compression with FlashAttention

Joonmyung Choi, Sanghyeok Lee, Byungoh Ko et al.

ICCV 2025arXiv:2508.00367
#2065

ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity

Yefei He, Feng Chen, Jing Liu et al.

ICCV 2025
#2066

ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts

Xiaoqi Wang, Clint Sebastian, Wenbin He et al.

ICCV 2025arXiv:2506.21835
#2067

LaCoOT: Layer Collapse through Optimal Transport

Victor Quétu, Zhu LIAO, Nour Hezbri et al.

ICCV 2025arXiv:2406.08933
#2068

Zero-Shot Compositional Video Learning with Coding Rate Reduction

Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.

ICCV 2025
#2069

Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models

Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.

ICCV 2025
#2070

Semantic versus Identity: A Divide-and-Conquer Approach towards Adjustable Medical Image De-Identification

Yuan Tian, Shuo Wang, Rongzhao Zhang et al.

ICCV 2025arXiv:2507.21703
#2071

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.

ICCV 2025arXiv:2507.15028
#2072

Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement

Xin Shen, Xinyu Wang, Lei Shen et al.

ICCV 2025
#2073

Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

Zeren Jiang, Chuanxia Zheng, Iro Laina et al.

ICCV 2025highlightarXiv:2504.07961
#2074

Superpowering Open-Vocabulary Object Detectors for X-ray Vision

Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.

ICCV 2025arXiv:2503.17071
#2075

RhythmGuassian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement

Hao LU, Yuting Zhang, Jiaqi Tang et al.

ICCV 2025highlight
#2076

On the Recovery of Cameras from Fundamental Matrices

Rakshith Madhavan, Federica Arrigoni

ICCV 2025highlight
#2077

The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning

Xinyang Zhou, Fanyue Wei, Lixin Duan et al.

ICCV 2025arXiv:2501.07305
#2078

CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization

Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.

ICCV 2025arXiv:2411.17845
#2079

Robustifying Zero-Shot Vision Language Models by Subspaces Alignment

Junhao Dong, Piotr Koniusz, Liaoyuan Feng et al.

ICCV 2025
#2080

SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Yichi Zhang, Le Xue, Wenbo zhang et al.

ICCV 2025arXiv:2502.14351
#2081

Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing

Jeongmin Yu, Susang Kim, Kisu Lee et al.

ICCV 2025arXiv:2509.06336
#2082

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding

Wenxuan Zhu, Bing Li, Cheng Zheng et al.

ICCV 2025arXiv:2503.17827
#2083

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang et al.

ICCV 2025arXiv:2503.11579
#2084

SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images

Shuhang Chen, Hangjie Yuan, Pengwei Liu et al.

ICCV 2025arXiv:2511.08626
#2085

FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation

Tao Gong, Qi Chu, Bin Liu et al.

ICCV 2025
#2086

Text-guided Visual Prompt DINO for Generic Segmentation

Yuchen Guan, Chong Sun, Canmiao Fu et al.

ICCV 2025arXiv:2508.06146
#2087

Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows

Xianglin Qiu, Xiaoyang Wang, Zhen Zhang et al.

ICCV 2025
#2088

Cracking Instance Jigsaw Puzzles: A Superior Alternative to Multiple Instance Learning for Whole Slide Image Analysis

Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.

ICCV 2025
#2089

STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning

Guilian Chen, Huisi Wu, Jing Qin

ICCV 2025
#2090

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs

Jiawei Mao, Yuhan Wang, Yucheng Tang et al.

ICCV 2025arXiv:2504.06897
#2091

DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection

Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.

ICCV 2025
#2092

Few-Shot Pattern Detection via Template Matching and Regression

Eunchan Jo, Dahyun Kang, Sanghyun Kim et al.

ICCV 2025highlightarXiv:2508.17636
#2093

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

Minghang Zheng, Yuxin Peng, Benyuan Sun et al.

ICCV 2025arXiv:2508.04546
#2094

RA-BUSSeg: Relation-aware Semi-supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-layer Alignment

Wanting ZHANG, Zhenhui Ding, Guilian Chen et al.

ICCV 2025
#2095

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

JIAHE ZHAO, rongkun Zheng, Yi Wang et al.

ICCV 2025arXiv:2507.10302
#2096

Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation

I-Hsiang Chen, Hua-En Chang, Wei-Ting Chen et al.

ICCV 2025arXiv:2507.21367
#2097

Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval

WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.

ICCV 2025arXiv:2504.13035
#2098

Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens

Runpeng Yu, Xinyin Ma, Xinchao Wang

ICCV 2025
#2099

Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding

Jiaxuan Chen, Yu Qi, Yueming Wang et al.

ICCV 2025
#2100

DisTime: Distribution-based Time Representation for Video Large Language Models

yingsen zeng, Zepeng Huang, Yujie Zhong et al.

ICCV 2025arXiv:2505.24329
#2101

WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation

Jiajia Li, Huisi Wu, Jing Qin

ICCV 2025highlight
#2102

CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching

Minjoo Ki, Dae Jung Kim, Kisung Kim et al.

ICCV 2025
#2103

Modeling Saliency Dataset Bias

Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge

ICCV 2025highlightarXiv:2505.10169
#2104

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

Ji Du, Xin WANG, Fangwei Hao et al.

ICCV 2025arXiv:2510.18437
#2105

Advancing Visual Large Language Model for Multi-granular Versatile Perception

Wentao Xiang, Haoxian Tan, Cong Wei et al.

ICCV 2025arXiv:2507.16213
#2106

Controllable Latent Space Augmentation for Digital Pathology

Sofiène Boutaj, Marin Scalbert, Pierre Marza et al.

ICCV 2025arXiv:2508.14588
#2107

Interpretable point cloud classification using multiple instance learning

Matt De Vries, Reed Naidoo, Olga Fourkioti et al.

ICCV 2025highlight
#2108

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment

Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu et al.

ICCV 2025arXiv:2508.08811
#2109

Learning Beyond Still Frames: Scaling Vision-Language Models with Video

Yiyuan Zhang, Handong Li, Jing Liu et al.

ICCV 2025
#2110

Is CLIP ideal? No. Can we fix it? Yes!

Raphaela Kang, Yue Song, Georgia Gkioxari et al.

ICCV 2025arXiv:2503.08723
#2111

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.

ICCV 2025arXiv:2507.22431
#2112

Dynamic Dictionary Learning for Remote Sensing Image Segmentation

Xuechao Zou, Yue Li, Shun Zhang et al.

ICCV 2025arXiv:2503.06683
#2113

Temporal-aware Query Routing for Real-time Video Instance Segmentation

Zesen Cheng, Kehan Li, Yian Zhao et al.

ICCV 2025
#2114

Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation

Qin Zhou, Guoyan Liang, Xindi Li et al.

ICCV 2025arXiv:2507.07568
#2115

Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application

Ruiyun Yu, Bingyang Guo, Haoyuan Li

ICCV 2025
#2116

Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion

Yuan Bian, Min Liu, Yunqi Yi et al.

ICCV 2025arXiv:2502.19697
#2117

Memory-Efficient 4-bit Preconditioned Stochastic Optimization

Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.

ICCV 2025arXiv:2412.10663
#2118

No More Sibling Rivalry: Debiasing Human-Object Interaction Detection

Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.

ICCV 2025arXiv:2509.00760
#2119

DASH: Detection and Assessment of Systematic Hallucinations of VLMs

Maximilian Augustin, Yannic Neuhaus, Matthias Hein

ICCV 2025arXiv:2503.23573
#2120

DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation

Songsong Duan, Xi Yang, Nannan Wang

ICCV 2025
#2121

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Mark Endo, Xiaohan Wang, Serena Yeung-Levy

ICCV 2025arXiv:2412.13180
#2122

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Yuzhang Shang, Mu Cai, Bingxin Xu et al.

ICCV 2025arXiv:2403.15388
#2123

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.

ICCV 2025arXiv:2408.17443
#2124

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.

ICCV 2025arXiv:2504.18406
#2125

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Shurong Zheng, Yousong Zhu et al.

ICCV 2025arXiv:2403.09333
#2126

Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection

Xingjian Wang, Li Chai, Jiming Chen

ICCV 2025
#2127

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.

ICCV 2025arXiv:2311.12084
#2128

Similarity Memory Prior is All You Need for Medical Image Segmentation

Hao Tang, Zhiqing Guo, Liejun Wang et al.

ICCV 2025highlightarXiv:2507.00585
#2129

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model

Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.

ICCV 2025arXiv:2503.06472
#2130

Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning

Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.

ICCV 2025arXiv:2503.08101
#2131

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.

ICCV 2025arXiv:2410.23287
#2132

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

Yucheng Suo, Fan Ma, Linchao Zhu et al.

ICCV 2025arXiv:2503.20472
#2133

DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation

Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.

ICCV 2025arXiv:2506.23104
#2134

FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data

YITING LI, Fayao Liu, Jingyi Liao et al.

ICCV 2025
#2135

VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference

Meiqi Wang, Han Qiu

ICCV 2025
#2136

Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint

Wentian Cai, Weizhao Weng, Zihao Huang et al.

ICCV 2025
#2137

How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?

Yujian Lee, Peng Gao, Yongqi Xu et al.

ICCV 2025arXiv:2601.08133
#2138

UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents

Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.

ICCV 2025
#2139

VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification

Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.

ICCV 2025
#2140

Towards Robustness of Person Search against Corruptions

Woojung Son, Yoonki Cho, Guoyuan An et al.

ICCV 2025
#2141

Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow

Yingfan MA, Bohan An, Ao Shen et al.

ICCV 2025
#2142

Vision-Language Neural Graph Featurization for Extracting Retinal Lesions

Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.

ICCV 2025
#2143

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting

Shuaiting Li, Juncan Deng, Chengxuan Wang et al.

ICCV 2025arXiv:2503.08668
#2144

VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization

Xinye Cao, Hongcan Guo, Jiawen Qian et al.

ICCV 2025arXiv:2510.06040
#2145

LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation

Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.

ICCV 2025arXiv:2508.01152
#2146

Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration

Ting Lei, Shaofeng Yin, Qingchao Chen et al.

ICCV 2025arXiv:2508.03207
#2147

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

Qi Chen, Xinze Zhou, Chen Liu et al.

ICCV 2025arXiv:2510.14831
#2148

Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior

Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.

ICCV 2025arXiv:2403.18878
#2149

Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal

Yitong Jiang, Jinwei Gu, Tianfan Xue et al.

ICCV 2025highlight
#2150

Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding

Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.

ICCV 2025
#2151

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.

ICCV 2025highlightarXiv:2510.17023
#2152

Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning

Zeyu Xi, Haoying Sun, Yaofei Wu et al.

ICCV 2025arXiv:2507.20163
#2153

Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training

Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.

ICCV 2025arXiv:2507.07778
#2154

Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation

Junhao Xiao, Yang Wei, Jingyu Wang et al.

ICCV 2025
#2155

MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation

Bin Xie, Hao Tang, Bin Duan et al.

ICCV 2025
#2156

MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition

Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.

ICCV 2025
#2157

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval

Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.

ICCV 2025arXiv:2509.04773
#2158

Unbiased Missing-modality Multimodal Learning

Ruiting Dai, Chenxi Li, Yandong Yan et al.

ICCV 2025
#2159

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025arXiv:2503.09509
#2160

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens

Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.

ICCV 2025arXiv:2412.09919
#2161

DM-EFS: Dynamically Multiplexed Expanded Features Set Form for Robust and Efficient Small Object Detection

Aashish Sharma

ICCV 2025
#2162

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

Zonglin Di, Jing Shi, Yifei Fan et al.

ICCV 2025
#2163

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code

WU Sitong, Haoru Tan, Yukang Chen et al.

ICCV 2025
#2164

Inverse Image-Based Rendering for Light Field Generation from Single Images

Hyunjun Jung, Hae-Gon Jeon

ICCV 2025highlightarXiv:2510.20132
#2165

HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration

Xiyu Zhang, Jiayi Ma, Jianwei Guo et al.

ICCV 2025arXiv:2503.02195
#2166

AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving

Jiawei Xu, Kai Deng, Zexin Fan et al.

ICCV 2025arXiv:2507.12137
#2167

Axis-level Symmetry Detection with Group-Equivariant Representation

Wongyun Yu, Ahyun Seo, Minsu Cho

ICCV 2025arXiv:2508.10740
#2168

PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function

Qikui Zhu

ICCV 2025
#2169

U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration

Xiaofan Li, Zhihao Xu, Chenming Wu et al.

ICCV 2025arXiv:2507.04503
#2170

Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging

Ying Xue, Jiaxi Jiang, Rayan Armani et al.

ICCV 2025arXiv:2510.21654
#2171

RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters

Xiaolin Liu, Tianyi zhou, Hongbo Kang et al.

ICCV 2025highlightarXiv:2507.20117
#2172

SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion

Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.

ICCV 2025arXiv:2506.23606
#2173

PointGAC: Geometric-Aware Codebook for Masked Point Modeling

Abiao Li, Chenlei Lv, Guofeng Mei et al.

ICCV 2025
#2174

Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images

Qi Xun Yeo, Yanyan Li, Gim Hee Lee

ICCV 2025arXiv:2508.06546
#2175

PRM: Photometric Stereo based Large Reconstruction Model

Wenhang Ge, Jiantao Lin, Guibao SHEN et al.

ICCV 2025highlightarXiv:2412.07371
#2176

Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging

Chongjie Ye, Yushuang Wu, Ziteng Lu et al.

ICCV 2025arXiv:2503.22236
#2177

Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction

Luoxi Zhang, Pragyan Shrestha, Yu Zhou et al.

ICCV 2025
#2178

FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction

Donghyun Lee, Dawoon Jeong, Jae W. Lee et al.

ICCV 2025arXiv:2507.23480
#2179

RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather

Yuran Wang, Yingping Liang, Yutao Hu et al.

ICCV 2025arXiv:2507.01653
#2180

MMGeo: Multimodal Compositional Geo-Localization for UAVs

Yuxiang Ji, Boyong He, Zhuoyue Tan et al.

ICCV 2025
#2181

Large Scene Generation with Cube-Absorb Discrete Diffusion

Qianjiang Hu, Wei Hu

ICCV 2025
#2182

SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration

Jongsuk Kim, Jae Young Lee, Gyojin Han et al.

ICCV 2025arXiv:2510.24052
#2183

Benchmarking Egocentric Visual-Inertial SLAM at City Scale

Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.

ICCV 2025highlightarXiv:2509.26639
#2184

Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction

Tuo Feng, Wenguan Wang, Yi Yang

ICCV 2025
#2185

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

JIXUAN FAN, Wanhua Li, Yifei Han et al.

ICCV 2025arXiv:2412.04887
#2186

DAA*: Deep Angular A Star for Image-based Path Planning

Zhiwei Xu

ICCV 2025arXiv:2507.09305
#2187

RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion

Geonho Bang, Minjae Seong, Jisong Kim et al.

ICCV 2025arXiv:2509.17712
#2188

GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views

Hang Yang, Le Hui, Jianjun Qian et al.

ICCV 2025
#2189

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

Haonan Han, Rui Yang, Huan Liao et al.

ICCV 2025arXiv:2405.18525
#2190

Towards Safer and Understandable Driver Intention Prediction

Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai et al.

ICCV 2025arXiv:2510.09200
#2191

InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation

Zhuoran Yang, Xi Guo, Chenjing Ding et al.

ICCV 2025
#2192

NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals

Jiro Abe, Gaku Nakano, Kazumine Ogura

ICCV 2025
#2193

EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device

Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.

ICCV 2025arXiv:2509.17430
#2194

NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction

Soham Dasgupta, Shanthika Naik, Preet Savalia et al.

ICCV 2025arXiv:2508.17712
#2195

Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling

Wenting Luan, Siqi Lu, Yongbin Zheng et al.

ICCV 2025
#2196

RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians

Shenxing Wei, Jinxi Li, Yafei YANG et al.

ICCV 2025highlightarXiv:2508.09830
#2197

Semantic-guided Camera Ray Regression for Visual Localization

Yesheng Zhang, Xu Zhao

ICCV 2025
#2198

Polarimetric Neural Field via Unified Complex-Valued Wave Representation

Chu Zhou, Yixin Yang, Junda Liao et al.

ICCV 2025
#2199

High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach

Yuchong Chen, Jian Yu, Shaoyan Gai et al.

ICCV 2025
#2200

From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos

Chenjian Gao, Lihe Ding, Rui Han et al.

ICCV 2025arXiv:2507.20331