Most Cited ICCV "trajectory entropy maximization" Papers

#601

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025posterarXiv:2412.01250
3
citations
#602

RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Yufeng Zhong, Chengjian Feng, Feng Yan et al.

ICCV 2025posterarXiv:2503.18525
3
citations
#603

Sparse Fine-Tuning of Transformers for Generative Tasks

Wei Chen, Jingxi Yu, Zichen Miao et al.

ICCV 2025posterarXiv:2507.10855
3
citations
#604

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Marko Mihajlovic, Siwei Zhang, Gen Li et al.

ICCV 2025highlightarXiv:2506.23236
3
citations
#605

Jigsaw++: Imagining Complete Shape Priors for Object Reassembly

Jiaxin Lu, Gang Hua, Qixing Huang

ICCV 2025posterarXiv:2410.11816
3
citations
#606

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

Sherry Chen, Yi Wei, Luowei Zhou et al.

ICCV 2025posterarXiv:2507.07317
3
citations
#607

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery

Xiao Liu, Nan Pu, Haiyang Zheng et al.

ICCV 2025posterarXiv:2507.04051
3
citations
#608

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

Mahmoud Ahmed, Junjie Fei, Jian Ding et al.

ICCV 2025posterarXiv:2405.18937
3
citations
#609

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.

ICCV 2025posterarXiv:2412.00622
3
citations
#610

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICCV 2025posterarXiv:2411.13949
3
citations
#611

Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding

Mingxuan Wu, Huang Huang, Justin Kerr et al.

ICCV 2025posterarXiv:2504.17441
3
citations
#612

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

ICCV 2025posterarXiv:2405.13337
3
citations
#613

ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling

Jinhyung Park, Javier Romero, Shunsuke Saito et al.

ICCV 2025posterarXiv:2508.15767
3
citations
#614

On the Generalization of Representation Uncertainty in Earth Observation

Spyros Kondylatos, Nikolaos Ioannis Bountos, Dimitrios Michail et al.

ICCV 2025posterarXiv:2503.07082
3
citations
#615

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.

ICCV 2025posterarXiv:2509.25134
3
citations
#616

PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View

Longliang Liu, Miaojie Feng, Junda Cheng et al.

ICCV 2025highlightarXiv:2506.23897
3
citations
#617

CAP: Evaluation of Persuasive and Creative Image Generation

Aysan Aghazadeh, Adriana Kovashka

ICCV 2025posterarXiv:2412.10426
3
citations
#618

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025posterarXiv:2506.22799
3
citations
#619

TokensGen: Harnessing Condensed Tokens for Long Video Generation

Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.

ICCV 2025posterarXiv:2507.15728
3
citations
#620

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

ICCV 2025highlightarXiv:2502.07001
3
citations
#621

PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu et al.

ICCV 2025poster
3
citations
#622

Joint Diffusion Models in Continual Learning

Paweł Skierś, Kamil Deja

ICCV 2025posterarXiv:2411.08224
3
citations
#623

Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement

Priyank Pathak, Yogesh Rawat

ICCV 2025posterarXiv:2507.07230
3
citations
#624

DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization

Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.

ICCV 2025posterarXiv:2504.13206
3
citations
#625

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

ICCV 2025posterarXiv:2411.16789
3
citations
#626

DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.

ICCV 2025posterarXiv:2503.13176
3
citations
#627

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency

Yangyang Guo, Mohan Kankanhalli

ICCV 2025posterarXiv:2411.09126
3
citations
#628

O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views

Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-Cameo et al.

ICCV 2025posterarXiv:2506.06026
3
citations
#629

SceneMI: Motion In-betweening for Modeling Human-Scene Interaction

Inwoo Hwang, Bing Zhou, Young Min Kim et al.

ICCV 2025highlightarXiv:2503.16289
3
citations
#630

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025posterarXiv:2504.01020
3
citations
#631

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

Guangben Lu, Yuzhen N/A, Zhimin Sun et al.

ICCV 2025posterarXiv:2412.03812
3
citations
#632

Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping

Jingyi Lu, Kai Han

ICCV 2025posterarXiv:2509.04582
3
citations
#633

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025posterarXiv:2409.01071
3
citations
#634

Task Vector Quantization for Memory-Efficient Model Merging

Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.

ICCV 2025posterarXiv:2503.06921
3
citations
#635

From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection

Zexi Jia, Chuanwei Huang, Hongyan Fei et al.

ICCV 2025posterarXiv:2507.04769
3
citations
#636

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

Chancharik Mitra, Brandon Huang, Tianning Chai et al.

ICCV 2025posterarXiv:2412.00142
3
citations
#637

Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention

Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.

ICCV 2025posterarXiv:2506.13298
3
citations
#638

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Jonathan Roberts, Kai Han, Samuel Albanie

ICCV 2025posterarXiv:2408.11817
3
citations
#639

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang, Yan Xia, Sashuai Zhou et al.

ICCV 2025posterarXiv:2507.03304
3
citations
#640

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

Jiahui Geng, Qing Li

ICCV 2025posterarXiv:2503.14530
3
citations
#641

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

Zhehui Wu, Yong Chen, Naoto Yokoya et al.

ICCV 2025posterarXiv:2503.09131
3
citations
#642

PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination

Ming Dai, Wenxuan Cheng, Jiedong Zhuang et al.

ICCV 2025posterarXiv:2509.04833
3
citations
#643

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

ICCV 2025posterarXiv:2503.16832
3
citations
#644

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models

Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.

ICCV 2025posterarXiv:2509.12145
3
citations
#645

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025posterarXiv:2507.12942
3
citations
#646

Monocular Semantic Scene Completion via Masked Recurrent Networks

Xuzhi Wang, Xinran Wu, Song Wang et al.

ICCV 2025posterarXiv:2507.17661
3
citations
#647

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving

Zixian Guo, Ming Liu, Qilong Wang et al.

ICCV 2025poster
3
citations
#648

MVGBench: a Comprehensive Benchmark for Multi-view Generation Models

Xianghui Xie, Jan Lenssen, Gerard Pons-Moll

ICCV 2025poster
3
citations
#649

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.

ICCV 2025posterarXiv:2411.17616
3
citations
#650

You Think, You ACT: The New Task of Arbitrary Text to Motion Generation

Runqi Wang, Caoyuan Ma, Guopeng Li et al.

ICCV 2025posterarXiv:2404.14745
3
citations
#651

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Xin Dong, Shichao Dong, Jin Wang et al.

ICCV 2025posterarXiv:2507.05056
3
citations
#652

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions

Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.

ICCV 2025posterarXiv:2412.06334
3
citations
#653

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Zonglin Lyu, Chen Chen

ICCV 2025posterarXiv:2507.04984
3
citations
#654

FREE-Merging: Fourier Transform for Efficient Model Merging

Shenghe Zheng, Hongzhi Wang

ICCV 2025posterarXiv:2411.16815
3
citations
#655

Cross-Subject Mind Decoding from Inaccurate Representations

Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.

ICCV 2025posterarXiv:2507.19071
3
citations
#656

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers

Lukas Kuhn, Sari Sadiya, Jörg Schlötterer et al.

ICCV 2025posterarXiv:2501.00942
3
citations
#657

MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation

Fu Rong, Meng Lan, Qian Zhang et al.

ICCV 2025posterarXiv:2501.13667
3
citations
#658

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Tri Ton, Ji Woo Hong, Chang Yoo

ICCV 2025posterarXiv:2504.05684
3
citations
#659

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

Jiaxin Huang, Sheng Miao, Bangbang Yang et al.

ICCV 2025posterarXiv:2504.11092
3
citations
#660

SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians

Liam Schoneveld, Zhe Chen, Davide Davoli et al.

ICCV 2025posterarXiv:2504.12292
3
citations
#661

Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

Miroslav Purkrabek, Jiri Matas

ICCV 2025posterarXiv:2412.01562
3
citations
#662

Learning Streaming Video Representation via Multitask Training

Yibin Yan, Jilan Xu, Shangzhe Di et al.

ICCV 2025posterarXiv:2504.20041
3
citations
#663

FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation

Yunpeng Bai, Qixing Huang

ICCV 2025posterarXiv:2412.00671
3
citations
#664

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Qi Guo, Zhen Tian, Minghao Yao et al.

ICCV 2025posterarXiv:2410.06848
3
citations
#665

DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Sanghyun Son, Matheus Gadelha, Yang Zhou et al.

ICCV 2025posterarXiv:2412.16776
3
citations
#666

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025posterarXiv:2507.05899
3
citations
#667

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.

ICCV 2025posterarXiv:2507.04699
3
citations
#668

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

Xiao Li, Yiming Zhu, Yifan Huang et al.

ICCV 2025posterarXiv:2506.23581
3
citations
#669

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

Li Huaqiu, Yong Wang, Tongwen Huang et al.

ICCV 2025posterarXiv:2507.00790
3
citations
#670

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Zhenyang Liu, Yikai Wang, Kuanning Wang et al.

ICCV 2025posterarXiv:2507.06710
3
citations
#671

HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars

Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.

ICCV 2025posterarXiv:2507.19481
3
citations
#672

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.

ICCV 2025posterarXiv:2507.13984
3
citations
#673

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment

Chong Xia, Shengjun Zhang, Fangfu Liu et al.

ICCV 2025posterarXiv:2507.19058
2
citations
#674

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Xiangdong Zhang, Shaofeng Zhang, Junchi Yan

ICCV 2025posterarXiv:2509.01250
2
citations
#675

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression

Wenjie Huang, Qi Yang, Shuting Xia et al.

ICCV 2025posterarXiv:2507.15686
2
citations
#676

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.

ICCV 2025posterarXiv:2503.21055
2
citations
#677

An Inversion-based Measure of Memorization for Diffusion Models

Zhe Ma, Qingming Li, Xuhong Zhang et al.

ICCV 2025posterarXiv:2405.05846
2
citations
#678

Demeter: A Parametric Model of Crop Plant Morphology from the Real World

Tianhang Cheng, Albert Zhai, Evan Chen et al.

ICCV 2025posterarXiv:2510.16377
2
citations
#679

RTMap: Real-Time Recursive Mapping with Change Detection and Localization

Yuheng Du, Sheng Yang, Lingxuan Wang et al.

ICCV 2025posterarXiv:2507.00980
2
citations
#680

Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads

Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.

ICCV 2025posterarXiv:2507.23343
2
citations
#681

SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models

Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.

ICCV 2025posterarXiv:2411.16216
2
citations
#682

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen, Nhat Le, Baoru Huang et al.

ICCV 2025posterarXiv:2508.10522
2
citations
#683

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization

Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.

ICCV 2025posterarXiv:2504.02817
2
citations
#684

You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data

Shanshan Yan, Zexi Li, Chao Wu et al.

ICCV 2025posterarXiv:2503.06916
2
citations
#685

Supercharging Floorplan Localization with Semantic Rays

Yuval Grader, Hadar Averbuch-Elor

ICCV 2025posterarXiv:2507.09291
2
citations
#686

PseudoMapTrainer: Learning Online Mapping without HD Maps

Christian Löwens, Thorben Funke, Jingchao Xie et al.

ICCV 2025posterarXiv:2508.18788
2
citations
#687

MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments

Zhixuan Liu, Haokun Zhu, Rui Chen et al.

ICCV 2025posterarXiv:2503.13816
2
citations
#688

DIP: Unsupervised Dense In-Context Post-training of Visual Representations

Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky et al.

ICCV 2025posterarXiv:2506.18463
2
citations
#689

Towards Open-World Generation of Stereo Images and Unsupervised Matching

Feng Qiao, Zhexiao Xiong, Eric Xing et al.

ICCV 2025posterarXiv:2503.12720
2
citations
#690

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Tianli Liao, Chenyang Zhao, Lei Li et al.

ICCV 2025posterarXiv:2311.18564
2
citations
#691

Diffusion Image Prior

Hamadi Chihaoui, Paolo Favaro

ICCV 2025posterarXiv:2503.21410
2
citations
#692

Improving Rectified Flow with Boundary Conditions

Xixi Hu, Runlong Liao, Bo Liu et al.

ICCV 2025posterarXiv:2506.15864
2
citations
#693

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes

Zesong Yang, Bangbang Yang, Wenqi Dong et al.

ICCV 2025posterarXiv:2507.08416
2
citations
#694

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration

Lu Liu, Huiyu Duan, Qiang Hu et al.

ICCV 2025highlightarXiv:2412.13155
2
citations
#695

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.

ICCV 2025highlightarXiv:2506.22753
2
citations
#696

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.

ICCV 2025posterarXiv:2506.21080
2
citations
#697

FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging

Xin You, Runze Yang, Chuyan Zhang et al.

ICCV 2025posterarXiv:2507.04547
2
citations
#698

Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising

Xiangbin Wei, Yuanfeng Wang, Ao Xu et al.

ICCV 2025posterarXiv:2503.09283
2
citations
#699

OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization

Saihui Hou, Panjian Huang, Zengbin Wang et al.

ICCV 2025posterarXiv:2410.00204
2
citations
#700

Consensus-Driven Active Model Selection

Justin Kay, Grant Horn, Subhransu Maji et al.

ICCV 2025highlightarXiv:2507.23771
2
citations
#701

LookOut: Real-World Humanoid Egocentric Navigation

Boxiao Pan, Adam Harley, Francis Engelmann et al.

ICCV 2025posterarXiv:2508.14466
2
citations
#702

Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model

Daehee Park, Monu Surana, Pranav Desai et al.

ICCV 2025posterarXiv:2507.22615
2
citations
#703

Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation

Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber et al.

ICCV 2025highlightarXiv:2504.04841
2
citations
#704

CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images

Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.

ICCV 2025posterarXiv:2503.05332
2
citations
#705

SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting

Haiyang Ying, Matthias Zwicker

ICCV 2025posterarXiv:2503.14786
2
citations
#706

SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation

Hao Ban, Gokul Ram Subramani, Kaiyi Ji

ICCV 2025posterarXiv:2507.07883
2
citations
#707

Object-level Correlation for Few-Shot Segmentation

Chunlin Wen, Yu Zhang, Jie Fan et al.

ICCV 2025posterarXiv:2509.07917
2
citations
#708

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.

ICCV 2025posterarXiv:2503.22349
2
citations
#709

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025posterarXiv:2506.11436
2
citations
#710

Teaching VLMs to Localize Specific Objects from In-context Examples

Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.

ICCV 2025posterarXiv:2411.13317
2
citations
#711

Generative Adversarial Diffusion

U-Chae Jun, Jaeeun Ko, Jiwoo Kang

ICCV 2025poster
2
citations
#712

Cross-Architecture Distillation Made Simple with Redundancy Suppression

Weijia Zhang, Yuehao Liu, Wu Ran et al.

ICCV 2025highlightarXiv:2507.21844
2
citations
#713

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.

ICCV 2025posterarXiv:2508.03742
2
citations
#714

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025posterarXiv:2504.09282
2
citations
#715

Sim-DETR: Unlock DETR for Temporal Sentence Grounding

Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.

ICCV 2025posterarXiv:2509.23867
2
citations
#716

COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation

Sanghyun Jo, Seo Lee, Seungwoo Lee et al.

ICCV 2025posterarXiv:2503.11439
2
citations
#717

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei, Rama Chellappa

ICCV 2025posterarXiv:2504.00037
2
citations
#718

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu et al.

ICCV 2025highlightarXiv:2507.23483
2
citations
#719

ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis

Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.

ICCV 2025posterarXiv:2505.04963
2
citations
#720

Timestep-Aware Diffusion Model for Extreme Image Rescaling

Ce Wang, Zhenyu Hu, Wanjie Sun et al.

ICCV 2025posterarXiv:2408.09151
2
citations
#721

TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos

Jinxi Li, Ziyang Song, Bo Yang

ICCV 2025posterarXiv:2508.09811
2
citations
#722

Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation

Seogkyu Jeon, Kibeom Hong, Hyeran Byun

ICCV 2025posterarXiv:2512.03508
2
citations
#723

Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation

Shuchang Ye, Usman Naseem, Mingyuan Meng et al.

ICCV 2025posterarXiv:2507.11055
2
citations
#724

Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization

Bingqing Zhang, Zhuo Cao, Heming Du et al.

ICCV 2025posterarXiv:2507.15504
2
citations
#725

AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction

Bin Rao, Haicheng Liao, Yanchen Guan et al.

ICCV 2025posterarXiv:2507.01801
2
citations
#726

PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior

Seunggwan Lee, Hwanhee Jung, ByoungSoo Koh et al.

ICCV 2025posterarXiv:2503.12834
2
citations
#727

Everything is a Video: Unifying Modalities through Next-Frame Prediction

G Thomas Hudson, Dean Slack, Thomas Winterbottom et al.

ICCV 2025posterarXiv:2411.10503
2
citations
#728

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Xiao Lin, Yun Peng, Liuyi Wang et al.

ICCV 2025posterarXiv:2502.01312
2
citations
#729

CAFA: a Controllable Automatic Foley Artist

Roi Benita, Michael Finkelson, Tavi Halperin et al.

ICCV 2025posterarXiv:2504.06778
2
citations
#730

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.

ICCV 2025posterarXiv:2502.06593
2
citations
#731

MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective

Weitian Wang, Shubham Rai, Cecilia De la Parra et al.

ICCV 2025posterarXiv:2507.19131
2
citations
#732

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation

Zhaorui Tan, Xi Yang, Tan Pan et al.

ICCV 2025posterarXiv:2411.06106
2
citations
#733

Refer to Any Segmentation Mask Group With Vision-Language Prompts

Shengcao Cao, Zijun Wei, Jason Kuen et al.

ICCV 2025posterarXiv:2506.05342
2
citations
#734

Denoising Token Prediction in Masked Autoregressive Models

Ting Yao, Yehao Li, Yingwei Pan et al.

ICCV 2025poster
2
citations
#735

MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation

Vladislav Bargatin, Egor Chistov, Alexander Yakovenko et al.

ICCV 2025highlightarXiv:2506.23151
2
citations
#736

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025posterarXiv:2503.22668
2
citations
#737

Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion

Haoyang Chen, Dongfang Sun, Caoyuan Ma et al.

ICCV 2025posterarXiv:2506.23711
2
citations
#738

Global Regulation and Excitation via Attention Tuning for Stereo Matching

Jiahao Li, Xinhong Chen, Zhengmin Jiang et al.

ICCV 2025posterarXiv:2509.15891
2
citations
#739

Training-Free Generation of Temporally Consistent Rewards from VLMs

Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.

ICCV 2025posterarXiv:2507.04789
2
citations
#740

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

Xiao Liang, Di Wang, Zhicheng Jiao et al.

ICCV 2025posterarXiv:2507.09209
2
citations
#741

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025posterarXiv:2508.19649
2
citations
#742

A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention

Qiyu Xu, Zhanxuan Hu, Yu Duan et al.

ICCV 2025posterarXiv:2507.14315
2
citations
#743

SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing

Heyi Sun, Cong Wang, Tian-Xing Xu et al.

ICCV 2025posterarXiv:2508.09597
2
citations
#744

SDMatte: Grafting Diffusion Models for Interactive Matting

Longfei Huang, Yu Liang, Hao Zhang et al.

ICCV 2025posterarXiv:2508.00443
2
citations
#745

UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale

Yuhao Wang, Wei Xi

ICCV 2025posterarXiv:2508.09000
2
citations
#746

Identity Preserving 3D Head Stylization with Multiview Score Distillation

Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.

ICCV 2025posterarXiv:2411.13536
2
citations
#747

Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition

Pulkit Kumar, Shuaiyi Huang, Matthew Walmer et al.

ICCV 2025posterarXiv:2508.03695
2
citations
#748

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

Yingyue Li, Bencheng Liao, Wenyu Liu et al.

ICCV 2025posterarXiv:2503.13440
2
citations
#749

G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection

Chengyu Tao, Xuanming Cao, Juan Du

ICCV 2025poster
2
citations
#750

Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning

Tan Pan, Zhaorui Tan, Kaiyu Guo et al.

ICCV 2025posterarXiv:2507.02581
2
citations
#751

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions

Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.

ICCV 2025posterarXiv:2506.03448
2
citations
#752

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Hao Chen, Tao Han, Song Guo et al.

ICCV 2025posterarXiv:2412.02503
2
citations
#753

FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

Fei Yin, Mallikarjun Reddy, Chun-Han Yao et al.

ICCV 2025posterarXiv:2504.15179
2
citations
#754

ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers

Nicholas DiBrita, Jason Han, Tirthak Patel

ICCV 2025posterarXiv:2506.21537
2
citations
#755

Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions

Yuanhong Zheng, Ruixuan Yu, Jian Sun

ICCV 2025posterarXiv:2507.09446
2
citations
#756

EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment

Yufei Zhu, Yiming Zhong, Zemin Yang et al.

ICCV 2025posterarXiv:2503.14329
2
citations
#757

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Shaowei Liu, Chuan Guo, Bing Zhou et al.

ICCV 2025posterarXiv:2510.14976
2
citations
#758

Sequential Gaussian Avatars with Hierarchical Motion Context

Wangze Xu, Yifan Zhan, Zhihang Zhong et al.

ICCV 2025posterarXiv:2411.16768
2
citations
#759

Boosting Adversarial Transferability via Residual Perturbation Attack

Jinjia Peng, Zeze Tao, Huibing Wang et al.

ICCV 2025posterarXiv:2508.05689
2
citations
#760

Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection

Anja Delić, Matej Grcic, Siniša Šegvić

ICCV 2025highlightarXiv:2506.18368
2
citations
#761

GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects

Yidi Shao, Mu Huang, Chen Change Loy et al.

ICCV 2025posterarXiv:2412.17804
2
citations
#762

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.

ICCV 2025highlightarXiv:2507.02398
2
citations
#763

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator

Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.

ICCV 2025posterarXiv:2411.17799
2
citations
#764

Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge

Xiaogang Xu, Jiafei Wu, Qingsen Yan et al.

ICCV 2025poster
2
citations
#765

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Xudong Li, Zihao Huang, Yan Zhang et al.

ICCV 2025posterarXiv:2409.05381
2
citations
#766

Multi-modal Multi-platform Person Re-Identification: Benchmark and Method

Ruiyang Ha, Songyi Jiang, Bin Li et al.

ICCV 2025posterarXiv:2503.17096
2
citations
#767

PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement

Tewodros W. Ayalew, Xiao Zhang, Kevin Y Wu et al.

ICCV 2025posterarXiv:2411.17764
2
citations
#768

Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model

Zewei Xin, Qinya Li, Chaoyue Niu et al.

ICCV 2025posterarXiv:2411.13787
2
citations
#769

Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians

Quankai Gao, Iliyan Georgiev, Tuanfeng Wang et al.

ICCV 2025posterarXiv:2508.01464
2
citations
#770

RoboPearls: Editable Video Simulation for Robot Manipulation

Tao Tang, Likui Zhang, Youpeng Wen et al.

ICCV 2025posterarXiv:2506.22756
2
citations
#771

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Do Dat, Nam Hyeon-Woo, Po-Yuan Mao et al.

ICCV 2025posterarXiv:2505.01104
2
citations
#772

Exploiting Diffusion Prior for Task-driven Image Restoration

Jaeha Kim, Junghun Oh, Kyoung Mu Lee

ICCV 2025posterarXiv:2507.22459
2
citations
#773

Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

Yihong Cao, Jiaming Zhang, Xu Zheng et al.

ICCV 2025posterarXiv:2506.21198
2
citations
#774

A Unified Framework for Motion Reasoning and Generation in Human Interaction

Jeongeun Park, Sungjoon Choi, Sangdoo Yun

ICCV 2025posterarXiv:2410.05628
2
citations
#775

Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling

Qirui Wu, Denys Iliash, Daniel Ritchie et al.

ICCV 2025highlightarXiv:2411.19492
2
citations
#776

GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions

Xiaomeng Chu, Jiajun Deng, Guoliang You et al.

ICCV 2025posterarXiv:2503.16013
2
citations
#777

Multi-Object Sketch Animation by Scene Decomposition and Motion Planning

Jingyu Liu, Zijie Xin, Yuhan Fu et al.

ICCV 2025posterarXiv:2503.19351
2
citations
#778

TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

Yinda Chen, Haoyuan Shi, Xiaoyu Liu et al.

ICCV 2025posterarXiv:2405.16847
2
citations
#779

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning

Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.

ICCV 2025posterarXiv:2502.05843
2
citations
#780

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?

Shouwei Ruan, Hanqing Liu, Yao Huang et al.

ICCV 2025highlightarXiv:2412.03002
2
citations
#781

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models

Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.

ICCV 2025posterarXiv:2412.08619
2
citations
#782

DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing

Aniruddha Bala, Rohit Chowdhury, Rohan Jaiswal et al.

ICCV 2025highlightarXiv:2504.17894
2
citations
#783

A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba

Ye Lu, Jie Wang, Jianjun Gao et al.

ICCV 2025posterarXiv:2507.19852
2
citations
#784

Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks

Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov et al.

ICCV 2025posterarXiv:2503.11781
2
citations
#785

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.

ICCV 2025posterarXiv:2507.16403
2
citations
#786

LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering

Xiaohang Zhan, Dingming Liu

ICCV 2025posterarXiv:2508.07647
2
citations
#787

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

Jijun Xiang, Xuan Zhu, Xianqi Wang et al.

ICCV 2025posterarXiv:2504.01596
2
citations
#788

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Xingsong Ye, Yongkun Du, Yunbo Tao et al.

ICCV 2025posterarXiv:2412.01137
2
citations
#789

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.

ICCV 2025posterarXiv:2504.20830
2
citations
#790

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion

Enyu Liu, En Yu, Sijia Chen et al.

ICCV 2025posterarXiv:2507.08555
2
citations
#791

GT-Loc: Unifying When and Where in Images through a Joint Embedding Space

David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.

ICCV 2025posterarXiv:2507.10473
2
citations
#792

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur

ICCV 2025posterarXiv:2504.03948
2
citations
#793

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Wenyao Zhang, Hongsi Liu, Bohan Li et al.

ICCV 2025poster
2
citations
#794

Trust but Verify: Programmatic VLM Evaluation in the Wild

Viraj Prabhu, Senthil Purushwalkam, An Yan et al.

ICCV 2025posterarXiv:2410.13121
2
citations
#795

ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung, Hyoungseob Park, Patrick Rim et al.

ICCV 2025posterarXiv:2508.05989
2
citations
#796

Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts

Viet Nguyen, Anh Nguyen, Trung Dao et al.

ICCV 2025posterarXiv:2412.02687
2
citations
#797

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Tao Han, Wanghan Xu, Junchao Gong et al.

ICCV 2025posterarXiv:2509.10441
2
citations
#798

Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs

Liwei Che, Qingze T Liu, Jing Jia et al.

ICCV 2025posterarXiv:2503.07772
2
citations
#799

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025posterarXiv:2501.06031
2
citations
#800

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin, Ruohan Gao

ICCV 2025posterarXiv:2504.21847
2
citations