Most Cited ICCV "bioweapons development" Papers

2,701 papers found • Page 4 of 14

#601

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.

ICCV 2025 • arXiv:2507.22431
6 citations
#602

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.

ICCV 2025 • arXiv:2503.23219
6 citations
#603

Frequency-Dynamic Attention Modulation For Dense Prediction

Linwei Chen, Lin Gu, Ying Fu

ICCV 2025 • arXiv:2507.12006
6 citations
#604

Textured 3D Regenerative Morphing with 3D Diffusion Prior

Songlin Yang, Yushi LAN, Honghua Chen et al.

ICCV 2025 • arXiv:2502.14316
6 citations
#605

Snakes and Ladders: Two Steps Up for VideoMamba

Hui Lu, Albert Ali Salah, Ronald Poppe

ICCV 2025 • arXiv:2406.19006
6 citations
#606

MIEB: Massive Image Embedding Benchmark

Chenghao Xiao, Isaac Chung, Imene Kerboua et al.

ICCV 2025 • arXiv:2504.10471
6 citations
#607

Differentially Private Fine-Tuning of Diffusion Models

Yu-Lin Tsai, Yizhe Li, Zekai Chen et al.

ICCV 2025 • arXiv:2406.01355
6 citations
#608

ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

Zifu Wan, Ce Zhang, Silong Yong et al.

ICCV 2025 • arXiv:2507.00898
6 citations
#609

R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception

Jonas Mirlach, Lei Wan, Andreas Wiedholz et al.

ICCV 2025 • arXiv:2503.17122
6 citations
#610

Augmented Mass-Spring Model for Real-Time Dense Hair Simulation

Jorge Herrera, Yi Zhou, Xin Sun et al.

ICCV 2025 • arXiv:2412.17144
6 citations
#611

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.

ICCV 2025 • arXiv:2504.02386
6 citations
#612

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Yunheng Li, Yuxuan Li, Quan-Sheng Zeng et al.

ICCV 2025 • arXiv:2412.06244
6 citations
#613

Dual-Process Image Generation

Grace Luo, Jonathan Granskog, Aleksander Holynski et al.

ICCV 2025 • arXiv:2506.01955
6 citations
#614

Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction

Luyao Tang, Kunze Huang, Yuxuan Yuan et al.

ICCV 2025 • highlight • arXiv:2508.10731
6 citations
#615

ZeroStereo: Zero-shot Stereo Matching from Single Images

Xianqi Wang, Hao Yang, Gangwei Xu et al.

ICCV 2025 • arXiv:2501.08654
6 citations
#616

TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

Yinda Chen, Haoyuan Shi, Xiaoyu Liu et al.

ICCV 2025 • arXiv:2405.16847
6 citations
#617

Driving View Synthesis on Free-form Trajectories with Generative Prior

Zeyu Yang, Zijie Pan, Yuankun Yang et al.

ICCV 2025 • arXiv:2412.01717
6 citations
#618

AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration

Javier Tirado-Garín, Javier Civera

ICCV 2025 • arXiv:2503.12701
6 citations
#619

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene

Xiao Chen, Tai Wang, Quanyi Li et al.

ICCV 2025 • arXiv:2505.20294
6 citations
#620

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs

Jiawei Mao, Yuhan Wang, Yucheng Tang et al.

ICCV 2025 • arXiv:2504.06897
6 citations
#621

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition

Jinghan You, Shanglin Li, Yuanrui Sun et al.

ICCV 2025 • highlight • arXiv:2501.13420
6 citations
#622

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

Zhentao Tan, Ben Xue, Jian Jia et al.

ICCV 2025 • arXiv:2412.10443
6 citations
#623

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

Jan Ackermann, Jonas Kulhanek, Shengqu Cai et al.

ICCV 2025 • arXiv:2506.21117
6 citations
#624

CODA: Repurposing Continuous VAEs for Discrete Tokenization

Zeyu Liu, Zanlin Ni, Yeguo Hua et al.

ICCV 2025 • arXiv:2503.17760
6 citations
#625

DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors

Runqi Wang, Yang Chen, Sijie Xu et al.

ICCV 2025 • arXiv:2501.08553
6 citations
#626

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Zhiqi Ge, Juncheng Li, Xinglei Pang et al.

ICCV 2025 • arXiv:2412.10342
6 citations
#627

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Zichen Tang, Haihong E, Jiacheng Liu et al.

ICCV 2025 • arXiv:2508.04625
6 citations
#628

RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes

Pou-Chun Kung, Skanda Harisha, Ram Vasudevan et al.

ICCV 2025 • arXiv:2506.01379
6 citations
#629

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting

Andrew Bond, Jui-Hsien Wang, Long Mai et al.

ICCV 2025 • arXiv:2501.04782
6 citations
#630

PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning

Yan Zhang, Yao Feng, Alpár Cseke et al.

ICCV 2025 • arXiv:2503.17544
6 citations
#631

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

Wonseok Roh, Hwanhee Jung, JongWook Kim et al.

ICCV 2025 • arXiv:2412.12906
6 citations
#632

FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation

Cui Miao, Tao Chang, meihan wu et al.

ICCV 2025 • arXiv:2508.02190
5 citations
#633

CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

Duo Wu, Jinghe Wang, Yuan Meng et al.

ICCV 2025 • arXiv:2411.16313
5 citations
#634

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.

ICCV 2025 • arXiv:2503.06273
5 citations
#635

Adding Additional Control to One-Step Diffusion with Joint Distribution Matching

Yihong Luo, Tianyang Hu, Yifan Song et al.

ICCV 2025 • arXiv:2503.06652
5 citations
#636

Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing

Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.

ICCV 2025 • arXiv:2508.07519
5 citations
#637

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models

hongji yang, Wencheng Han, Yucheng Zhou et al.

ICCV 2025 • arXiv:2502.14779
5 citations
#638

LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.

ICCV 2025 • arXiv:2510.15868
5 citations
#639

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou, Yicong Hong, Zun Wang et al.

ICCV 2025 • arXiv:2412.05552
5 citations
#640

Manual-PA: Learning 3D Part Assembly from Instruction Diagrams

Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.

ICCV 2025 • arXiv:2411.18011
5 citations
#641

DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.

ICCV 2025 • arXiv:2503.13176
5 citations
#642

Information Density Principle for MLLM Benchmarks

Chunyi Li, Xiaozhe Li, Zicheng Zhang et al.

ICCV 2025 • arXiv:2503.10079
5 citations
#643

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.

ICCV 2025 • arXiv:2503.16726
5 citations
#644

Fine-grained Spatiotemporal Grounding on Egocentric Videos

Shuo LIANG, Yiwu Zhong, Zi-Yuan Hu et al.

ICCV 2025 • arXiv:2508.00518
5 citations
#645

MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

SHIBO WANG, Haonan He, Maria Parelli et al.

ICCV 2025 • arXiv:2508.05506
5 citations
#646

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025 • arXiv:2511.06256
5 citations
#647

Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels

Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.

ICCV 2025 • arXiv:2506.05312
5 citations
#648

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

Yecheng Wu, Han Cai, Junyu Chen et al.

ICCV 2025 • arXiv:2507.04947
5 citations
#649

BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.

ICCV 2025 • highlight • arXiv:2503.07940
5 citations
#650

egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks

Björn Braun, Rayan Armani, Manuel Meier et al.

ICCV 2025 • arXiv:2502.20879
5 citations
#651

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions

Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.

ICCV 2025 • arXiv:2412.06334
5 citations
#652

StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting

Shakiba Kheradmand, Delio Vicini, George Kopanas et al.

ICCV 2025 • arXiv:2503.24366
5 citations
#653

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation

Youwei Zheng, Yuxi Ren, Xin Xia et al.

ICCV 2025 • arXiv:2510.09094
5 citations
#654

CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.

ICCV 2025
5 citations
#655

Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation

Yujie Zhang, Bingyang Cui, Qi Yang et al.

ICCV 2025 • arXiv:2412.11170
5 citations
#656

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

jian ma, Qirong Peng, Xu Guo et al.

ICCV 2025 • arXiv:2503.06134
5 citations
#657

GUAVA: Generalizable Upper Body 3D Gaussian Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin et al.

ICCV 2025 • arXiv:2505.03351
5 citations
#658

GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR

Christophe Bolduc, Yannick Hold-Geoffroy, Jean-Francois Lalonde

ICCV 2025 • arXiv:2504.10809
5 citations
#659

Improving Multimodal Learning via Imbalanced Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025 • arXiv:2507.10203
5 citations
#660

Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections

Youwei Zhou, Tianyang Xu, Cong Wu et al.

ICCV 2025 • arXiv:2411.14796
5 citations
#661

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Tuna Meral, Enis Simsar, Federico Tombari et al.

ICCV 2025 • highlight • arXiv:2403.19776
5 citations
#662

PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model

Jinhua Zhang, Hualian Sheng, Sijia Cai et al.

ICCV 2025 • arXiv:2407.06109
5 citations
#663

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes

Han-Hung Lee, Qinghong Han, Angel Chang

ICCV 2025 • arXiv:2503.16375
5 citations
#664

Synergistic Prompting for Robust Visual Recognition with Missing Modalities

Zhihui Zhang, Luanyuan Dai, Qika Lin et al.

ICCV 2025 • arXiv:2507.07802
5 citations
#665

Street Gaussians without 3D Object Tracker

Ruida Zhang, Chengxi Li, Chenyangguang Zhang et al.

ICCV 2025 • arXiv:2412.05548
5 citations
#666

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

Divyansh Srivastava, Xiang Zhang, He Wen et al.

ICCV 2025 • arXiv:2505.04718
5 citations
#667

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.

ICCV 2025 • arXiv:2507.15028
5 citations
#668

HERO: Human Reaction Generation from Videos

Chengjun Yu, Wei Zhai, Yuhang Yang et al.

ICCV 2025 • arXiv:2503.08270
5 citations
#669

WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection

Haodong Zhu, Wenhao Dong, Linlin Yang et al.

ICCV 2025 • arXiv:2507.18173
5 citations
#670

InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior

Minghao Wen, Shengjie Wu, Kangkan Wang et al.

ICCV 2025 • arXiv:2507.04961
5 citations
#671

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

Donglin Di, He Feng, Wenzhang SUN et al.

ICCV 2025 • arXiv:2410.07151
5 citations
#672

NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion

Zihao Xu, Yuzhi Tang, Bowen Xu et al.

ICCV 2025
5 citations
#673

The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation

Ruoyu Wang, Huayang Huang, Ye Zhu et al.

ICCV 2025 • highlight • arXiv:2412.05101
5 citations
#674

Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features

Liying Yang, Chen Liu, Zhenwei Zhu et al.

ICCV 2025 • highlight • arXiv:2502.08377
5 citations
#675

Latent Diffusion Models with Masked AutoEncoders

Junho Lee, Jeongwoo Shin, Hyungwook Choi et al.

ICCV 2025 • arXiv:2507.09984
5 citations
#676

AnyI2V: Animating Any Conditional Image with Motion Control

Ziye Li, Xincheng Shuai, Hao Luo et al.

ICCV 2025 • arXiv:2507.02857
5 citations
#677

PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling

Hao Zhang, Haolan Xu, Chun Feng et al.

ICCV 2025 • arXiv:2506.20936
5 citations
#678

Φ-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

Xidan Zhang, Yihan Zhuang, Qian Guo et al.

ICCV 2025
5 citations
#679

G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation

Juntao Jian, Xiuping Liu, Zixuanchen Zixuanchen et al.

ICCV 2025 • arXiv:2503.19457
5 citations
#680

Versatile Transition Generation with Image-to-Video Diffusion

Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.

ICCV 2025 • arXiv:2508.01698
5 citations
#681

MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network

Jianfei Jiang, Qiankun Liu, Haochen Yu et al.

ICCV 2025 • arXiv:2507.11333
5 citations
#682

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.

ICCV 2025 • arXiv:2503.10596
5 citations
#683

Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking

Qiangqiang Wu, Yi Yu, Chenqi Kong et al.

ICCV 2025 • arXiv:2507.07483
5 citations
#684

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Wenchuan Wang, Mengqi Huang, Yijing Tu et al.

ICCV 2025 • arXiv:2505.02192
5 citations
#685

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models

Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.

ICCV 2025 • arXiv:2503.14939
5 citations
#686

EgoM2P: Egocentric Multimodal Multitask Pretraining

Gen Li, Yutong Chen, Yiqian Wu et al.

ICCV 2025 • arXiv:2506.07886
5 citations
#687

Scendi Score: Prompt‑Aware Diversity Evaluation via Schur Complement of CLIP Embeddings

Azim Ospanov, Mohammad Jalali, Farzan Farnia

ICCV 2025 • highlight • arXiv:2412.18645
5 citations
#688

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

Qi Chen, Xinze Zhou, Chen Liu et al.

ICCV 2025 • arXiv:2510.14831
5 citations
#689

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

Sijie Li, Chen Chen, Jungong Han

ICCV 2025 • arXiv:2507.19264
5 citations
#690

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang, Runsen Xu, Chenhang Cui et al.

ICCV 2025 • arXiv:2508.05211
5 citations
#691

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Zhuoyan Luo, Yinghao Wu, Tianheng Cheng et al.

ICCV 2025 • arXiv:2405.15658
5 citations
#692

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Dongming Wu, Yanping Fu, Saike Huang et al.

ICCV 2025 • arXiv:2507.23734
5 citations
#693

Auto-Regressively Generating Multi-View Consistent Images

JiaKui Hu, Yuxiao Yang, Jialun Liu et al.

ICCV 2025 • arXiv:2506.18527
5 citations
#694

Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning

yan wang, Da-Wei Zhou, Han-Jia Ye

ICCV 2025 • arXiv:2508.08165
5 citations
#695

Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

shengyuan zhang, An Zhao, Ling Yang et al.

ICCV 2025 • arXiv:2412.03515
5 citations
#696

Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis

Kaiyang Ji, Ye Shi, Zichen Jin et al.

ICCV 2025 • highlight • arXiv:2508.02106
5 citations
#697

ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives

Yuqian Fu, Runze Wang, Bin Ren et al.

ICCV 2025 • highlight • arXiv:2411.19083
5 citations
#698

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

Yucheng Suo, Fan Ma, Linchao Zhu et al.

ICCV 2025 • arXiv:2503.20472
5 citations
#699

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jiahui Wang, Zuyan Liu, Yongming Rao et al.

ICCV 2025 • arXiv:2506.05344
5 citations
#700

Exploring the Visual Feature Space for Multimodal Neural Decoding

Weihao Xia, Cengiz Oztireli

ICCV 2025 • arXiv:2505.15755
5 citations
#701

Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy

JUNHAO WEI, YU ZHE, Jun Sakuma

ICCV 2025 • arXiv:2503.07661
5 citations
#702

SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image

Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi et al.

ICCV 2025 • arXiv:2409.16178
5 citations
#703

Edicho: Consistent Image Editing in the Wild

Qingyan Bai, Hao Ouyang, Yinghao Xu et al.

ICCV 2025 • arXiv:2412.21079
5 citations
#704

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

Jonas Belouadi, Eddy Ilg, Margret Keuper et al.

ICCV 2025 • highlight • arXiv:2503.11509
5 citations
#705

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Junhao Cheng, Yuying Ge, Yixiao Ge et al.

ICCV 2025 • arXiv:2504.01014
5 citations
#706

Adversarial Robust Memory-Based Continual Learner

Xiaoyue Mi, Fan Tang, Zonghan Yang et al.

ICCV 2025 • arXiv:2311.17608
5 citations
#707

MOSCATO: Predicting Multiple Object State Change Through Actions

Parnian Zameni, Yuhan Shen, Ehsan Elhamifar

ICCV 2025
5 citations
#708

HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.

ICCV 2025 • arXiv:2508.16433
5 citations
#709

Streaming VideoLLMs for Real-Time Procedural Video Understanding

Dibyadip Chatterjee, Edoardo Remelli, Yale Song et al.

ICCV 2025 • arXiv:2504.13915
5 citations
#710

2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos

Marvin Heidinger, Snehal Jauhri, Vignesh Prasad et al.

ICCV 2025 • arXiv:2503.09320
5 citations
#711

Dynamic Dictionary Learning for Remote Sensing Image Segmentation

Xuechao Zou, Yue Li, Shun Zhang et al.

ICCV 2025 • arXiv:2503.06683
5 citations
#712

MikuDance: Animating Character Art with Mixed Motion Dynamics

Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.

ICCV 2025 • arXiv:2411.08656
5 citations
#713

Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures

Tim Seizinger, Florin-Alexandru Vasluianu, Marcos Conde et al.

ICCV 2025 • highlight • arXiv:2503.16067
5 citations
#714

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Xingsong Ye, Yongkun Du, Yunbo Tao et al.

ICCV 2025 • arXiv:2412.01137
5 citations
#715

GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments

Lin Zeng, Boming Zhao, Jiarui Hu et al.

ICCV 2025 • arXiv:2508.08867
5 citations
#716

GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts

Minwen Liao, Hao Dong, Xinyi Wang et al.

ICCV 2025 • arXiv:2503.07417
5 citations
#717

Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving

Junhao Ge, Zuhong Liu, Longteng Fan et al.

ICCV 2025 • arXiv:2503.18108
5 citations
#718

OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations

Peng-Hao Hsu, Ke Zhang, Fu-En Wang et al.

ICCV 2025 • arXiv:2508.20063
5 citations
#719

SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations

Songchun Zhang, Huiyao Xu, Sitong Guo et al.

ICCV 2025 • arXiv:2505.11992
5 citations
#720

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

Byeongjun Park, Hyojun Go, Hyelin Nam et al.

ICCV 2025 • arXiv:2503.12024
5 citations
#721

DisTime: Distribution-based Time Representation for Video Large Language Models

yingsen zeng, Zepeng Huang, Yujie Zhong et al.

ICCV 2025 • arXiv:2505.24329
5 citations
#722

Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis

Junyan Ye, Jun He, Weijia Li et al.

ICCV 2025 • arXiv:2408.01812
5 citations
#723

4D Visual Pre-training for Robot Learning

Chengkai Hou, Yanjie Ze, Yankai Fu et al.

ICCV 2025 • arXiv:2508.17230
5 citations
#724

Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?

Yuru Jia, Valerio Marsocci, Ziyang Gong et al.

ICCV 2025 • arXiv:2503.07890
5 citations
#725

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.

ICCV 2025 • highlight • arXiv:2508.01242
5 citations
#726

DuCos: Duality Constrained Depth Super-Resolution via Foundation Model

Zhiqiang Yan, Zhengxue Wang, Haoye Dong et al.

ICCV 2025 • arXiv:2503.04171
5 citations
#727

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

Boqian Li, Zeyu Cai, Michael Black et al.

ICCV 2025 • highlight • arXiv:2503.10624
5 citations
#728

IDFace: Face Template Protection for Efficient and Secure Identification

Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.

ICCV 2025 • arXiv:2507.12050
5 citations
#729

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Qi Wang, Zhipeng Zhang, Baao Xie et al.

ICCV 2025 • arXiv:2503.08751
5 citations
#730

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Nan Chen, Mengqi Huang, Yihao Meng et al.

ICCV 2025 • arXiv:2507.01945
5 citations
#731

Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras

Shuang Guo, Friedhelm Hamann, Guillermo Gallego

ICCV 2025 • highlight • arXiv:2503.17262
5 citations
#732

DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model

Junjia Huang, Pengxiang Yan, Jinhang Cai et al.

ICCV 2025 • highlight
5 citations
#733

Learning Normal Flow Directly From Events

Dehao Yuan, Levi Burner, Jiayi Wu et al.

ICCV 2025 • arXiv:2412.11284
5 citations
#734

MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance

Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.

ICCV 2025 • arXiv:2412.15058
5 citations
#735

Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion

Tongyan Hua, Lutao Jiang, Ying-Cong Chen et al.

ICCV 2025 • arXiv:2507.04403
5 citations
#736

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations

Xiaohui Li, Yihao Liu, Shuo Cao et al.

ICCV 2025 • arXiv:2501.10110
5 citations
#737

External Knowledge Injection for CLIP-Based Class-Incremental Learning

Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.

ICCV 2025 • arXiv:2503.08510
5 citations
#738

HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation

Qinqian Lei, Bo Wang, Robby Tan

ICCV 2025 • arXiv:2507.15542
5 citations
#739

Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis

Zhuokun Chen, Jugang Fan, Zhuowei Yu et al.

ICCV 2025 • arXiv:2507.20454
5 citations
#740

Multi-identity Human Image Animation with Structural Video Diffusion

Zhenzhi Wang, Yixuan Li, yanhong zeng et al.

ICCV 2025 • arXiv:2504.04126
5 citations
#741

SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding

Tianci Wen, Zhiang Liu, Yongchun Fang

ICCV 2025 • arXiv:2501.05242
5 citations
#742

Motion Synthesis with Sparse and Flexible Keyjoint Control

Inwoo Hwang, Jinseok Bae, Donggeun Lim et al.

ICCV 2025 • arXiv:2503.15557
5 citations
#743

C4D: 4D Made from 3D through Dual Correspondences

Shizun Wang, Zhenxiang Jiang, Xingyi Yang et al.

ICCV 2025 • arXiv:2510.14960
5 citations
#744

AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering

Michael Steiner, Thomas Köhler, Lukas Radl et al.

ICCV 2025 • highlight • arXiv:2504.12811
5 citations
#745

A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition

Jie Zhu, Yiyang Su, Minchul Kim et al.

ICCV 2025 • arXiv:2508.00053
5 citations
#746

ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation

Jimyeong Kim, Jungwon Park, Yeji Song et al.

ICCV 2025 • highlight • arXiv:2507.01496
5 citations
#747

Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration

Junyuan Deng, Wei Yin, Xiaoyang Guo et al.

ICCV 2025 • arXiv:2411.17240
5 citations
#748

SignRep: Enhancing Self-Supervised Sign Representations

Ryan Wong, Necati Cihan Camgoz, Richard Bowden

ICCV 2025 • arXiv:2503.08529
5 citations
#749

Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration

Darshan Thaker, Abhishek Goyal, Rene Vidal

ICCV 2025 • arXiv:2411.15295
5 citations
#750

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Wenqi Zhang, Hang Zhang, Xin Li et al.

ICCV 2025 • highlight • arXiv:2501.00958
5 citations
#751

Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training

Zhenghong Zhou, Jie An, Jiebo Luo

ICCV 2025 • arXiv:2412.06029
5 citations
#752

Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction

Zhirui Gao, Renjiao Yi, YaQiao Dai et al.

ICCV 2025 • arXiv:2506.21401
5 citations
#753

Precise Action-to-Video Generation Through Visual Action Prompts

Yuang Wang, Chao Wen, Haoyu Guo et al.

ICCV 2025 • arXiv:2508.13104
5 citations
#754

FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads

Weijie Lyu, Yi Zhou, Ming-Hsuan Yang et al.

ICCV 2025 • arXiv:2412.17812
5 citations
#755

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025 • arXiv:2506.07371
4 citations
#756

CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee, Jiwan Seo, Kiljoon Han et al.

ICCV 2025 • arXiv:2407.03010
4 citations
#757

IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

Wenxuan Guo, Xiuwei Xu, Hang Yin et al.

ICCV 2025 • arXiv:2508.00823
4 citations
#758

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Shaofeng Yin, Ting Lei, Yang Liu

ICCV 2025 • arXiv:2508.03284
4 citations
#759

Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates

Kecheng Chen, Xinyu Luo, Tiexin Qin et al.

ICCV 2025 • highlight • arXiv:2504.02008
4 citations
#760

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.

ICCV 2025 • arXiv:2411.16789
4 citations
#761

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Kangan Qian, Jinyu Miao, Xinyu Jiao et al.

ICCV 2025
4 citations
#762

Generative Zoo

Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.

ICCV 2025 • arXiv:2412.08101
4 citations
#763

Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos

Changwoon Choi, Jeongjun Kim, Geonho Cha et al.

ICCV 2025 • arXiv:2412.19089
4 citations
#764

TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction

Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.

ICCV 2025 • arXiv:2508.04682
4 citations
#765

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.

ICCV 2025 • highlight • arXiv:2504.01009
4 citations
#766

GAP: Gaussianize Any Point Clouds with Text Guidance

Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.

ICCV 2025 • arXiv:2508.05631
4 citations
#767

Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

Li, Yang Xiao, Jie Ji et al.

ICCV 2025 • arXiv:2504.09039
4 citations
#768

X-Capture: An Open-Source Portable Device for Multi-Sensory Learning

Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.

ICCV 2025 • arXiv:2504.02318
4 citations
#769

Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations

Jianhua Sun, Yuxuan Li, Jiude Wei et al.

ICCV 2025 • arXiv:2412.14974
4 citations
#770

Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning

Tianyi Zhao, Boyang Liu, Yanglei Gao et al.

ICCV 2025 • arXiv:2503.11780
4 citations
#771

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Jie Xu, Na Zhao, Gang Niu et al.

ICCV 2025 • arXiv:2503.04151
4 citations
#772

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Gaurav Patel, Qiang Qiu

ICCV 2025 • arXiv:2503.06339
4 citations
#773

Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh

Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.

ICCV 2025 • highlight • arXiv:2303.10440
4 citations
#774

Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving

Hao Zhou, Zhanning Gao, Zhili Chen et al.

ICCV 2025 • arXiv:2411.13076
4 citations
#775

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

Timo Teufel, xilong zhou, Umar Iqbal et al.

ICCV 2025 • arXiv:2508.09137
4 citations
#776

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation

Ruotong Wang, Mingli Zhu, Jiarong Ou et al.

ICCV 2025 • arXiv:2504.16907
4 citations
#777

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025 • arXiv:2409.01071
4 citations
#778

SceneMI: Motion In-betweening for Modeling Human-Scene Interaction

Inwoo Hwang, Bing Zhou, Young Min Kim et al.

ICCV 2025 • highlight • arXiv:2503.16289
4 citations
#779

Occupancy Learning with Spatiotemporal Memory

Ziyang Leng, Jiawei Yang, Wenlong Yi et al.

ICCV 2025 • arXiv:2508.04705
4 citations
#780

ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models

Shadi Hamdan, Chonghao Sima, Zetong Yang et al.

ICCV 2025 • arXiv:2506.07725
4 citations
#781

FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing

Bizhu Wu, Jinheng Xie, Meidan Ding et al.

ICCV 2025 • arXiv:2507.19850
4 citations
#782

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025 • arXiv:2412.01250
4 citations
#783

GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors

Kang DU, Zhihao Liang, Yulin Shen et al.

ICCV 2025 • arXiv:2408.08524
4 citations
#784

DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model

Rui Yu, Xianghang Zhang, Runkai Zhao et al.

ICCV 2025 • arXiv:2508.05402
4 citations
#785

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025 • highlight • arXiv:2503.23925
4 citations
#786

HORT: Monocular Hand-held Objects Reconstruction with Transformers

Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.

ICCV 2025 • arXiv:2503.21313
4 citations
#787

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

Kelin Yu, Sheng Zhang, Harshit Soora et al.

ICCV 2025 • arXiv:2508.11049
4 citations
#788

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang, Cong Han, Yafei Li et al.

ICCV 2025 • arXiv:2507.00992
4 citations
#789

ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

Ronggang Huang, Haoxin Yang, Yan Cai et al.

ICCV 2025 • arXiv:2507.11261
4 citations
#790

Learning to Inference Adaptively for Multimodal Large Language Models

Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.

ICCV 2025 • arXiv:2503.10905
4 citations
#791

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Yingjian Chen, Lei Zhang, Yakun Niu

ICCV 2025 • arXiv:2408.13697
4 citations
#792

VertexRegen: Mesh Generation with Continuous Level of Detail

Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.

ICCV 2025 • arXiv:2508.09062
4 citations
#793

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.

ICCV 2025 • arXiv:2501.01425
4 citations
#794

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

Peng Zheng, Junke Wang, Yi Chang et al.

ICCV 2025 • arXiv:2507.01756
4 citations
#795

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.

ICCV 2025 • arXiv:2505.00482
4 citations
#796

Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

Taihang Hu, Linxuan Li, Kai Wang et al.

ICCV 2025 • arXiv:2504.10434
4 citations
#797

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.

ICCV 2025 • arXiv:2509.25134
4 citations
#798

DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization

Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.

ICCV 2025 • arXiv:2504.13206
4 citations
#799

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.

ICCV 2025 • arXiv:2508.19649
4 citations
#800

Boosting Multimodal Learning via Disentangled Gradient Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025 • arXiv:2507.10213
4 citations