ICCV

2,701 papers tracked across 1 years

Top Papers in ICCV 2025

View all papers →
#1

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Peng Jin, ZiangWu ZiangWu et al.

ICCV 2025
338
citations
#2

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

yi yang, Xiaoxuan He, Hongkun Pan et al.

ICCV 2025
247
citations
#3

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.

ICCV 2025
211
citations
#4

LVBench: An Extreme Long Video Understanding Benchmark

Weihan Wang, zehai he, Wenyi Hong et al.

ICCV 2025
208
citations
#5

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.

ICCV 2025
206
citations
#6

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

CHENMING ZHU, Tai Wang, Wenwei Zhang et al.

ICCV 2025
127
citations
#7

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Wenqiang Sun, Shuo Chen, Fangfu Liu et al.

ICCV 2025
103
citations
#8

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Quanfeng Lu, Wenqi Shao, Zitao Liu et al.

ICCV 2025
96
citations
#9

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

gaojie lin, Jianwen Jiang, Jiaqi Yang et al.

ICCV 2025
86
citations
#10

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Jensen Zhou, Hang Gao, Vikram Voleti et al.

ICCV 2025
83
citations
#11

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou et al.

ICCV 2025
73
citations
#12

MV-Adapter: Multi-View Consistent Image Generation Made Easy

Zehuan Huang, Yuan-Chen Guo, Haoran Wang et al.

ICCV 2025
69
citations
#13

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Yuxuan Zhang, Yirui Yuan, Yiren Song et al.

ICCV 2025
69
citations
#14

MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization

Yiwen Chen, Yikai Wang, Yihao Luo et al.

ICCV 2025
66
citations
#15

GameFactory: Creating New Games with Generative Interactive Videos

Jiwen Yu, Yiran Qin, Xintao Wang et al.

ICCV 2025
63
citations
#16

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

Haoyu Fu, Diankun Zhang, Zongchuang Zhao et al.

ICCV 2025
62
citations
#17

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Xuemeng Yang, Licheng Wen, Tiantian Wei et al.

ICCV 2025
58
citations
#18

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Chen Ziwen, Hao Tan, Kai Zhang et al.

ICCV 2025
56
citations
#19

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025
56
citations
#20

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Lijie Liu, Tianxiang Ma, Bingchuan Li et al.

ICCV 2025
55
citations