All Papers

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Xiaohan Wang, Yuhui Zhang, Orr Zohar et al.

ECCV 2024 poster

Video-Based Human Pose Regression via Decoupled Space-Time Aggregation

Jijie He, Wenwu Yang

CVPR 2024 poster

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang, Tianxing Wu, Shuai Yang et al.

CVPR 2024 poster
118 citations

VideoClusterNet: Self-Supervised and Adaptive Face Clustering for Videos

Devesh Bilwakumar Walawalkar, Pablo Garrido

ECCV 2024 poster arXiv:2407.12214
3 citations

VideoCon: Robust Video-Language Alignment via Contrast Captions

Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.

CVPR 2024 poster arXiv:2311.10111
28 citations

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024 poster

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

XuDong Wang, Ishan Misra, Ziyun Zeng et al.

CVPR 2024 poster arXiv:2308.14710
36 citations

Video Decomposition Prior: Editing Videos Layer by Layer

Gaurav Shrivastava, Ser-Nam Lim, Abhinav Shrivastava

ICLR 2024 poster

Video Editing via Factorized Diffusion Distillation

Uriel Singer, Amit Zohar, Yuval Kirstain et al.

ECCV 2024 poster arXiv:2403.09334
28 citations

Video Event Extraction with Multi-View Interaction Knowledge Distillation

Kaiwen Wei, Du Runyan, Li Jin et al.

AAAI 2024 paper

Video Frame Interpolation via Direct Synthesis with the Event-based Reference

Yuhan Liu, Yongjian Deng, Hao Chen et al.

CVPR 2024 poster

Video Frame Prediction from a Single Image and Events

Juanjuan Zhu, Zhexiong Wan, Yuchao Dai

AAAI 2024 paper

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.

CVPR 2024 poster

Video Harmonization with Triplet Spatio-Temporal Variation Patterns

Zonghui Guo, XinYu Han, Jie Zhang et al.

CVPR 2024 poster

Video Interpolation with Diffusion Models

Siddhant Jain, Daniel Watson, Aleksander Holynski et al.

CVPR 2024 poster
63 citations

Video-Language Aligned Transformer for Video Question Answering

AAAI 2024 paper

Video Language Planning

Yilun Du, Sherry Yang, Pete Florence et al.

ICLR 2024 poster
144 citations

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Yang Jin, Zhicheng Sun, Kun Xu et al.

ICML 2024 oral

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu et al.

CVPR 2024 poster arXiv:2406.11816
109 citations

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024 poster arXiv:2402.19082
20 citations

VideoMamba: Spatio-Temporal Selective State Space Model

Jinyoung Park, Hee-Seon Kim, Kangwook Ko et al.

ECCV 2024 poster
23 citations

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024 poster arXiv:2403.06977
401 citations

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

Hao Fei, Shengqiong Wu, Wei Ji et al.

ICML 2024 oral

Video-P2P: Video Editing with Cross-attention Control

Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.

CVPR 2024 poster
309 citations

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dan Kondratyuk, Lijun Yu, Xiuye Gu et al.

ICML 2024 poster

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

CVPR 2024 poster
16 citations

VideoPrism: A Foundational Visual Encoder for Video Understanding

Long Zhao, Nitesh Bharadwaj Gundavarapu, Liangzhe Yuan et al.

ICML 2024 poster

Video Question Answering with Procedural Programs

Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.

ECCV 2024 poster arXiv:2312.00937
37 citations

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024 poster arXiv:2402.13250
82 citations

Video Recognition in Portrait Mode

Mingfei Han, Linjie Yang, Xiaojie Jin et al.

CVPR 2024 poster arXiv:2312.13746

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Liao Wang, Kaixin Yao, Chengcheng Guo et al.

CVPR 2024 poster arXiv:2312.01407
21 citations

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Guangzhi Sun, Wenyi Yu, Changli Tang et al.

ICML 2024 oral

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Xiang Fan, Anand Bhattad, Ranjay Krishna

ECCV 2024 poster arXiv:2403.14617
23 citations

VideoStudio: Generating Consistent-Content and Multi-Scene Videos

Fuchen Long, Zhaofan Qiu, Ting Yao et al.

ECCV 2024 poster

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.

CVPR 2024 poster arXiv:2401.06312

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Yuchao Gu, Yipin Zhou, Bichen Wu et al.

CVPR 2024 poster
63 citations

VidLA: Video-Language Alignment at Scale

Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.

CVPR 2024 poster arXiv:2403.14870
8 citations

vid-TLDR: Training Free Token Merging for Light-weight Video Transformer

Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.

CVPR 2024 poster

VidToMe: Video Token Merging for Zero-Shot Video Editing

Xirui Li, Chao Ma, Xiaokang Yang et al.

CVPR 2024 poster
89 citations

View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning

Shilong Ou, Zhe Xue, Yawen Li et al.

CVPR 2024 highlight

View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang, Xuanyu Yi, Zike Wu et al.

ECCV 2024 poster

View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields

Haodi He, Colton Stearns, Adam Harley et al.

ECCV 2024 poster arXiv:2405.19678
4 citations

View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

Quan Zhang, Lei Wang, Vishal M. Patel et al.

CVPR 2024 poster arXiv:2403.14513
32 citations

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

Lukas Höllein, Aljaž Božič, Norman Müller et al.

CVPR 2024 poster arXiv:2403.01807

ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

Jinke Li, Xiao He, Chonghua Zhou et al.

ECCV 2024 poster arXiv:2405.04299
26 citations

View From Above: Orthogonal-View aware Cross-view Localization

Shan Wang, Chuong Nguyen, Jiawei Liu et al.

CVPR 2024 poster

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

Xianghui Yang, Gil Avraham, Yan Zuo et al.

CVPR 2024 poster

Viewing Transformers Through the Lens of Long Convolutions Layers

Itamar Zimerman, Lior Wolf

ICML 2024 poster

Viewpoint-Aware Visual Grounding in 3D Scenes

Xiangxi Shi, Zhonghua Wu, Stefan Lee

CVPR 2024 poster

Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models

James Burgess, Kuan-Chieh Wang, Serena Yeung-Levy

ECCV 2024 poster arXiv:2309.07986
6 citations