Poster "video understanding" Papers

25 papers found

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025posterarXiv:2408.00754
9
citations

Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression

Juan Chen, Honglin liu, Yingying Ao et al.

NeurIPS 2025poster

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.

CVPR 2025posterarXiv:2503.16036
14
citations

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, shijia Huang, Yanyang Li et al.

NeurIPS 2025posterarXiv:2505.24625
24
citations

Multiple Object Tracking as ID Prediction

Ruopeng Gao, Ji Qi, Limin Wang

CVPR 2025posterarXiv:2403.16848
53
citations

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Miran Heo, Min-Hung Chen, De-An Huang et al.

CVPR 2025posterarXiv:2501.08326
9
citations

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025posterarXiv:2506.01300
28
citations

The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning

Xinyang Zhou, Fanyue Wei, Lixin Duan et al.

ICCV 2025posterarXiv:2501.07305

TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring

Zhu Xu, Ting Lei, Zhimin Li et al.

ICCV 2025posterarXiv:2508.04943

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NeurIPS 2025posterarXiv:2505.01481
13
citations

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024posterarXiv:2403.06764
343
citations

ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

Zhicheng Zheng, Xin Yan, Zhenfang Chen et al.

ICML 2024poster

DEVIAS: Learning Disentangled Video Representations of Action and Scene

Kyungho Bae, Youngrae Kim, Geo Ahn et al.

ECCV 2024posterarXiv:2312.00826
6
citations

Fine-grained Dynamic Network for Generic Event Boundary Detection

Ziwei Zheng, Lijun He, Le Yang et al.

ECCV 2024posterarXiv:2407.04274
2
citations

FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition

Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Shah Mubarak

ECCV 2024posterarXiv:2409.01448
5
citations

Learning Video Context as Interleaved Multimodal Sequences

Qinghong Lin, Pengchuan Zhang, Difei Gao et al.

ECCV 2024posterarXiv:2407.21757
12
citations

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024posterarXiv:2404.03384
128
citations

Open Vocabulary Multi-Label Video Classification

Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.

ECCV 2024posterarXiv:2407.09073
5
citations

Self-Supervised Any-Point Tracking by Contrastive Random Walks

Ayush Shrivastava, Andrew Owens

ECCV 2024posterarXiv:2409.16288
11
citations

Semantically Guided Representation Learning For Action Anticipation

Anxhelo Diko, Danilo Avola, Bardh Prenkaj et al.

ECCV 2024posterarXiv:2407.02309
6
citations

Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization

Mengnan Liu, Le Wang, Sanping Zhou et al.

ECCV 2024poster
3
citations

Text-Conditioned Resampler For Long Form Video Understanding

Bruno Korbar, Yongqin Xian, Alessio Tonioni et al.

ECCV 2024posterarXiv:2312.11897
24
citations

Vamos: Versatile Action Models for Video Understanding

Shijie Wang, Qi Zhao, Minh Quan et al.

ECCV 2024posterarXiv:2311.13627
36
citations

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024posterarXiv:2403.06977
401
citations

VideoPrism: A Foundational Visual Encoder for Video Understanding

Long Zhao, Nitesh Bharadwaj Gundavarapu, Liangzhe Yuan et al.

ICML 2024poster