"video large language models" Papers

14 papers found

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025posterarXiv:2501.03218
31
citations

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

Leqi Shen, Guoqiang Gong, Tao He et al.

NeurIPS 2025oralarXiv:2503.11187
16
citations

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda TAO, Can Qin et al.

NeurIPS 2025oralarXiv:2505.21334
18
citations

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Joya Chen, Yiqi Lin, Ziyun Zeng et al.

CVPR 2025posterarXiv:2504.16030
4
citations

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025posterarXiv:2411.19772
32
citations

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.

NeurIPS 2025posterarXiv:2502.16671
7
citations

Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering

JIANFENG CAI, Jiale Hong, Zongmeng Zhang et al.

NeurIPS 2025oralarXiv:2505.12826
1
citations

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.

ICCV 2025posterarXiv:2507.07990
12
citations

PAVE: Patching and Adapting Video Large Language Models

Zhuoming Liu, Yiquan Li, Khoi D Nguyen et al.

CVPR 2025posterarXiv:2503.19794
1
citations

PVChat: Personalized Video Chat with One-Shot Learning

YUFEI SHI, Weilong Yan, Gang Xu et al.

ICCV 2025posterarXiv:2503.17069
2
citations

Temporal Reasoning Transfer from Text to Video

Lei Li, Yuanxin Liu, Linli Yao et al.

ICLR 2025oralarXiv:2410.06166
20
citations

VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.

ICCV 2025posterarXiv:2411.16156
8
citations

VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models

Haichao Zhang, Yun Fu

NeurIPS 2025oralarXiv:2503.16980
3
citations

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

Long Qian, Juncheng Li, Yu Wu et al.

ICML 2024oral