"video large language models" Papers
14 papers found
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.
CVPR 2025posterarXiv:2501.03218
31
citations
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen, Guoqiang Gong, Tao He et al.
NeurIPS 2025oralarXiv:2503.11187
16
citations
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda TAO, Can Qin et al.
NeurIPS 2025oralarXiv:2505.21334
18
citations
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
CVPR 2025posterarXiv:2504.16030
4
citations
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng, Jinrui Zhang, Qingni Wang et al.
CVPR 2025posterarXiv:2411.19772
32
citations
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.
NeurIPS 2025posterarXiv:2502.16671
7
citations
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering
JIANFENG CAI, Jiale Hong, Zongmeng Zhang et al.
NeurIPS 2025oralarXiv:2505.12826
1
citations
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.
ICCV 2025posterarXiv:2507.07990
12
citations
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu, Yiquan Li, Khoi D Nguyen et al.
CVPR 2025posterarXiv:2503.19794
1
citations
PVChat: Personalized Video Chat with One-Shot Learning
YUFEI SHI, Weilong Yan, Gang Xu et al.
ICCV 2025posterarXiv:2503.17069
2
citations
Temporal Reasoning Transfer from Text to Video
Lei Li, Yuanxin Liu, Linli Yao et al.
ICLR 2025oralarXiv:2410.06166
20
citations
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.
ICCV 2025posterarXiv:2411.16156
8
citations
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
NeurIPS 2025oralarXiv:2503.16980
3
citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian, Juncheng Li, Yu Wu et al.
ICML 2024oral