"video-language understanding" Papers
3 papers found
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
CVPR 2025, poster, arXiv:2503.02175
48 citations
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.
ICCV 2025, poster, arXiv:2509.23867
2 citations
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.
CVPR 2025, poster, arXiv:2412.18609
1 citation