"multi-modal large language models" Papers
9 papers found
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
ICCV 2025 highlight arXiv:2507.23284
1 citation
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu, Boyun Zheng, Wenting Chen et al.
NeurIPS 2025 poster arXiv:2505.23601
9 citations
HOComp: Interaction-Aware Human-Object Composition
Dong Liang, Jinyuan Jia, Yuhao LIU et al.
NeurIPS 2025 poster arXiv:2507.16813
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
ICLR 2025 poster arXiv:2408.04840
237 citations
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
ICCV 2025 poster arXiv:2504.09282
2 citations
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.
CVPR 2025 poster arXiv:2503.12077
5 citations
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu, Zhongyi Sun, Zexi Li et al.
ICML 2024poster
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang et al.
ECCV 2024 poster arXiv:2407.13761
37 citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Dongyang Liu, Renrui Zhang, Longtian Qiu et al.
ICML 2024poster