ICCV "multi-modal large language models" Papers
4 papers found
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan, Zining Wang, Pei Fu et al.
ICCV 2025 | poster | arXiv:2503.02304
4 citations
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
ICCV 2025 | highlight | arXiv:2507.23284
1 citation
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao, Chen-Wei Xie, Hao Tang et al.
ICCV 2025 | poster | arXiv:2507.15569
1 citation
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
ICCV 2025 | poster | arXiv:2504.09282
2 citations