ICCV Poster "multi-modal large language models" Papers
4 papers found
Aligning Effective Tokens with Video Anomaly in Large Language Models
Yingxian Chen, Jiahui Liu, Ruidi Fan et al.
ICCV 2025, poster, arXiv:2508.06350
1 citation
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan, Zining Wang, Pei Fu et al.
ICCV 2025, poster, arXiv:2503.02304
4 citations
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao, Chen-Wei Xie, Hao Tang et al.
ICCV 2025, poster, arXiv:2507.15569
1 citation
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
ICCV 2025, poster, arXiv:2504.09282
2 citations