"multimodal language models" Papers
4 papers found
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
CVPR 2025posterarXiv:2408.00754
9
citations
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
ICLR 2025posterarXiv:2408.15881
34
citations
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng et al.
ICLR 2025posterarXiv:2406.08407
34
citations
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.
CVPR 2025posterarXiv:2312.11556
30
citations