"vision-language understanding" Papers
4 papers found
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
ICLR 2025, poster, arXiv:2504.19276
10 citations
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
ICCV 2025, poster, arXiv:2410.16236
23 citations
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
Jinhong Deng, Wen Li, Joey Tianyi Zhou et al.
NeurIPS 2025, poster, arXiv:2510.24214
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
Jiachen Li, Qiaozi Gao, Michael Johnston et al.
ICML 2024, poster