2024 "vision-language understanding" Papers
2 papers found
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
Jiachen Li, Qiaozi Gao, Michael Johnston et al.
ICML 2024 | poster | arXiv:2310.09676
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Swetha Sirnam, Jinyu Yang, Tal Neiman et al.
ECCV 2024 | poster | arXiv:2407.13851
10 citations