"multi-modal models" Papers
3 papers found
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
ICLR 2025 poster, arXiv:2410.10783
11 citations
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu et al.
NeurIPS 2025 poster, arXiv:2509.18094
4 citations
Think before Placement: Common Sense Enhanced Transformer for Object Placement
Yaxuan Qin, Jiayu Xu, Ruiping Wang et al.
ECCV 2024 poster