2025 Oral "multimodal foundation models" Papers
2 papers found
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.
ICLR 2025, oral, arXiv:2410.23266
36 citations
VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao, Yiyang Gan, Bairui Wang et al.
NeurIPS 2025, oral
3 citations