2025 Oral "vision-language models" Papers
14 papers found
CrypticBio: A Large Multimodal Dataset for Visually Confusing Species
Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren
NeurIPS 2025oral
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria, Adinath Dukre, feilong tang et al.
NeurIPS 2025oralarXiv:2506.15649
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
NeurIPS 2025oralarXiv:2506.07497
8
citations
HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models
Zelin Peng, Zhengqin Xu, Qingyang Liu et al.
NeurIPS 2025oralarXiv:2510.20322
1
citations
Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Rakshit Trivedi, Kartik Sharma, David Parkes
NeurIPS 2025oral
Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
Zhitao Zeng, Guojian Yuan, Junyuan Mao et al.
NeurIPS 2025oralarXiv:2509.17429
NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
Cosmin Bercea, Jun Li, Philipp Raffler et al.
NeurIPS 2025oral
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
Zheyu Zhang, Ziqi Pang, Shixing Chen et al.
NeurIPS 2025oral
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.
NeurIPS 2025oralarXiv:2504.13180
40
citations
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
Zhenyuan Chen, Chenxi Wang, Ningyu Zhang et al.
NeurIPS 2025oralarXiv:2509.01907
2
citations
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
ICLR 2025oral
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving
Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.
NeurIPS 2025oralarXiv:2506.06218
4
citations
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao et al.
ICLR 2025oralarXiv:2501.16937
12
citations
Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames
Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.
NeurIPS 2025oralarXiv:2507.02001
8
citations