2025 Oral "vision-language models" Papers

14 papers found

CrypticBio: A Large Multimodal Dataset for Visually Confusing Species

Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren

NeurIPS 2025, oral

Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning

Ankan Deria, Adinath Dukre, Feilong Tang et al.

NeurIPS 2025, oral, arXiv:2506.15649

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.

NeurIPS 2025, oral, arXiv:2506.07497, 8 citations

HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models

Zelin Peng, Zhengqin Xu, Qingyang Liu et al.

NeurIPS 2025, oral, arXiv:2510.20322, 1 citation

Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI Coordination

Rakshit Trivedi, Kartik Sharma, David Parkes

NeurIPS 2025, oral

Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration

Zhitao Zeng, Guojian Yuan, Junyuan Mao et al.

NeurIPS 2025, oral, arXiv:2509.17429

NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI

Cosmin Bercea, Jun Li, Philipp Raffler et al.

NeurIPS 2025, oral

One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding

Zheyu Zhang, Ziqi Pang, Shixing Chen et al.

NeurIPS 2025, oral

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.

NeurIPS 2025, oral, arXiv:2504.13180, 40 citations

RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events

Zhenyuan Chen, Chenxi Wang, Ningyu Zhang et al.

NeurIPS 2025, oral, arXiv:2509.01907, 2 citations

Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.

ICLR 2025, oral

STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving

Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.

NeurIPS 2025, oral, arXiv:2506.06218, 4 citations

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Makoto Shing, Kou Misaki, Han Bao et al.

ICLR 2025, oral, arXiv:2501.16937, 12 citations

Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames

Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.

NeurIPS 2025, oral, arXiv:2507.02001, 8 citations