Oral "vision-language models" Papers
5 papers found
CrypticBio: A Large Multimodal Dataset for Visually Confusing Species
Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren
NeurIPS 2025oral
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
Zheyu Zhang, Ziqi Pang, Shixing Chen et al.
NeurIPS 2025oral
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
Zhenyuan Chen, Chenxi Wang, Ningyu Zhang et al.
NeurIPS 2025oralarXiv:2509.01907
2
citations
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
ICLR 2025oral
Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames
Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.
NeurIPS 2025oralarXiv:2507.02001
8
citations