2025 Spotlight "vision-language models" Papers

10 papers found

Approximate Domain Unlearning for Vision-Language Models

Kodai Kawamura, Yuta Goto, Rintaro Yanagi et al.

NeurIPS 2025 spotlight · arXiv:2510.08132

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye

NeurIPS 2025 spotlight · arXiv:2505.18600

Conditional Representation Learning for Customized Tasks

Honglin Liu, Chao Sun, Peng Hu et al.

NeurIPS 2025 spotlight · arXiv:2510.04564

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

Hyungyung Lee, Geon Choi, Jung-Oh Lee et al.

NeurIPS 2025 spotlight · arXiv:2505.18087 · 3 citations

LaViDa: A Large Diffusion Model for Vision-Language Understanding

Shufan Li, Konstantinos Kallidromitis, Hritik Bansal et al.

NeurIPS 2025 spotlight

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang, Bowen Wang, Dunjie Lu et al.

NeurIPS 2025 spotlight · arXiv:2508.09123 · 31 citations

OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts

Shiting (Ginny) Xiao, Rishabh Kabra, Yuhang Li et al.

NeurIPS 2025 spotlight · arXiv:2507.05427 · 2 citations

QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models

Yutong Wang, Haiyu Wang, Sai Qian Zhang

NeurIPS 2025 spotlight · arXiv:2510.16292 · 1 citation

Vision-centric Token Compression in Large Language Model

Ling Xing, Alex Jinpeng Wang, Rui Yan et al.

NeurIPS 2025 spotlight · arXiv:2502.00791 · 7 citations

Vision Transformers Don't Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei Efros et al.

NeurIPS 2025 spotlight · arXiv:2506.08010 · 12 citations