2025 Spotlight "vision-language models" Papers
10 papers found
Approximate Domain Unlearning for Vision-Language Models
Kodai Kawamura, Yuta Goto, Rintaro Yanagi et al.
NeurIPS 2025 · Spotlight · arXiv:2510.08132
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye
NeurIPS 2025 · Spotlight · arXiv:2505.18600
Conditional Representation Learning for Customized Tasks
Honglin Liu, Chao Sun, Peng Hu et al.
NeurIPS 2025 · Spotlight · arXiv:2510.04564
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
Hyungyung Lee, Geon Choi, Jung-Oh Lee et al.
NeurIPS 2025 · Spotlight · arXiv:2505.18087
3 citations
LaViDa: A Large Diffusion Model for Vision-Language Understanding
Shufan Li, Konstantinos Kallidromitis, Hritik Bansal et al.
NeurIPS 2025 · Spotlight
OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang, Bowen Wang, Dunjie Lu et al.
NeurIPS 2025 · Spotlight · arXiv:2508.09123
31 citations
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
Shiting (Ginny) Xiao, Rishabh Kabra, Yuhang Li et al.
NeurIPS 2025 · Spotlight · arXiv:2507.05427
2 citations
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
Yutong Wang, Haiyu Wang, Sai Qian Zhang
NeurIPS 2025 · Spotlight · arXiv:2510.16292
1 citation
Vision-centric Token Compression in Large Language Models
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
NeurIPS 2025 · Spotlight · arXiv:2502.00791
7 citations
Vision Transformers Don't Need Trained Registers
Nicholas Jiang, Amil Dravid, Alexei Efros et al.
NeurIPS 2025 · Spotlight · arXiv:2506.08010
12 citations