2025 "vision-language model" Papers
4 papers found
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim, Hyunjung Shim
CVPR 2025, poster, arXiv:2503.16873
Image as a World: Generating Interactive World from Single Image via Panoramic Video Generation
Dongnan Gui, Xun Guo, Wengang Zhou et al.
NeurIPS 2025, oral
1 citation
ImgEdit: A Unified Image Editing Dataset and Benchmark
Yang Ye, Xianyi He, Zongjian Li et al.
NeurIPS 2025, poster, arXiv:2505.20275
84 citations
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta, Anirban Roy, Rama Chellappa et al.
ICCV 2025, poster, arXiv:2506.09445