"vision-language model" Papers
8 papers found
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim, Hyunjung Shim
CVPR 2025, poster, arXiv:2503.16873
Image as a World: Generating Interactive World from Single Image via Panoramic Video Generation
Dongnan Gui, Xun Guo, Wengang Zhou et al.
NeurIPS 2025, oral
1 citation
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta, Anirban Roy, Rama Chellappa et al.
ICCV 2025, poster, arXiv:2506.09445
Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
Si-Qi Liu, Qirui Wang, Pong Chi Yuen
ECCV 2024, poster
8 citations
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun et al.
ECCV 2024, poster, arXiv:2312.00438
126 citations
Image Fusion via Vision-Language Model
Zixiang Zhao, Lilun Deng, Haowen Bai et al.
ICML 2024, poster
PALM: Predicting Actions through Language Models
Sanghwan Kim, Daoji Huang, Yongqin Xian et al.
ECCV 2024, poster, arXiv:2311.17944
22 citations
Retrieval Across Any Domains via Large-scale Pre-trained Model
Jiexi Yan, Zhihui Yin, Chenghao Xu et al.
ICML 2024, poster