Poster "image captioning" Papers
14 papers found
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
CVPR 2025, poster, arXiv:2412.00440
9 citations
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
ICLR 2025, poster, arXiv:2502.00745
3 citations
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.
ICCV 2025, poster, arXiv:2508.11616
Edit Flows: Variable Length Discrete Flow Matching with Sequence-Level Edit Operations
Marton Havasi, Brian Karrer, Itai Gat et al.
NeurIPS 2025, poster
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
ICCV 2025, poster, arXiv:2507.05056
3 citations
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
NeurIPS 2025, poster, arXiv:2507.07104
2 citations
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang, Xinyun Jiang, Jun Xiao et al.
ECCV 2024, poster, arXiv:2311.14920
4 citations
Differentially Private Representation Learning via Image Captioning
Tom Sander, Yaodong Yu, Maziar Sanjabi et al.
ICML 2024, poster
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
ECCV 2024, poster, arXiv:2404.01197
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
ECCV 2024, poster, arXiv:2403.09377
4 citations
LookupViT: Compressing visual information to a limited number of tokens
Rajat Koner, Gagan Jain, Sujoy Paul et al.
ECCV 2024, poster, arXiv:2407.12753
15 citations
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu, Zhongyi Sun, Zexi Li et al.
ICML 2024, poster
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian Liu et al.
ICML 2024, poster
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma et al.
ECCV 2024, poster, arXiv:2409.19232
23 citations