"image captioning" Papers
15 papers found
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
ICLR 2025posterarXiv:2502.00745
3
citations
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.
ICCV 2025posterarXiv:2508.11616
Edit Flows: Variable Length Discrete Flow Matching with Sequence-Level Edit Operations
Marton Havasi, Brian Karrer, Itai Gat et al.
NeurIPS 2025poster
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
CVPR 2025highlightarXiv:2412.15213
10
citations
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
ICCV 2025posterarXiv:2507.05056
3
citations
Vision‑Language‑Vision Auto‑Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
NeurIPS 2025posterarXiv:2507.07104
2
citations
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang, Jiajun Deng, Mingbo Jia
AAAI 2024paperarXiv:2312.15162
13
citations
Differentially Private Representation Learning via Image Captioning
Tom Sander, Yaodong Yu, Maziar Sanjabi et al.
ICML 2024poster
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
ECCV 2024posterarXiv:2404.01197
Image Captioning with Multi-Context Synthetic Data
AAAI 2024paperarXiv:2305.18072
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
ECCV 2024posterarXiv:2403.09377
4
citations
LookupViT: Compressing visual information to a limited number of tokens
Rajat Koner, Gagan Jain, Sujoy Paul et al.
ECCV 2024posterarXiv:2407.12753
15
citations
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu, Zhongyi Sun, Zexi Li et al.
ICML 2024poster
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian liu et al.
ICML 2024poster
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma et al.
ECCV 2024posterarXiv:2409.19232
23
citations