"image captioning" Papers

15 papers found

BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

ICLR 2025posterarXiv:2502.00745
3
citations

Controlling Multimodal LLMs via Reward-guided Decoding

Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.

ICCV 2025posterarXiv:2508.11616

Edit Flows: Variable Length Discrete Flow Matching with Sequence-Level Edit Operations

Marton Havasi, Brian Karrer, Itai Gat et al.

NeurIPS 2025poster

Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution

Qihao Liu, Xi Yin, Alan L. Yuille et al.

CVPR 2025highlightarXiv:2412.15213
10
citations

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Xin Dong, Shichao Dong, Jin Wang et al.

ICCV 2025posterarXiv:2507.05056
3
citations

Vision‑Language‑Vision Auto‑Encoder: Scalable Knowledge Distillation from Diffusion Models

Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.

NeurIPS 2025posterarXiv:2507.07104
2
citations

Cycle-Consistency Learning for Captioning and Grounding

Ning Wang, Jiajun Deng, Mingbo Jia

AAAI 2024paperarXiv:2312.15162
13
citations

Differentially Private Representation Learning via Image Captioning

Tom Sander, Yaodong Yu, Maziar Sanjabi et al.

ICML 2024poster

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.

ECCV 2024posterarXiv:2404.01197

Image Captioning with Multi-Context Synthetic Data

AAAI 2024paperarXiv:2305.18072

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

ECCV 2024posterarXiv:2403.09377
4
citations

LookupViT: Compressing visual information to a limited number of tokens

Rajat Koner, Gagan Jain, Sujoy Paul et al.

ECCV 2024posterarXiv:2407.12753
15
citations

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

Didi Zhu, Zhongyi Sun, Zexi Li et al.

ICML 2024poster

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian liu et al.

ICML 2024poster

TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu, Lu Pang, Tengfei Ma et al.

ECCV 2024posterarXiv:2409.19232
23
citations