"multimodal tasks" Papers
6 papers found
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
Kaishen Wang, Hengrui Gu, Meijun Gao et al.
ICLR 2025 (poster)
7 citations
HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
Haoran Li, Yingjie Qin, Baoyuan Ou et al.
NeurIPS 2025 (oral), arXiv:2505.20444
2 citations
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
Seongyun Lee, Geewook Kim, Jiyeon Kim et al.
ICLR 2025 (poster), arXiv:2410.07571
4 citations
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
Somesh Singh, Harini S I, Yaman Singla et al.
ICLR 2025 (poster)
2 citations
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.
AAAI 2024 (paper), arXiv:2401.12863
78 citations
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu et al.
ECCV 2024 (poster), arXiv:2402.19150
15 citations