"visual question answering" Papers
9 papers found
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
AAAI 2024 | paper | arXiv:2308.09936
190 citations
BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
Minjun Kim, SeungWoo Song, Youhan Lee et al.
AAAI 2024 | paper | arXiv:2401.06443
9 citations
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AAAI 2024 | paper | arXiv:2308.06394
256 citations
Detection-Based Intermediate Supervision for Visual Question Answering
Yuhang Liu, Daowan Peng, Wei Wei et al.
AAAI 2024 | paper | arXiv:2312.16012
3 citations
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jia Sun et al.
AAAI 2024 | paper | arXiv:2308.11971
20 citations
Image Content Generation with Causal Reasoning
Xiaochuan Li, Baoyu Fan, Run Zhang et al.
AAAI 2024 | paper | arXiv:2312.07132
Interactive Visual Task Learning for Robots
AAAI 2024 | paper | arXiv:2312.13219
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving
Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.
AAAI 2024 | paper | arXiv:2305.14836
266 citations
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai, Shengli Song, Shiqi Meng et al.
AAAI 2024 | paper | arXiv:2312.13594
9 citations