2025 "visual question answering" Papers
15 papers found
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.
ICCV 2025 (poster) · arXiv:2501.02201 · 2 citations
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
ICCV 2025 (poster) · arXiv:2502.04469 · 1 citation
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
CVPR 2025 (highlight) · arXiv:2503.00413 · 10 citations
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
CVPR 2025 (poster) · arXiv:2412.12077 · 22 citations
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu, Boyun Zheng, Wenting Chen et al.
NeurIPS 2025 (poster) · arXiv:2505.23601 · 9 citations
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu, Qiang Lu, Meichen Dong et al.
ICCV 2025 (poster) · arXiv:2510.13253 · 3 citations
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
ICCV 2025 (poster) · arXiv:2507.05056 · 3 citations
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
ICLR 2025 (poster) · arXiv:2410.10783 · 11 citations
mmWalk: Towards Multi-modal Multi-view Walking Assistance
Kedi Ying, Ruiping Liu, Chongyan Chen et al.
NeurIPS 2025 (poster) · arXiv:2510.11520
Scaling Language-Free Visual Representation Learning
David Fan, Shengbang Tong, Jiachen Zhu et al.
ICCV 2025 (highlight) · arXiv:2504.01017 · 39 citations
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.
CVPR 2025 (poster) · arXiv:2505.16652 · 22 citations
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.
CVPR 2025 (highlight) · arXiv:2504.02823 · 2 citations
TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models
Hsin Yi Hsieh, Shang-Wei Liu, Chang-Chih Meng et al.
NeurIPS 2025 (poster)
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Jeremy Irvin, Emily Liu, Joyce Chen et al.
ICLR 2025 (oral) · arXiv:2410.06234 · 45 citations
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
Eun Chang, Zhuangqun Huang, Yiwei Liao et al.
NeurIPS 2025 (poster) · arXiv:2511.22154