"multimodal agents" Papers
3 papers found
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Chen Chen, Yuchen Hu, Siyin Wang et al.
ICLR 2025, poster, arXiv:2501.17202
22 citations
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
NeurIPS 2025, poster, arXiv:2503.10809
10 citations
WebVLN: Vision-and-Language Navigation on Websites
Qi Chen, Dileepa Pitawela, Chongyang Zhao et al.
AAAI 2024, paper, arXiv:2312.15820
19 citations