"multimodal understanding" Papers
13 papers found
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.
ICLR 2025 (poster) · arXiv:2410.18325 · 17 citations
Can LLMs Understand Time Series Anomalies?
Zihao Zhou, Rose Yu
ICLR 2025 (poster) · arXiv:2410.05440 · 32 citations
HMVLM: Human Motion-Vision-Language Model via MoE LoRA
Lei Hu, Yongjing Ye, Shihong Xia
NeurIPS 2025 (poster)
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
Li Huaqiu, Yong Wang, Tongwen Huang et al.
ICCV 2025 (poster) · arXiv:2507.00790 · 3 citations
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
Rongchang Xie, Chen Du, Ping Song et al.
ICCV 2025 (poster) · arXiv:2411.17762 · 25 citations
One Head to Rule Them All: Amplifying LVLM Safety through a Single Critical Attention Head
Junhao Xia, Haotian Zhu, Shuchao Pang et al.
NeurIPS 2025 (poster)
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai et al.
ICLR 2025 (poster) · arXiv:2408.12528 · 455 citations
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
Somesh Singh, Harini S I, Yaman Singla et al.
ICLR 2025 (poster) · 2 citations
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu, Huichao Zhang, Yiheng Liu et al.
CVPR 2025 (poster) · arXiv:2412.03069 · 120 citations
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Xiangyu Wang, Donglin Yang, Ziqin Wang et al.
ICLR 2025 (poster) · arXiv:2410.07087 · 52 citations
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Zehan Wang, Ziang Zhang, Xize Cheng et al.
ICML 2024 (poster)
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
ICML 2024 (poster)
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang, Yuan Yao, Wei Ji et al.
ICML 2024 (poster)