Paper "multimodal large language models" Papers
19 papers found
Conference
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park, Kuk Jin Jang, Basam Alasaly et al.
BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation
Haotian Peng, Jiawei Liu, Jinsong Du et al.
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution
Wentao Tan, Qiong Cao, Yibing Zhan et al.
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le et al.
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang, Gen Zhan, Li Yang et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu, Hao Fei, Liangming Pan et al.
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Yeji Park, Deokyeong Lee, Junsuk Choe et al.
Crafting Dynamic Virtual Activities with Advanced Multimodal Models
Changyang Li, Qingan Yan, Minyoung Kim et al.
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan et al.
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye, Qiong Wu, Wenhao Lin et al.
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
Guosheng Zhang, Keyao Wang, Haixiao Yue et al.
TextToucher: Fine-Grained Text-to-Touch Generation
Jiahang Tu, Hao Fu, Fengyu Yang et al.
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
Yutao Jiang, Qiong Wu, Wenhao Lin et al.
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
InstructDoc: A Dataset for Zero
Shot Generalization of Visual Document Understanding with Instructions - Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.
PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology
Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.
VIGC: Visual Instruction Generation and Correction
Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński