Poster "multimodal large language models" Papers
51 papers found • Page 1 of 2
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Qiong Wu, Wenhao Lin, Yiyi Zhou et al.
Adaptive Keyframe Sampling for Long Video Understanding
Xi Tang, Jihao Qiu, Lingxi Xie et al.
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Xue Zhucun, Jiangning Zhang, Xie Xurong et al.
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
Xinyu Fang, Zhijian Chen, Kai Lan et al.
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu, Zhibo Yang, Yuliang Liu et al.
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
EgoBlind: Towards Egocentric Visual Assistance for the Blind
Junbin Xiao, Nanxin Huang, Hao Qiu et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
Shaoyu Liu, Jianing Li, Guanghui Zhao et al.
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Jiayi Guo, Zhao Junhao, Chaoqun Du et al.
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang, Haihong E, Jiacheng Liu et al.
Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models
Hai Yan, Haijian Ma, Xiaowen Cai et al.
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang, Zhe Wang, Qin Zhou et al.
GraphGPT-o: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang, Bowen Jin, Jiacheng Shen et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Lehan Wang, Haonan Wang, Honglong Yang et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang et al.
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, Shijia Huang, Yanyang Li et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
Bowen Dong, Minheng Ni, Zitong Huang et al.
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.
Multimodal LLM Guided Exploration and Active Mapping using Fisher Information
Wen Jiang, Boshu Lei, Katrina Ashton et al.
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Tianhao Peng, Haochen Wang, Yuanxing Zhang et al.
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
Sung Jin Um, Dongjin Kim, Sangmin Lee et al.
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
Yahan Tu, Rui Hu, Jitao Sang
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang, Tianyi Gao, Haoran Xu et al.
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
Revealing Multimodal Causality with Large Language Models
Jin Li, Shoujin Wang, Qi Zhang et al.
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
ShuHang Xun, Sicheng Tao, Jungang Li et al.
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
Jinhong Deng, Wen Li, Joey Tianyi Zhou et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Ziang Yan, Zhilin Li, Yinan He et al.
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?
Hao Yin, Guangzong Si, Zilei Wang
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
VisualLens: Personalization through Task-Agnostic Visual History
Wang Bill Zhu, Deqing Fu, Kai Sun et al.
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
Wei Li, Hehe Fan, Yongkang Wong et al.
LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
Yuqing Zhang, Hangqi Li, Shengyu Zhang et al.
Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning
Zhuo Huang, Chang Liu, Yinpeng Dong et al.
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen, Ruoxi Chen, Shilin Zhang et al.
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu, Hao Fei, Leigang Qu et al.