Poster "multimodal large language models" Papers

232 papers found • Page 4 of 5

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Qingni Wang, Tiantian Geng, Zhiyuan Wang et al.

ICLR 2025 • arXiv:2410.08174
14 citations

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Yuhao Zhou, Yiheng Wang, Xuming He et al.

NEURIPS 2025 • arXiv:2506.10521
18 citations

ScImage: How good are multimodal large language models at scientific text-to-image generation?

Leixin Zhang, Steffen Eger, Yinjie Cheng et al.

ICLR 2025 • arXiv:2412.02368
5 citations

SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodal LLMs

Jinhong Deng, Wen Li, Joey Tianyi Zhou et al.

NEURIPS 2025 • arXiv:2510.24214

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • arXiv:2505.16652
25 citations

Seeking and Updating with Live Visual Knowledge

Mingyang Fu, Yuyang Peng, Dongping Chen et al.

NEURIPS 2025 • arXiv:2504.05288
7 citations

SegLLM: Multi-round Reasoning Segmentation with Large Language Models

Xudong Wang, Shaolun Zhang, Shufan Li et al.

ICLR 2025
9 citations

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

ICCV 2025 • arXiv:2405.13337
3 citations

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 • arXiv:2504.07745
11 citations

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.

ICCV 2025 • arXiv:2504.00502
4 citations

Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model

Ruiping Liu, Junwei Zheng, Yufan Chen et al.

NEURIPS 2025 • arXiv:2510.11509

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025 • arXiv:2411.17673
20 citations

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

Weili Zeng, Ziyuan Huang, Kaixiang Ji et al.

ICCV 2025 • arXiv:2503.21817
6 citations

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Ying Chen, Guoan Wang, Yuanfeng Ji et al.

CVPR 2025 • arXiv:2410.11761
29 citations

SMMILE: An expert-driven benchmark for multimodal medical in-context learning

Melanie Rieff, Maya Varma, Ossian Rabow et al.

NEURIPS 2025 • arXiv:2506.21355
3 citations

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICCV 2025 • arXiv:2411.13949
4 citations

SpaceServe: Spatial Multiplexing of Complementary Encoders and Decoders for Multimodal LLMs

Zhicheng Li, Shuoming Zhang, Jiacheng Zhao et al.

NEURIPS 2025

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Yongsen Mao, Junhao Zhong, Chuan Fang et al.

NEURIPS 2025 • arXiv:2506.07491
22 citations

Spatially-aware Weights Tokenization for NeRF-Language Models

Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

NEURIPS 2025

SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

Haotian Xia, Zhengbang Yang, Junbo Zou et al.

ICLR 2025 • arXiv:2410.08474
14 citations

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025 • arXiv:2503.23765
38 citations

Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs

Fangrui Zhu, Hanhui Wang, Yiming Xie et al.

NEURIPS 2025 • arXiv:2506.04220

Structure-Aware Cooperative Ensemble Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models

Jie Zhao, Kang Cheong

NEURIPS 2025 • arXiv:2510.21906

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao Tian, Jie Shao et al.

CVPR 2025 • arXiv:2412.09604
35 citations

Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown

Bowen Wang, Zhouqiang Jiang, Yasuaki Susumu et al.

ICCV 2025 • arXiv:2506.17589
1 citation

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Ziang Yan, Zhilin Li, Yinan He et al.

CVPR 2025 • arXiv:2412.19326
20 citations

Text4Seg: Reimagining Image Segmentation as Text Generation

Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.

ICLR 2025 • arXiv:2410.09855
34 citations

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?

Hao Yin, Guangzong Si, Zilei Wang

NEURIPS 2025 • arXiv:2504.10020

The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers

Daiqing Qi, Handong Zhao, Jing Shi et al.

CVPR 2025
1 citation

TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning

Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma et al.

ICCV 2025 • arXiv:2410.10491

un²CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP

Yinqi Li, Jiahe Zhao, Hong Chang et al.

NEURIPS 2025 • arXiv:2505.24517
1 citation

Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs

Yunqi Hong, Sohyun An, Andrew Bai et al.

NEURIPS 2025 • arXiv:2506.03195
1 citation

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025 • arXiv:2501.04686
31 citations

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Yexin Liu, Zhengyang Liang, Yueze Wang et al.

CVPR 2025 • arXiv:2406.10638
19 citations

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025 • arXiv:2503.10225
3 citations

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025 • arXiv:2412.18108
18 citations

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

Qihui Zhang, Munan Ning, Zheyuan Liu et al.

CVPR 2025 • arXiv:2503.14941
2 citations

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

Yuanxin Liu, Rui Zhu, Shuhuai Ren et al.

NEURIPS 2025 • arXiv:2503.09949
3 citations

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, JunJia Guo, Hang Hua et al.

CVPR 2025 • arXiv:2411.10979
16 citations

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Duo Zheng, Shijia Huang, Liwei Wang

CVPR 2025 • arXiv:2412.00493
70 citations

Video Perception Models for 3D Scene Synthesis

Rui Huang, Guangyao Zhai, Zuria Bauer et al.

NEURIPS 2025 • arXiv:2506.20601
6 citations

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025 • arXiv:2504.11199
11 citations

Vision Function Layer in Multimodal LLMs

Cheng Shi, Yizhou Yu, Sibei Yang

NEURIPS 2025 • arXiv:2509.24791
4 citations

Visual Instruction Bottleneck Tuning

Changdae Oh, Jiatong Li, Shawn Im et al.

NEURIPS 2025 • arXiv:2505.13946
3 citations

VisualLens: Personalization through Task-Agnostic Visual History

Wang Bill Zhu, Deqing Fu, Kai Sun et al.

NEURIPS 2025 • arXiv:2411.16034

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning

Nilay Yilmaz, Maitreya Patel, Lawrence Luo et al.

ICLR 2025 • arXiv:2503.00043
1 citation

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.

ICCV 2025 • arXiv:2510.14672
2 citations

WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image

Yuci Liang, Xinheng Lyu, Meidan Ding et al.

ICCV 2025 • arXiv:2412.02141
10 citations

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Jian Ma, Qirong Peng, Xu Guo et al.

ICCV 2025 • arXiv:2503.06134
5 citations

You Only Communicate Once: One-shot Federated Low-Rank Adaptation of MLLM

Binqian Xu, Haiyang Mei, Zechen Bai et al.

NEURIPS 2025