NeurIPS Poster "multimodal large language models" Papers

57 papers found • Page 1 of 2

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Qiong Wu, Wenhao Lin, Yiyi Zhou et al.

NeurIPS 2025 • poster • arXiv:2411.19628 • 5 citations

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Lequan Lin, Dai Shi, Andi Han et al.

NeurIPS 2025 • poster • arXiv:2511.09833

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding

Xue Zhucun, Jiangning Zhang, Xie Xurong et al.

NeurIPS 2025 • poster • arXiv:2506.13589 • 7 citations

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NeurIPS 2025 • poster • arXiv:2505.21494 • 12 citations

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

Lingfeng Wang, Hualing Lin, Senda Chen et al.

NeurIPS 2025 • poster • arXiv:2505.16495 • 2 citations

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Han Lin, Jaemin Cho, Amir Zadeh et al.

NeurIPS 2025 • poster • arXiv:2508.05954 • 6 citations

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

Junyan Ye, Dongzhi Jiang, Jun He et al.

NeurIPS 2025 • poster • arXiv:2510.09361 • 2 citations

Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search

Haoran Sun, Yankai Jiang, Wenjie Lou et al.

NeurIPS 2025 • poster • arXiv:2506.16962 • 5 citations

Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback

Janet Wang, Yunbei Zhang, Zhengming Ding et al.

NeurIPS 2025 • poster • arXiv:2506.12323 • 2 citations

Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention

Xin Zou, Di Lu, Yizhou Wang et al.

NeurIPS 2025 • poster • arXiv:2510.02912 • 7 citations

DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning

Qi Cao, Ruiyi Wang, Ruiyi Zhang et al.

NeurIPS 2025 • poster • arXiv:2505.20241 • 5 citations

EgoBlind: Towards Egocentric Visual Assistance for the Blind

Junbin Xiao, Nanxin Huang, Hao Qiu et al.

NeurIPS 2025 • poster • arXiv:2503.08221 • 8 citations

Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation

Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.

NeurIPS 2025 • poster • arXiv:2412.09585 • 4 citations

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

Jiahao Wang, Weiye Xu, Aijun Yang et al.

NeurIPS 2025 • poster • arXiv:2511.10648

Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models

Hai Yan, Haijian Ma, Xiaowen Cai et al.

NeurIPS 2025 • poster

FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models

Shengming Yuan, Xinyu Lyu, Shuailong Wang et al.

NeurIPS 2025 • poster • arXiv:2510.11190

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.

NeurIPS 2025 • poster • arXiv:2506.04897

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Rongyao Fang, Chengqi Duan, Kun Wang et al.

NeurIPS 2025 • poster • 60 citations

Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes

Kaiqing Lin, Zhiyuan Yan, Ke-Yue Zhang et al.

NeurIPS 2025 • poster • arXiv:2505.19582 • 2 citations

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Pengfei Zhao, Rongbo Luan, Wei Zhang et al.

NeurIPS 2025 • poster • arXiv:2506.06970 • 1 citation

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

Tianyi Bai, Yuxuan Fan, Qiu Jiantao et al.

NeurIPS 2025 • poster • arXiv:2506.07227 • 2 citations

HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning

Chuhao Zhou, Jianfei Yang

NeurIPS 2025 • poster • arXiv:2505.17645

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

Hao Cheng, Erjia Xiao, Jing Shao et al.

NeurIPS 2025 • poster • arXiv:2501.13772 • 4 citations

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

Kaihang Pan, Yang Wu, Wendong Bu et al.

NeurIPS 2025 • poster • arXiv:2506.01480 • 6 citations

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, Shijia Huang, Yanyang Li et al.

NeurIPS 2025 • poster • arXiv:2505.24625 • 24 citations

Learning to Instruct for Visual Instruction Tuning

Zhihan Zhou, Feng Hong, Jiaan Luo et al.

NeurIPS 2025 • poster • arXiv:2503.22215 • 3 citations

Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

Xuan Wang, Siyuan Liang, Dongping Liao et al.

NeurIPS 2025 • poster • arXiv:2503.16872 • 4 citations

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Bingquan Dai, Luo Li, Qihong Tang et al.

NeurIPS 2025 • poster • arXiv:2508.14879 • 5 citations

MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents

Ziming Wei, Bingqian Lin, Zijian Jiao et al.

NeurIPS 2025 • poster • arXiv:2505.20148 • 1 citation

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

Bowen Dong, Minheng Ni, Zitong Huang et al.

NeurIPS 2025 • poster • arXiv:2505.24238 • 2 citations

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NeurIPS 2025 • poster • arXiv:2503.18135 • 8 citations

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NeurIPS 2025 • poster • arXiv:2506.01946 • 17 citations

MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology

Kiril Vasilev, Alexandre Misrahi, Eeshaan Jain et al.

NeurIPS 2025 • poster • arXiv:2511.20490 • 1 citation

Multimodal Tabular Reasoning with Privileged Structured Information

Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.

NeurIPS 2025 • poster • arXiv:2506.04088 • 6 citations

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Tianhao Peng, Haochen Wang, Yuanxing Zhang et al.

NeurIPS 2025 • poster • arXiv:2511.07250 • 2 citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NeurIPS 2025 • poster • arXiv:2510.21122

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Yana Wei, Liang Zhao, Jianjian Sun et al.

NeurIPS 2025 • poster • arXiv:2507.05255 • 14 citations

RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering

Rongyang Zhang, Yuqing Huang, Chengqiang Lu et al.

NeurIPS 2025 • poster • arXiv:2512.05119

Revealing Multimodal Causality with Large Language Models

Jin Li, Shoujin Wang, Qi Zhang et al.

NeurIPS 2025 • poster • arXiv:2509.17784

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

ShuHang Xun, Sicheng Tao, Jungang Li et al.

NeurIPS 2025 • poster • arXiv:2505.02064 • 5 citations

Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback

Jiaming Ji, Xinyu Chen, Rui Pan et al.

NeurIPS 2025 • poster • arXiv:2503.17682 • 8 citations

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Yuhao Zhou, Yiheng Wang, Xuming He et al.

NeurIPS 2025 • poster • arXiv:2506.10521 • 15 citations

SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs

Jinhong Deng, Wen Li, Joey Tianyi Zhou et al.

NeurIPS 2025 • poster • arXiv:2510.24214

Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model

Ruiping Liu, Junwei Zheng, Yufan Chen et al.

NeurIPS 2025 • poster • arXiv:2510.11509

SMMILE: An expert-driven benchmark for multimodal medical in-context learning

Melanie Rieff, Maya Varma, Ossian Rabow et al.

NeurIPS 2025 • poster • arXiv:2506.21355 • 3 citations

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Yongsen Mao, Junhao Zhong, Chuan Fang et al.

NeurIPS 2025 • poster • arXiv:2506.07491 • 21 citations

Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs

Fangrui Zhu, Hanhui Wang, Yiming Xie et al.

NeurIPS 2025 • poster • arXiv:2506.04220

Structure-Aware Cooperative Ensemble Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models

Jie Zhao, Kang Cheong

NeurIPS 2025 • poster • arXiv:2510.21906

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?

Hao Yin, Guangzong Si, Zilei Wang

NeurIPS 2025 • poster • arXiv:2504.10020

un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP

Yinqi Li, Jiahe Zhao, Hong Chang et al.

NeurIPS 2025 • poster • arXiv:2505.24517 • 1 citation