2025 "multimodal models" Papers

12 papers found

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.

CVPR 2025 highlight · arXiv:2503.05936
2 citations

Context-aware Dynamic Pruning for Speech Foundation Models

Masao Someki, Yifan Peng, Siddhant Arora et al.

ICLR 2025 poster
7 citations

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.

CVPR 2025 poster · arXiv:2503.02175
48 citations

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection

Li Li, Huixian Gong, Hao Dong et al.

CVPR 2025 highlight · arXiv:2411.08227
14 citations

LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs

Hanyu Zhou, Gim Hee Lee

ICCV 2025 poster · arXiv:2503.06934
2 citations

Matryoshka Multimodal Models

Mu Cai, Jianwei Yang, Jianfeng Gao et al.

ICLR 2025 poster · arXiv:2405.17430
58 citations

MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Yuntao Du, Kailin Jiang, Zhi Gao et al.

ICLR 2025 poster · arXiv:2502.19870
9 citations

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025 poster · arXiv:2410.09575
34 citations

See What You Are Told: Visual Attention Sink in Large Multimodal Models

Seil Kang, Jinyeong Kim, Junhyeok Kim et al.

ICLR 2025 poster · arXiv:2503.03321
52 citations

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu, Haochuan Li, Wenjie Wang et al.

CVPR 2025 poster · arXiv:2412.05818
9 citations

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang, Runsen Xu, Chenhang Cui et al.

ICCV 2025 poster · arXiv:2508.05211
3 citations

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

Romy Luo, Zihui (Sherry) Xue, Alex Dimakis et al.

NeurIPS 2025 poster · arXiv:2510.06077
4 citations