2025 "multimodal models" Papers
12 papers found
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.
CVPR 2025highlightarXiv:2503.05936
2
citations
Context-aware Dynamic Pruning for Speech Foundation Models
Masao Someki, Yifan Peng, Siddhant Arora et al.
ICLR 2025poster
7
citations
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
CVPR 2025posterarXiv:2503.02175
48
citations
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection
Li Li, Huixian Gong, Hao Dong et al.
CVPR 2025highlightarXiv:2411.08227
14
citations
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou, Gim Hee Lee
ICCV 2025posterarXiv:2503.06934
2
citations
Matryoshka Multimodal Models
Mu Cai, Jianwei Yang, Jianfeng Gao et al.
ICLR 2025posterarXiv:2405.17430
58
citations
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
yuntao du, Kailin Jiang, Zhi Gao et al.
ICLR 2025posterarXiv:2502.19870
9
citations
Reconstructive Visual Instruction Tuning
Haochen Wang, Anlin Zheng, Yucheng Zhao et al.
ICLR 2025posterarXiv:2410.09575
34
citations
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Seil Kang, Jinyeong Kim, Junhyeok Kim et al.
ICLR 2025posterarXiv:2503.03321
52
citations
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
CVPR 2025posterarXiv:2412.05818
9
citations
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
ICCV 2025posterarXiv:2508.05211
3
citations
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
Romy Luo, Zihui (Sherry) Xue, Alex Dimakis et al.
NeurIPS 2025posterarXiv:2510.06077
4
citations