CVPR 2025 "large language models" Papers

10 papers found

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.

CVPR 2025posterarXiv:2503.18434
7
citations

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025posterarXiv:2410.00379
16
citations

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.

CVPR 2025posterarXiv:2503.02175
48
citations

DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation

Amin Karimi, Charalambos Poullis

CVPR 2025posterarXiv:2503.04006
4
citations

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

You Li, Fan Ma, Yi Yang

CVPR 2025posterarXiv:2411.16752
9
citations

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

Jinlu Zhang, Yixin Chen, Zan Wang et al.

CVPR 2025highlightarXiv:2505.24315
7
citations

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Hongxu chen, Zhen Wang, Runshi Li et al.

CVPR 2025posterarXiv:2411.15231
5
citations

Learning from Neighbors: Category Extrapolation for Long-Tail Learning

Shizhen Zhao, Xin Wen, Jiahui Liu et al.

CVPR 2025posterarXiv:2410.15980
5
citations

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025posterarXiv:2503.20188
24
citations

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025posterarXiv:2504.11199
8
citations