"document understanding" Papers

14 papers found

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.

CVPR 2025arXiv:2503.18434
7
citations

A Token-level Text Image Foundation Model for Document Understanding

Tongkun Guan, Zining Wang, Pei Fu et al.

ICCV 2025arXiv:2503.02304
4
citations

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Wenwen Yu, Zhibo Yang, Yuliang Liu et al.

ICCV 2025arXiv:2508.08589
4
citations

DocVLM: Make Your VLM an Efficient Reader

Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.

CVPR 2025arXiv:2412.08746
10
citations

Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information

Yuke Zhu, Yue Zhang, Dongdong Liu et al.

ICLR 2025
2
citations

Harnessing Webpage UIs for Text-Rich Visual Understanding

Junpeng Liu, Tianyue Ou, Yifan Song et al.

ICLR 2025arXiv:2410.13824
22
citations

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

Mingxin Huang, Yuliang Liu, Dingkang Liang et al.

ICLR 2025arXiv:2408.02034
22
citations

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025arXiv:2412.07626
45
citations

Training Plug-and-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia, Alan Ansell, Edoardo Ponti et al.

COLM 2025paperarXiv:2503.08727
8
citations

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.

ICML 2024arXiv:2407.08707
6
citations

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.

CVPR 2024arXiv:2404.05225
109
citations

Table of Contents

Pengfei Hu, Zhenrong Zhang, Jianshu Zhang et al.

AAAI 2024paperarXiv:2212.02896
15
citations

Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents

MENGJUN CHENG, Chengquan Zhang, Chang Liu et al.

ECCV 2024

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Ofir Abramovich, Niv Nayman, Sharon Fogel et al.

ECCV 2024arXiv:2407.12594
6
citations