"document understanding" Papers

14 papers found

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.

CVPR 2025, arXiv:2503.18434

A Token-level Text Image Foundation Model for Document Understanding

Tongkun Guan, Zining Wang, Pei Fu et al.

ICCV 2025, arXiv:2503.02304

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Wenwen Yu, Zhibo Yang, Yuliang Liu et al.

ICCV 2025, arXiv:2508.08589

DocVLM: Make Your VLM an Efficient Reader

Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.

CVPR 2025, arXiv:2412.08746

Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information

Yuke Zhu, Yue Zhang, Dongdong Liu et al.

ICLR 2025

Harnessing Webpage UIs for Text-Rich Visual Understanding

Junpeng Liu, Tianyue Ou, Yifan Song et al.

ICLR 2025, arXiv:2410.13824

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

Mingxin Huang, Yuliang Liu, Dingkang Liang et al.

ICLR 2025, arXiv:2408.02034

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025, arXiv:2412.07626

Training Plug-and-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia, Alan Ansell, Edoardo Ponti et al.

COLM 2025, paper, arXiv:2503.08727

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.

ICML 2024, arXiv:2407.08707

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.

CVPR 2024, arXiv:2404.05225

109 citations

Table of Contents

Pengfei Hu, Zhenrong Zhang, Jianshu Zhang et al.

AAAI 2024, paper, arXiv:2212.02896

Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents

MENGJUN CHENG, Chengquan Zhang, Chang Liu et al.

ECCV 2024

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Ofir Abramovich, Niv Nayman, Sharon Fogel et al.

ECCV 2024, arXiv:2407.12594
