Yuliang Liu

Papers: 11

Total Citations: 417

Papers (11)

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Bridging the Gap Between End-to-End and Two-Step Text Spotting

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Training-free Geometric Image Editing on Diffusion Models

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance

Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method

Multi-scenario Overlapping Text Segmentation with Depth Awareness