Yuliang Liu
11
Papers
417
Total Citations
Papers (11)
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
384
citations
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
AAAI 2024 (arXiv)
18
citations
Bridging the Gap Between End-to-End and Two-Step Text Spotting
CVPR 2024
11
citations
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
ICCV 2025 (arXiv)
4
citations
Training-free Geometric Image Editing on Diffusion Models
ICCV 2025 (arXiv)
0
citations
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
CVPR 2024
0
citations
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
CVPR 2025
0
citations
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
ICML 2024
0
citations
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
ICCV 2025
0
citations
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
ICCV 2025
0
citations
Multi-scenario Overlapping Text Segmentation with Depth Awareness
ICCV 2025
0
citations