Lianwen Jin

Papers: 33

Total Citations: 215

Papers (33)

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Revisiting Tampered Scene Text Detection in the Era of Generative AI

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Predicting the Original Appearance of Damaged Historical Documents

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Implicit Feature Alignment: Learn To Convert Text Recognizer to Text Spotter

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization

SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition

Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator

Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution

M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis

Scale-Aware Modulation Meet Transformer

Revisiting Scene Text Recognition: A Data Perspective

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering

Don’t Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Context

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations

Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods

UPOCR: Towards Unified Pixel-Level OCR Interface

Deep Matching Prior Network: Toward Tighter Multi-Oriented Text Detection

Aggregation Cross-Entropy for Sequence Recognition

Tightness-Aware Evaluation Protocol for Scene Text Detection

ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification

M5HisDoc: A Large-scale Multi-style Chinese Historical Document Analysis Benchmark