R. Manmatha

Papers: 17

Total Citations: 68

Papers (17)

DocFormerv2: Local Features for Document Understanding

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

No Head Left Behind – Multi-Head Alignment Distillation for Transformers

Deep Decision Network for Multi-Class Image Classification

Compressed Video Action Recognition

SCATTER: Selective Context Attentional Scene Text Recognizer

Sequence-to-Sequence Contrastive Learning for Text Recognition

LaTr: Layout-Aware Transformer for Scene-Text VQA

Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer

PolyFormer: Referring Image Segmentation As Sequential Polygon Generation

Sampling Matters in Deep Embedding Learning

DocFormer: End-to-End Transformer for Document Understanding

DocTr: Document Transformer for Structured Information Extraction in Documents

Scaling up Image Segmentation across Data and Tasks

GLASS: Global to Local Attention for Scene-Text Spotting

Scalable Enumeration of Trap Spaces in Boolean Networks via Answer Set Programming

On the Scalability of Diffusion-based Text-to-Image Generation