Qi Zheng

8 Papers

185 Total Citations

Papers (8)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation

End-to-End HOI Reconstruction Transformer with Graph-based Encoding

ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting

Frequency-Biased Synergistic Design for Image Compression and Compensation