Qi Zheng

Papers: 7

Total Citations: 180

Papers (7)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

End-to-End HOI Reconstruction Transformer with Graph-based Encoding

ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting

Frequency-Biased Synergistic Design for Image Compression and Compensation