Qingpei Guo

Papers: 8

Total Citations: 11

Papers (8)

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Engage for All: Making Ordinary Image Descriptions Appealing Again!

Social Debiasing for Fair Multi-modal LLMs

Unified Video Generation via Next-Set Prediction in Continuous Domain

Attributive Reasoning for Hallucination Diagnosis of Large Language Models

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment