Qingpei Guo

Papers: 12
Total Citations: 11

Papers (12)

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Engage for All: Making Ordinary Image Descriptions Appealing Again!

Social Debiasing for Fair Multi-modal LLMs

Unified Video Generation via Next-Set Prediction in Continuous Domain

Attributive Reasoning for Hallucination Diagnosis of Large Language Models

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation

CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset

Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input