Xueyan Zou

7 Papers

168 Total Citations

Papers (7)

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Visual In-Context Prompting

3D-Spatial Multimodal Memory

Progressive Temporal Feature Alignment Network for Video Inpainting

Generalized Decoding for Pixel, Image, and Language

A Simple Framework for Open-Vocabulary Segmentation and Detection

Segment Everything Everywhere All at Once