Xiaoming Wei

Papers: 8
Total Citations: 53

Papers (8)

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

NeurIPS 2025, arXiv

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

ARIG: Autoregressive Interactive Head Generation for Real-time Conversations

BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning

Animating General Image with Large Visual Motion Model

Real3D the Curious Case of Neural Scene Degeneration

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding