Xiaoming Wei
8
Papers
53
Total Citations
Papers (8)
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
NeurIPS 2025, arXiv
30
citations
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
CVPR 2024
16
citations
ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
ICCV 2025, arXiv
7
citations
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
CVPR 2024
0
citations
Animating General Image with Large Visual Motion Model
CVPR 2024
0
citations
Real3D: The Curious Case of Neural Scene Degeneration
AAAI 2024
0
citations
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
AAAI 2025
0
citations
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
CVPR 2025
0
citations