Xi Wang

Papers: 17

Total Citations: 121

Papers (17)

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

PALM: Predicting Actions through Language Models

Real Appearance Modeling for More General Deepfake Detection

StateSpaceDiffuser: Bringing Long Context to Diffusion World Models

SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

Scale-invariant attention

DCTMamba: Advancing JPEG Image Restoration Through Long-Sequence Modeling and Adaptive Frequency Strategy

AKiRa: Augmentation Kit on Rays for Optical Video Generation

WANDR: Intention-guided Human Motion Generation

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

Understanding Museum Exhibits using Vision-Language Reasoning

Exploration-Driven Generative Interactive Environments

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description