Xi Wang

17
Papers
121
Total Citations

Papers (17)

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

CVPR 2025, arXiv
41
citations

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

CVPR 2024
22
citations

PALM: Predicting Actions through Language Models

ECCV 2024, arXiv
22
citations

Real Appearance Modeling for More General Deepfake Detection

ECCV 2024
12
citations

StateSpaceDiffuser: Bringing Long Context to Diffusion World Models

NeurIPS 2025
8
citations

SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

AAAI 2025
7
citations

LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model

CVPR 2025
4
citations

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

ICLR 2025, arXiv
3
citations

Scale-invariant attention

NeurIPS 2025
2
citations

DCTMamba: Advancing JPEG Image Restoration Through Long-Sequence Modeling and Adaptive Frequency Strategy

AAAI 2025
0
citations

AKiRa: Augmentation Kit on Rays for Optical Video Generation

CVPR 2025
0
citations

WANDR: Intention-guided Human Motion Generation

CVPR 2024
0
citations

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

CVPR 2024
0
citations

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

CVPR 2024
0
citations

Understanding Museum Exhibits using Vision-Language Reasoning

ICCV 2025
0
citations

Exploration-Driven Generative Interactive Environments

CVPR 2025
0
citations

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description

ICCV 2025
0
citations