Xiaolong Wang

72
Papers
973
Total Citations

Papers (72)

Designing Deep Networks for Surface Normal Estimation

CVPR 2015
374
citations

TD-MPC2: Scalable, Robust World Models for Continuous Control

ICLR 2024
293
citations

GenSim: Generating Robotic Simulation Tasks via Large Language Models

ICLR 2024
120
citations

One-Minute Video Generation with Test-Time Training

CVPR 2025
65
citations

Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios

CVPR 2024
41
citations

WorldModelBench: Judging Video Generation Models As World Models

NeurIPS 2025
31
citations

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

ICLR 2025
19
citations

Editable Image Elements for Controllable Synthesis

ECCV 2024
13
citations

Consistent Flow Distillation for Text-to-3D Generation

ICLR 2025
12
citations

Parallel Sequence Modeling via Generalized Spatial Propagation Network

CVPR 2025
3
citations

3D-Spatial MultiModal Memory

ICLR 2025
2
citations

3D Human Pose Estimation in the Wild by Adversarial Learning

CVPR 2018
0
citations

Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs

CVPR 2018
0
citations

Non-Local Neural Networks

CVPR 2018
0
citations

Learning Correspondence From the Cycle-Consistency of Time

CVPR 2019
0
citations

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

CVPR 2019
0
citations

Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks

CVPR 2020
0
citations

Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time

CVPR 2021
0
citations

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

CVPR 2021
0
citations

Learning Continuous Image Representation With Local Implicit Image Function

CVPR 2021
0
citations

CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs

CVPR 2022
0
citations

VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

CVPR 2022
0
citations

GIFS: Neural Implicit Function for General Shape Representation

CVPR 2022
0
citations

Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image

CVPR 2022
0
citations

GroupViT: Semantic Segmentation Emerges From Text Supervision

CVPR 2022
0
citations

Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos

CVPR 2022
0
citations

DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects

CVPR 2023
0
citations

Dynamic Inference With Grounding Based Vision and Language Models

CVPR 2023
0
citations

Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models

CVPR 2023
0
citations

Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters

CVPR 2023
0
citations

Policy Adaptation From Foundation Model Feedback

CVPR 2023
0
citations

Neural Volumetric Memory for Visual Locomotion Control

CVPR 2023
0
citations

Unsupervised Learning of Visual Representations Using Videos

ICCV 2015
0
citations

Transitive Invariance for Self-Supervised Visual Representation Learning

ICCV 2017
0
citations

Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection

ICCV 2017
0
citations

Rethinking Self-Supervised Correspondence Learning: A Video Frame-Level Similarity Perspective

ICCV 2021
0
citations

Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion

ICCV 2021
0
citations

Contrastive Learning of Image Representations With Cross-Video Cycle-Consistency

ICCV 2021
0
citations

Robust Object Detection via Instance-Level Temporal Cycle Confusion

ICCV 2021
0
citations

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

ICCV 2021
0
citations

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

ICCV 2021
0
citations

Region Similarity Representation Learning

ICCV 2021
0
citations

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

ICCV 2021
0
citations

Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses

ICCV 2021
0
citations

ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs

ICCV 2023
0
citations

FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models

ICCV 2023
0
citations

Hierarchical Style-based Networks for Motion Synthesis

ECCV 2020
0
citations

Scraping Textures from Natural Images for Synthesis and Editing

ECCV 2022
0
citations

Transformers As Meta-Learners for Implicit Neural Representations

ECCV 2022
0
citations

Learning Implicit Feature Alignment Function for Semantic Segmentation

ECCV 2022
0
citations

DexMV: Imitation Learning for Dexterous Manipulation from Human Videos

ECCV 2022
0
citations

COLMAP-Free 3D Gaussian Splatting

CVPR 2024
0
citations

HomoMatcher: Achieving Dense Feature Matching with Semi-Dense Efficiency by Homography Estimation

AAAI 2025
0
citations

EditAR: Unified Conditional Generation with Autoregressive Models

CVPR 2025
0
citations

Image Neural Field Diffusion Models

CVPR 2024
0
citations

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

CVPR 2024
0
citations

Pixel-Aligned Language Model

CVPR 2024
0
citations

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

CVPR 2024
0
citations

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

CVPR 2024
0
citations

Actions ~ Transformations

CVPR 2016
0
citations

Binge Watching: Scaling Affordance Learning From Sitcoms

CVPR 2017
0
citations

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

CVPR 2017
0
citations

Joint-task Self-supervised Learning for Temporal Correspondence

NeurIPS 2019
0
citations

Multi-Task Reinforcement Learning with Soft Modularization

NeurIPS 2020
0
citations

Online Adaptation for Consistent Mesh Reconstruction in the Wild

NeurIPS 2020
0
citations

Test-Time Personalization with a Transformer for Human Pose Estimation

NeurIPS 2021
0
citations

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

NeurIPS 2021
0
citations

Multi-Person 3D Motion Prediction with Multi-Range Transformers

NeurIPS 2021
0
citations

NovelD: A Simple yet Effective Exploration Criterion

NeurIPS 2021
0
citations

Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset

NeurIPS 2022
0
citations

Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

NeurIPS 2023
0
citations

Elastic Decision Transformer

NeurIPS 2023
0
citations