Xiaolong Wang
72
Papers
973
Total Citations
Papers (72)
Designing Deep Networks for Surface Normal Estimation
CVPR 2015
374
citations
TD-MPC2: Scalable, Robust World Models for Continuous Control
ICLR 2024
293
citations
GenSim: Generating Robotic Simulation Tasks via Large Language Models
ICLR 2024
120
citations
One-Minute Video Generation with Test-Time Training
CVPR 2025
65
citations
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
CVPR 2024
41
citations
WorldModelBench: Judging Video Generation Models As World Models
NeurIPS 2025
31
citations
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
ICLR 2025
19
citations
Editable Image Elements for Controllable Synthesis
ECCV 2024
13
citations
Consistent Flow Distillation for Text-to-3D Generation
ICLR 2025
12
citations
Parallel Sequence Modeling via Generalized Spatial Propagation Network
CVPR 2025
3
citations
3D-Spatial Multimodal Memory
ICLR 2025
2
citations
3D Human Pose Estimation in the Wild by Adversarial Learning
CVPR 2018
0
citations
Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs
CVPR 2018
0
citations
Non-Local Neural Networks
CVPR 2018
0
citations
Learning Correspondence From the Cycle-Consistency of Time
CVPR 2019
0
citations
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments
CVPR 2019
0
citations
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
CVPR 2020
0
citations
Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time
CVPR 2021
0
citations
Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes
CVPR 2021
0
citations
Learning Continuous Image Representation With Local Implicit Image Function
CVPR 2021
0
citations
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs
CVPR 2022
0
citations
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
CVPR 2022
0
citations
GIFS: Neural Implicit Function for General Shape Representation
CVPR 2022
0
citations
Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image
CVPR 2022
0
citations
GroupViT: Semantic Segmentation Emerges From Text Supervision
CVPR 2022
0
citations
Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos
CVPR 2022
0
citations
DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects
CVPR 2023
0
citations
Dynamic Inference With Grounding Based Vision and Language Models
CVPR 2023
0
citations
Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models
CVPR 2023
0
citations
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters
CVPR 2023
0
citations
Policy Adaptation From Foundation Model Feedback
CVPR 2023
0
citations
Neural Volumetric Memory for Visual Locomotion Control
CVPR 2023
0
citations
Unsupervised Learning of Visual Representations Using Videos
ICCV 2015
0
citations
Transitive Invariance for Self-Supervised Visual Representation Learning
ICCV 2017
0
citations
Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection
ICCV 2017
0
citations
Rethinking Self-Supervised Correspondence Learning: A Video Frame-Level Similarity Perspective
ICCV 2021
0
citations
Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion
ICCV 2021
0
citations
Contrastive Learning of Image Representations With Cross-Video Cycle-Consistency
ICCV 2021
0
citations
Robust Object Detection via Instance-Level Temporal Cycle Confusion
ICCV 2021
0
citations
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
ICCV 2021
0
citations
Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning
ICCV 2021
0
citations
Region Similarity Representation Learning
ICCV 2021
0
citations
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
ICCV 2021
0
citations
Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses
ICCV 2021
0
citations
ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs
ICCV 2023
0
citations
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
ICCV 2023
0
citations
Hierarchical Style-based Networks for Motion Synthesis
ECCV 2020
0
citations
Scraping Textures from Natural Images for Synthesis and Editing
ECCV 2022
0
citations
Transformers As Meta-Learners for Implicit Neural Representations
ECCV 2022
0
citations
Learning Implicit Feature Alignment Function for Semantic Segmentation
ECCV 2022
0
citations
DexMV: Imitation Learning for Dexterous Manipulation from Human Videos
ECCV 2022
0
citations
COLMAP-Free 3D Gaussian Splatting
CVPR 2024
0
citations
HomoMatcher: Achieving Dense Feature Matching with Semi-Dense Efficiency by Homography Estimation
AAAI 2025
0
citations
EditAR: Unified Conditional Generation with Autoregressive Models
CVPR 2025
0
citations
Image Neural Field Diffusion Models
CVPR 2024
0
citations
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
CVPR 2024
0
citations
Pixel-Aligned Language Model
CVPR 2024
0
citations
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
CVPR 2024
0
citations
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
CVPR 2024
0
citations
Actions ~ Transformations
CVPR 2016
0
citations
Binge Watching: Scaling Affordance Learning From Sitcoms
CVPR 2017
0
citations
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
CVPR 2017
0
citations
Joint-task Self-supervised Learning for Temporal Correspondence
NeurIPS 2019
0
citations
Multi-Task Reinforcement Learning with Soft Modularization
NeurIPS 2020
0
citations
Online Adaptation for Consistent Mesh Reconstruction in the Wild
NeurIPS 2020
0
citations
Test-Time Personalization with a Transformer for Human Pose Estimation
NeurIPS 2021
0
citations
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
NeurIPS 2021
0
citations
Multi-Person 3D Motion Prediction with Multi-Range Transformers
NeurIPS 2021
0
citations
NovelD: A Simple yet Effective Exploration Criterion
NeurIPS 2021
0
citations
Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset
NeurIPS 2022
0
citations
Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator
NeurIPS 2023
0
citations
Elastic Decision Transformer
NeurIPS 2023
0
citations