Xiaowei Zhou

Papers: 81

Total Citations: 555

Papers (81)

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Generating Human Motion in 3D Scenes from Text Descriptions

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

SAM-guided Graph Cut for 3D Instance Segmentation

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction

Motion-2-to-3: Leveraging 2D Motion Data for 3D Motion Generations

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Priors

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Relightable and Animatable Neural Avatar from Sparse-View Video

SpatialTracker: Tracking Any 2D Pixels in 3D Space

4K4D: Real-Time 4D View Synthesis at 4K Resolution

Detector-Free Structure from Motion

3D Shape Estimation From 2D Landmarks: A Convex Relaxation Approach

Sparseness Meets Deepness: 3D Human Pose Estimation From Monocular Video

Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations

Coarse-To-Fine Volumetric Prediction for Single-Image 3D Human Pose

Learning to Estimate 3D Human Pose and Shape From a Single Color Image

Multi-Image Semantic Matching by Mining Consistent Features

Ordinal Depth Supervision for 3D Human Pose Estimation

Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion

PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation

Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views

Learning Transformation Synchronization

Path-Invariant Map Networks

Coherent Reconstruction of Multiple Humans From a Single Image

Deep Snake for Real-Time Instance Segmentation

Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

Reconstructing 3D Human Pose by Watching Humans in the Mirror

Neural Body: Implicit Neural Representations With Structured Latent Codes for Novel View Synthesis of Dynamic Humans

VS-Net: Voting With Segmentation for Visual Localization

LoFTR: Detector-Free Local Feature Matching With Transformers

NeuralRecon: Real-Time Coherent 3D Reconstruction From Monocular Video

Neural Rays for Occlusion-Aware Image-Based Rendering

Neural 3D Scene Reconstruction With the Manhattan-World Assumption

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

Modeling Indirect Illumination for Inverse Rendering

OnePose: One-Shot Object Pose Estimation Without CAD Models

PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos

Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

Painting 3D Nature in 2D: View Synthesis of Natural Scenes From a Single Semantic Mask

Neural Scene Chronology

Learning Human Mesh Recovery in 3D Scenes

Reconstructing Humans with a Biomechanically Accurate Skeleton

Long-Term Visual Localization With Mobile Sensors

TensoIR: Tensorial Inverse Rendering

Learning Neural Volumetric Representations of Dynamic Humans in Minutes

AutoRecon: Automated 3D Object Discovery and Reconstruction

Single Image Pop-Up From Discriminatively Learned Parts

Multi-Image Matching via Fast Alternating Minimization

Fast Multi-Image Matching via Density-Based Clustering

Prior Guided Dropout for Robust Visual Localization in Dynamic Environments

Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies

You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking

Ponder: Point Cloud Pre-training via Neural Rendering

Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

Deep Active Contours for Real-time 6-DoF Object Tracking

Learning Feature Descriptors using Camera Pose Supervision

Motion Capture from Internet Videos

Representing Volumetric Videos As Dynamic MLP Maps

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation

Glossy Object Reconstruction with Cost-effective Polarized Acquisition

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

ReTracker: Exploring Image Matching for Robust Online Any Point Tracking

SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Precise Action-to-Video Generation Through Visual Action Prompts

UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction

ERNet: Efficient Non-Rigid Registration Network for Point Sequences

GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs

TotalSelfScan: Learning Full-body Avatars from Self-Portrait Videos of Faces, Hands, and Bodies

OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

Compact Neural Volumetric Video Representations with Dynamic Codebooks