Zhiguo Cao

Papers: 50

Total Citations: 282

Papers (50)

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting

Unifying Automatic and Interactive Matting with Pretrained ViTs

3D Multi-frame Fusion for Video Stabilization

Semi-supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix

In-Context Matting

Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

Training Matting Models Without Alpha Labels

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

Monocular Relative Depth Perception With Web Stereo Data Supervision

NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences

Structure-Guided Ranking Loss for Single Image Depth Prediction

3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Composing Photos Like a Photographer

Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting

BokehMe: When Neural Rendering Meets Classical Rendering

3D Cinemagraphy From a Single Image

Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation

Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video

A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image

When Unsupervised Domain Adaptation Meets Tensor Representations

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

TransView: Inside, Outside, and Across the Cropping View Boundaries

Neural Video Depth Stabilizer

Point-Query Quadtree for Crowd Counting, Localization, and More

CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching

Fast Full-frame Video Stabilization with Iterative Optimization

When Epipolar Constraint Meets Non-Local Operators in Multi-View Stereo

Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells

Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation

MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects

Robust Object Detection with Inaccurate Bounding Boxes

FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling

3D Instances as 1D Kernels

Learning to Upsample by Learning to Sample

TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion

WildAvatar: Learning In-the-wild 3D Avatars from the Web

Exploring Contextual Attribute Density in Referring Expression Counting

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video

SAPA: Similarity-Aware Point Affiliation for Feature Upsampling