Tao Yu

Papers: 51

Total Citations: 572

Papers (51)

Generative Representational Instruction Tuning

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond

Shadow Cones: A Generalized Framework for Partial Order Embeddings

ImViD: Immersive Volumetric Videos for Enhanced VR Engagement

View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection

Robust 3D Self-Portraits in Seconds

4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras

Deep Implicit Templates for 3D Shape Representation

POSEFusion: Pose-Guided Selective Fusion for Single-View Human Volumetric Capture

Function4D: Real-Time Human Volumetric Capture From Very Sparse Consumer RGBD Sensors

DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering

FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset

Interacting Attention Graph for Single Image Two-Hand Reconstruction

Structured Local Radiance Fields for Human Avatar Modeling

Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting

ZBS: Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foreground Selection

Task Residual for Tuning Vision-Language Models

DeepHuman: 3D Human Reconstruction From a Single Image

DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras

Lightweight Multi-Person Total Motion Capture Using Sparse Multi-View Cameras

PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis

RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera

NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image

Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling

GIMO: Gaze-Informed Human Motion Prediction in Context

Geometry-Aware Single-Image Full-Body Human Relighting

BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera

V2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy

Neural Fluid Simulation on Geometric Surfaces

Neural Physical Simulation with Multi-Resolution Hash Grid Encoding

DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation

MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

Collage: Light-Weight Low-Precision Strategy for LLM Training

DoubleFusion: Real-Time Capture of Human Performances With Inner Body Shapes From a Single Depth Sensor

SimulCap: Single-View Human Performance Capture With Cloth Simulation

Numerically Accurate Hyperbolic Embeddings Using Tiling-Based Models

A New Defense Against Adversarial Images: Turning a Weakness into a Strength

PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Representing Hyperbolic Space Accurately using Multi-Component Floats

Understanding Hyperdimensional Computing for Parallel Single-Pass Learning

Mask-based Latent Reconstruction for Reinforcement Learning

Triangulation Residual Loss for Data-efficient 3D Pose Estimation

Coneheads: Hierarchy Aware Attention

Simplifying Graph Convolutional Networks