Jie Song

58 Papers

214 Total Citations

Papers (58)

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition

Training-Free Pretrained Model Merging

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

GauSTAR: Gaussian Surface Tracking and Reconstruction

Dataset Ownership Verification in Contrastive Pre-trained Models

MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

Holistic Semantic Representation for Navigational Trajectory Generation

MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction

D^2-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models

Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning

Dataset Ownership Verification for Pre-trained Masked Models

SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning

Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training

Meta-Attention for ViT-Backed Continual Learning

D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions

PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence

gDNA: Towards Generative Detailed Neural Avatars

Learning Locally Editable Virtual Humans

X-Avatar: Expressive Human Avatars

InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds

Vid2Avatar: 3D Avatar Reconstruction From Videos in the Wild via Self-Supervised Scene Decomposition

Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

Hi4D: 4D Instance Segmentation of Close Human Interaction

Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation

Monocular Neural Image Based Rendering With Continuous View Control

ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos

Self-Born Wiring for Neural Trees

Shape-Aware Multi-Person Pose Estimation From Multi-View Images

EM-POSE: 3D Human Pose Estimation From Sparse Electromagnetic Trackers

ModelGiF: Gradient Fields for Model Functional Distance

EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild

Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks

Human from Blur: Human Pose Tracking from Blurry Images

Human Body Model Fitting by Learned Gradient Descent

Category Level Object Pose Estimation via Neural Analysis-by-Synthesis

Learning with Recoverable Forgetting

Attention Diversification for Domain Generalization

End-to-End Learning for Graph Decomposition

Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?

Capturing Head Avatar with Hand Contacts from a Monocular Video

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Association Pattern-enhanced Molecular Representation Learning

Cooperative Policy Agreement: Learning Diverse Policy for Offline MARL

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos

Cross-Modal Deep Variational Hand Pose Estimation

Transductive Unbiased Embedding for Zero-Shot Learning

DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

Tree-Like Decision Distillation

Training Generative Adversarial Networks in One Stage

Label Matching Semi-Supervised Object Detection

Slimmable Domain Adaptation

Deep Model Transferability from Attribution Maps

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Lookaround Optimizer: $k$ steps around, 1 step average