Wei-Shi Zheng

Papers: 38

Total Citations: 122

Papers (38)

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Dexterous Grasp Transformer

Single-View Scene Point Cloud Human Grasp Generation

ViSpeak: Visual Instruction Feedback in Streaming Videos

Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

Factorized Diffusion Autoencoder for Unsupervised Disentangled Representation Learning

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework

NECA: Neural Customizable Human Avatar

DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering

Person De-reidentification: A Variation-guided Identity Shift Modeling

EntityErasure: Erasing Entity Cleanly via Amodal Entity Segmentation and Completion

Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On

FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

Domain Generalizable Portrait Style Transfer

monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation

Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning

AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance

Diffusion-based Event Generation for High-Quality Image Deblurring

Distilling LLM Prior to Flow Model for Generalizable Agent’s Imagination in Object Goal Navigation

MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning

CLIP-RestoreX: Restore Image Structure and Perception in Exposure Correction

ParGo: Bridging Vision-Language with Partial and Global Views

Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks

When Shadow Removal Meets Intrinsic Image Decomposition: A Joint Learning Framework Using Unpaired Data

Panorama Generation From NFoV Image Done Right

RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks

iManip: Skill-Incremental Learning for Robotic Manipulation

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations

VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification

Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal