Wei-Shi Zheng

93 Papers

122 Total Citations

Papers (93)

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Dexterous Grasp Transformer

Single-View Scene Point Cloud Human Grasp Generation

ViSpeak: Visual Instruction Feedback in Streaming Videos

Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

Factorized Diffusion Autoencoder for Unsupervised Disentangled Representation Learning

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework

NECA: Neural Customizable Human Avatar

Person De-reidentification: A Variation-guided Identity Shift Modeling

DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering

EntityErasure: Erasing Entity Cleanly via Amodal Entity Segmentation and Completion

FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection

Domain Generalizable Portrait Style Transfer

Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

CLIP-RestoreX: Restore Image Structure and Perception in Exposure Correction

ParGo: Bridging Vision-Language with Partial and Global Views

When Shadow Removal Meets Intrinsic Image Decomposition: A Joint Learning Framework Using Unpaired Data

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Jointly Learning Heterogeneous Features for RGB-D Activity Recognition

Top-Push Video-Based Person Re-Identification

A Matrix Splitting Method for Composite Function Minimization

Weakly Supervised Person Re-Identification

Distilled Person Re-Identification: Towards a More Scalable System

Unsupervised Person Re-Identification by Soft Multilabel Learning

Progressive Teacher-Student Learning for Early Action Prediction

Patch-Based Discriminative Feature Learning for Unsupervised Person Re-Identification

Learning to Learn Relation for Important People Detection in Still Images

Weakly Supervised Open-Set Domain Adaptation by Dual-Domain Collaboration

A Decomposition Algorithm for the Sparse Generalized Eigenvalue Problem

Underexposed Photo Enhancement Using Deep Illumination Estimation

Deep Dual Relation Modeling for Egocentric Interaction Recognition

Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection

Adaptive Interaction Modeling via Graph Operations Search

Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification

Weakly Supervised Discriminative Feature Learning With State Information for Person Identification

Squeeze-and-Attention Networks for Semantic Segmentation

MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection

Graph-Based High-Order Relation Modeling for Long-Term Action Recognition

Combined Depth Space Based Architecture Search for Person Re-Identification

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification

Fine-Grained Shape-Appearance Mutual Learning for Cloth-Changing Person Re-Identification

SIOD: Single Instance Annotated per Category per Image for Object Detection

Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data

Likert Scoring With Grade Decoupling for Long-Term Action Assessment

Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding

Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification

Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding

Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping

AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection

Multi-Scale Learning for Low-Resolution Person Re-Identification

Cross-View Asymmetric Metric Learning for Unsupervised Person Re-Identification

RGB-Infrared Cross-Modality Person Re-Identification

Action Assessment by Joint Relation Graphs

Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning

Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification

Predictive Feature Learning for Future Segmentation Prediction

Weakly Supervised Text-Based Person Re-Identification

Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training

ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation

Event-Guided Procedure Planning from Instructional Videos with Text Supervision

Revisit PCA-based Technique for Out-of-Distribution Detection

When Prompt-based Incremental Learning Does Not Meet Strong Pretraining

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians

MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection

An Asymmetric Modeling for Action Assessment

Adversarial Partial Domain Adaptation by Cycle Inconsistency

AcroFOD: An Adaptive Method for Cross-Domain Few-Shot Object Detection

Partial Person Re-Identification

Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks

ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation

RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

Panorama Generation From NFoV Image Done Right

Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks

Diffusion-based Event Generation for High-Quality Image Deblurring

AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance

Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning

iManip: Skill-Incremental Learning for Robotic Manipulation

ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations

VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification

Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal

monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation

Distilling LLM Prior to Flow Model for Generalizable Agent’s Imagination in Object Goal Navigation

MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning

Action-guided 3D Human Motion Prediction

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

Inner-Outer Aware Reconstruction Model for Monocular 3D Scene Reconstruction

Diversifying Spatial-Temporal Perception for Video Domain Generalization

Temporal Continual Learning with Prior Compensation for Human Motion Prediction