Bin Zhu

Papers: 8

Total Citations: 363

Papers (8)

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning

Intersecting-Boundary-Sensitive Fingerprinting for Tampering Detection of DNN Models

PolarNeXt: Rethink Instance Segmentation with Polar Representation

RAGG: Retrieval-Augmented Grasp Generation Model

HD-EPIC: A Highly-Detailed Egocentric Video Dataset