Di Huang

53 Papers

356 Total Citations

Papers (53)

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

GVGEN: Text-to-3D Generation with Volumetric Representation

Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Constraint-Aware Feature Learning for Parametric Point Cloud

Progressive Parameter Efficient Transfer Learning for Semantic Segmentation

ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning

GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction

Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation

ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection

ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations

CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection

Entropy-Based Active Learning for Object Detection With Progressive Diversity Constraint

OcTr: Octree-Based Transformer for 3D Object Detection

NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images

Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images

PR-GCN: A Deep Graph Convolutional Network With Point Refinement for 6D Pose Estimation

Image Inpainting via Conditional Texture and Structure Dual Generation

Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection

Denoising Diffusion Autoencoders are Unified Self-supervised Learners

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

Multi-Scale Positive Sample Refinement for Few-Shot Object Detection

Improving Object Detection with Selective Self-Supervised Self-Training

Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles

Motion Sensitive Contrastive Learning for Self-Supervised Video Representation

Ponder: Point Cloud Pre-training via Neural Rendering

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization

QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code

Unveiling the Knowledge of CLIP for Training-Free Open-Vocabulary Semantic Segmentation

3D²-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling

Hypothesis, Verification, and Induction: Grounding Large Language Models with Self-Driven Skill Learning

Emergent Communication for Numerical Concepts Generalization

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

FiT: Flexible Vision Transformer for Diffusion Model

Learning Face Age Progression: A Pyramid Architecture of GANs

Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-Quality 3D Faces

Adaptive NMS: Refining Pedestrian Detection in a Crowd

Fixed-Point Back-Propagation Training

OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images

Compressed Video Prompt Tuning

Emergent Communication for Rules Reasoning

ANPL: Towards Natural Programming with Interactive Decomposition