Di Huang

Papers: 24

Total Citations: 321

Papers (24)

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

GVGEN: Text-to-3D Generation with Volumetric Representation

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Constraint-Aware Feature Learning for Parametric Point Cloud

Progressive Parameter Efficient Transfer Learning for Semantic Segmentation

ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning

GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction

3D²-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling

CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization

Hypothesis, Verification, and Induction: Grounding Large Language Models with Self-Driven Skill Learning

Emergent Communication for Numerical Concepts Generalization

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code

FiT: Flexible Vision Transformer for Diffusion Model

Unveiling the Knowledge of CLIP for Training-Free Open-Vocabulary Semantic Segmentation