Hsin-Ying Lee

Papers: 33

Total Citations: 658

Papers (33)

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

Exploiting Diffusion Prior for Generalizable Dense Prediction

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Controllable Image Synthesis via SegVAE

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

UpFusion: Novel View Diffusion from Unposed Sparse View Observations

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

Unsupervised Volumetric Animation

Unsupervised Representation Learning by Sorting Sequences

ReDAL: Region-Based and Diversity-Aware Active Learning for Point Cloud Semantic Segmentation

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

InfiniCity: Infinite-Scale City Synthesis

Neural Design Network: Graphic Layout Generation with Constraints

Semantic View Synthesis

Cross-Modal 3D Shape Generation and Manipulation

Vector Quantized Image-to-Image Translation

D2ADA: Dynamic Density-Aware Active Domain Adaptation for Semantic Segmentation

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

PrEditor3D: Fast and Precise 3D Shape Editing

T2Bs: Text-to-Character Blendshapes via Video Generation

Towards Text-guided 3D Scene Composition

Soft-Segmentation Guided Object Motion Deblurring

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

InOut: Diverse Image Outpainting via GAN Inversion

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

Dancing to Music

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing