Hao Tan

28 Papers

913 Total Citations

Papers (28)

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Numerical Pruning for Efficient Autoregressive Models

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Gaussian Mixture Flow Matching Models

Turbo3D: Ultra-fast Text-to-3D Generation

Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Generating 3D-Consistent Videos from Unposed Internet Photos

Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data

RayZer: A Self-supervised Large View Synthesis Model

VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Building Vision-Language Models on Solid Foundations with Masked Distillation

Efficient Federated Incomplete Multi-View Clustering

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

EnvEdit: Environment Editing for Vision-and-Language Navigation

Learning Navigational Visual Representations with Semantic Map Supervision

Scaling Data Generation in Vision-and-Language Navigation

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer