Hao Tan

28 Papers
913 Total Citations

Papers (28)

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

ECCV 2024
245
citations

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

ICLR 2024
227
citations

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

ICLR 2024
154
citations

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

ICLR 2025
86
citations

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

CVPR 2025
61
citations

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

ICCV 2025
56
citations

Numerical Pruning for Efficient Autoregressive Models

AAAI 2025
22
citations

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

CVPR 2024
21
citations

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

AAAI 2024 (arXiv)
13
citations

Gaussian Mixture Flow Matching Models

ICML 2025
8
citations

Turbo3D: Ultra-fast Text-to-3D Generation

CVPR 2025
6
citations

Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models

AAAI 2025
6
citations

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

CVPR 2025
4
citations

Generating 3D-Consistent Videos from Unposed Internet Photos

CVPR 2025
4
citations

Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

CVPR 2025
0
citations

Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels

CVPR 2025
0
citations

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data

CVPR 2025
0
citations

RayZer: A Self-supervised Large View Synthesis Model

ICCV 2025
0
citations

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

ICCV 2025
0
citations

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

ICCV 2025
0
citations

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

AAAI 2025
0
citations

Building Vision-Language Models on Solid Foundations with Masked Distillation

CVPR 2024
0
citations

Efficient Federated Incomplete Multi-View Clustering

ICML 2025
0
citations

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

CVPR 2017 (arXiv)
0
citations

EnvEdit: Environment Editing for Vision-and-Language Navigation

CVPR 2022 (arXiv)
0
citations

Learning Navigational Visual Representations with Semantic Map Supervision

ICCV 2023 (arXiv)
0
citations

Scaling Data Generation in Vision-and-Language Navigation

ICCV 2023 (arXiv)
0
citations

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

NeurIPS 2021
0
citations