Gang Yu

37
Papers
257
Total Citations

Papers (37)

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

CVPR 2024
108
citations

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

ICLR 2025 (arXiv)
101
citations

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

NeurIPS 2025
23
citations

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

CVPR 2025 (arXiv)
19
citations

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

CVPR 2025 (arXiv)
6
citations

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

CVPR 2017 (arXiv)
0
citations

Learning a Discriminative Feature Network for Semantic Segmentation

CVPR 2018 (arXiv)
0
citations

MegDet: A Large Mini-Batch Object Detector

CVPR 2018 (arXiv)
0
citations

Cascaded Pyramid Network for Multi-Person Pose Estimation

CVPR 2018 (arXiv)
0
citations

Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN

CVPR 2019
0
citations

An End-To-End Network for Panoptic Segmentation

CVPR 2019
0
citations

Shape Robust Text Detection With Progressive Scale Expansion Network

CVPR 2019
0
citations

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

CVPR 2019
0
citations

State-Aware Tracker for Real-Time Video Object Segmentation

CVPR 2020 (arXiv)
0
citations

Context Prior for Scene Segmentation

CVPR 2020 (arXiv)
0
citations

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

CVPR 2022 (arXiv)
0
citations

Executing Your Commands via Motion Diffusion in Latent Space

CVPR 2023 (arXiv)
0
citations

End-to-End 3D Dense Captioning With Vote2Cap-DETR

CVPR 2023 (arXiv)
0
citations

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

CVPR 2023
0
citations

ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices

ICCV 2019
0
citations

Objects365: A Large-Scale, High-Quality Dataset for Object Detection

ICCV 2019
0
citations

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network

ICCV 2019
0
citations

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

ICCV 2023 (arXiv)
0
citations

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

ICCV 2023 (arXiv)
0
citations

A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

ICCV 2023 (arXiv)
0
citations

D&D: Learning Human Dynamics from Dynamic Camera

ECCV 2022
0
citations

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

CVPR 2020 (arXiv)
0
citations

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

AAAI 2024
0
citations

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

AAAI 2024 (arXiv)
0
citations

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

CVPR 2024
0
citations

Fast Action Proposals for Human Action Detection and Search

CVPR 2015
0
citations

Learnable Tree Filter for Structure-preserving Feature Transform

NeurIPS 2019
0
citations

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations

NeurIPS 2022
0
citations

Hierarchical Normalization for Robust Monocular Depth Estimation

NeurIPS 2022
0
citations

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

NeurIPS 2023
0
citations

MotionGPT: Human Motion as a Foreign Language

NeurIPS 2023
0
citations

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

NeurIPS 2023
0
citations