Gang Yu

Papers: 37

Total Citations: 257

Papers (37)

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

Learning a Discriminative Feature Network for Semantic Segmentation

MegDet: A Large Mini-Batch Object Detector

Cascaded Pyramid Network for Multi-Person Pose Estimation

Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN

An End-To-End Network for Panoptic Segmentation

Shape Robust Text Detection With Progressive Scale Expansion Network

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

State-Aware Tracker for Real-Time Video Object Segmentation

Context Prior for Scene Segmentation

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Executing Your Commands via Motion Diffusion in Latent Space

End-to-End 3D Dense Captioning With Vote2Cap-DETR

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices

Objects365: A Large-Scale, High-Quality Dataset for Object Detection

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

D&D: Learning Human Dynamics from Dynamic Camera

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Fast Action Proposals for Human Action Detection and Search

Learnable Tree Filter for Structure-preserving Feature Transform

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations

Hierarchical Normalization for Robust Monocular Depth Estimation

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

MotionGPT: Human Motion as a Foreign Language

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation