Xinggang Wang

Papers: 53

Total Citations: 1,897

Papers (53)

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Boundary-preserving Mask R-CNN

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

MobileInst: Video Instance Segmentation on the Mobile

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

Human De-Occlusion: Invisible Perception and Recovery for Humans

Weakly-Supervised Instance Segmentation via Class-Agnostic Learning With Salient Images

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception

Sparse Instance Activation for Real-Time Instance Segmentation

Temporally Efficient Vision Transformer for Video Instance Segmentation

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

PD-Quant: Post-Training Quantization Based on Prediction Difference Metric

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation

RILS: Masked Visual Reconstruction in Language Semantic Space

Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt

Relaxed Multiple-Instance SVM With Application to Object Discovery

Object-Level Proposals

CCNet: Criss-Cross Attention for Semantic Segmentation

Instances As Queries

Crossover Learning for Fast Online Video Instance Segmentation

Hierarchical Aggregation for 3D Instance Segmentation

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

VAD: Vectorized Scene Representation for Efficient Autonomous Driving

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Robust Multi-Object Tracking by Marginal Inference

AiATrack: Attention in Attention for Transformer Visual Tracking

Context-Sensitive Temporal Feature Learning for Gait Recognition

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

YOLO-World: Real-Time Open-Vocabulary Object Detection

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection

Robust Scene Text Recognition With Automatic Rectification

Multiple Instance Detection Network With Online Instance Classifier Refinement

Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing

RENAS: Reinforced Evolutionary Neural Architecture Search

Mask Scoring R-CNN

Direct Object Recognition Without Line-Of-Sight Using Optical Coherence

Densely Connected Search Space for More Flexible Neural Architecture Search

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

Circuit as Set of Points