Tong He

Papers: 50

Total Citations: 230

Papers (50)

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Aether: Geometric-Aware Unified World Modeling

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

Boosting Residual Networks with Group Knowledge

ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs

GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction

Knowledge Adaptation for Efficient Semantic Segmentation

GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

GeoNet: Deep Geodesic Networks for Point Cloud Analysis

ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

DyCo3D: Robust Instance Segmentation of 3D Point Clouds Through Dynamic Convolution

HCRF-Flow: Scene Flow From Point Clouds With Continuous High-Order CRFs and Position-Aware Flow Embedding

GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds

PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency

Crossing the Gap: Domain Generalization for Image Captioning

Single Shot Text Detector With Regional Attention

FCOS: Fully Convolutional One-Stage Object Detection

Learning Hierarchical Graph Neural Networks for Image Clustering

ARCH++: Animation-Ready Clothed Human Reconstruction Revisited

Ponder: Point Cloud Pre-training via Neural Rendering

Object-Centric Multiple Object Tracking

Unsupervised Open-Vocabulary Object Localization in Videos

Coarse-to-Fine Amodal Segmentation with Shape Prior

Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation

Instance-Aware Embedding for Point Cloud Instance Segmentation

PointInst3D: Segmenting 3D Instances by Points

PSS: Progressive Sample Selection for Open-World Visual Representation Learning

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Frozen CLIP Transformer Is an Efficient Point Cloud Encoder

Learning for Transductive Threshold Calibration in Open-World Recognition

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Point Transformer V3: Simpler Faster Stronger

Sparse Autoencoders, Again?

An End-to-End TextSpotter With Explicit Alignment and Attention

Bag of Tricks for Image Classification with Convolutional Neural Networks

Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction

Progressive Coordinate Transforms for Monocular 3D Object Detection

GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction

Self-supervised Amodal Video Object Segmentation

Learning Manifold Dimensions with Conditional Variational Autoencoders