Hang Xu

Papers: 82

Total Citations: 430

Papers (82)

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Effective Sparsification of Neural Networks With Global Sparsity Constraint

Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

Point2Seq: Detecting 3D Objects As Sequences

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation

ONCE-3DLanes: Building Monocular 3D Lane Detection

Mixed Autoencoder for Self-Supervised Visual Representation Learning

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

Gaussian Label Distribution Learning for Spherical Image Object Detection

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation

DetCo: Unsupervised Contrastive Learning for Object Detection

Voxel Transformer for 3D Object Detection

Adversarial Robustness for Unsupervised Domain Adaptation

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining

MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving

NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling

JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

PANDORA: A Panoramic Detection Dataset for Object with Orientation

MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

Learning Ego 3D Representation As Ray Tracing

Generative Negative Text Replay for Continual Vision-Language Pretraining

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

RCLane: Relay Chain Prediction for Lane Detection

DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Rethinking Boundary Discontinuity Problem for Oriented Object Detection

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection

Spatial-Aware Graph Relation Network for Large-Scale Object Detection

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

Hybrid Knowledge Routed Modules for Large-scale Object Detection

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning

SOFT: Softmax-free Transformer with Linear Complexity

Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection