Hang Xu

82 Papers
430 Total Citations

Papers (82)

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

ICLR 2025
169
citations

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

CVPR 2024
45
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

ICLR 2024
44
citations

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

ICCV 2025
43
citations

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

ECCV 2024 (arXiv)
14
citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025
13
citations

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

ICCV 2025
12
citations

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

CVPR 2024
11
citations

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

AAAI 2024 (arXiv)
10
citations

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

ICLR 2024
9
citations

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

CVPR 2025
8
citations

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

CVPR 2025
8
citations

Effective Sparsification of Neural Networks With Global Sparsity Constraint

CVPR 2021 (arXiv)
0
citations

Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation

CVPR 2021
0
citations

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

CVPR 2022 (arXiv)
0
citations

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

CVPR 2022
0
citations

Point2Seq: Detecting 3D Objects As Sequences

CVPR 2022 (arXiv)
0
citations

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation

CVPR 2022 (arXiv)
0
citations

ONCE-3DLanes: Building Monocular 3D Lane Detection

CVPR 2022
0
citations

Mixed Autoencoder for Self-Supervised Visual Representation Learning

CVPR 2023 (arXiv)
0
citations

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

CVPR 2023 (arXiv)
0
citations

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

CVPR 2023 (arXiv)
0
citations

Gaussian Label Distribution Learning for Spherical Image Object Detection

CVPR 2023
0
citations

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

CVPR 2023 (arXiv)
0
citations

CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

CVPR 2023
0
citations

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

CVPR 2023 (arXiv)
0
citations

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

ICCV 2019
0
citations

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation

ICCV 2021
0
citations

DetCo: Unsupervised Contrastive Learning for Object Detection

ICCV 2021 (arXiv)
0
citations

Voxel Transformer for 3D Object Detection

ICCV 2021 (arXiv)
0
citations

Adversarial Robustness for Unsupervised Domain Adaptation

ICCV 2021 (arXiv)
0
citations

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining

ICCV 2021 (arXiv)
0
citations

MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving

ICCV 2021 (arXiv)
0
citations

NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models

ICCV 2021 (arXiv)
0
citations

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

ICCV 2021
0
citations

C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing

ICCV 2021
0
citations

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

ICCV 2021
0
citations

PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

ICCV 2023 (arXiv)
0
citations

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

ICCV 2023
0
citations

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

ICCV 2023 (arXiv)
0
citations

GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training

ICCV 2023 (arXiv)
0
citations

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

ICCV 2023 (arXiv)
0
citations

PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval

ICCV 2023
0
citations

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

ICCV 2023 (arXiv)
0
citations

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

ICCV 2023 (arXiv)
0
citations

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling

ECCV 2020
0
citations

JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

ECCV 2020
0
citations

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

ECCV 2020
0
citations

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

ECCV 2020
0
citations

PANDORA: A Panoramic Detection Dataset for Object with Orientation

ECCV 2022
0
citations

MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

ECCV 2022
0
citations

Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

ECCV 2022
0
citations

Learning Ego 3D Representation As Ray Tracing

ECCV 2022
0
citations

Generative Negative Text Replay for Continual Vision-Language Pretraining

ECCV 2022
0
citations

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

ECCV 2022
0
citations

RCLane: Relay Chain Prediction for Lane Detection

ECCV 2022
0
citations

DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction

ECCV 2022
0
citations

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

ICCV 2023 (arXiv)
0
citations

Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution

CVPR 2025
0
citations

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

ICCV 2025
0
citations

FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment

ICCV 2025
0
citations

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

AAAI 2024
0
citations

Rethinking Boundary Discontinuity Problem for Oriented Object Detection

CVPR 2024
0
citations

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

CVPR 2024
0
citations

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

CVPR 2024
0
citations

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

CVPR 2024
0
citations

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection

CVPR 2019
0
citations

Spatial-Aware Graph Relation Network for Large-Scale Object Detection

CVPR 2019
0
citations

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

CVPR 2020
0
citations

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

CVPR 2021
0
citations

Hybrid Knowledge Routed Modules for Large-scale Object Detection

NeurIPS 2018
0
citations

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

NeurIPS 2020
0
citations

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

NeurIPS 2020
0
citations

DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning

NeurIPS 2021
0
citations

SOFT: Softmax-free Transformer with Linear Complexity

NeurIPS 2021
0
citations

Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training

NeurIPS 2021
0
citations

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

NeurIPS 2022
0
citations

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

NeurIPS 2022
0
citations

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

NeurIPS 2022
0
citations

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

NeurIPS 2023
0
citations

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

NeurIPS 2023
0
citations