Hang Xu
82 Papers
430 Total Citations
Papers (82)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
ICLR 2025
169
citations
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
CVPR 2024
45
citations
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
44
citations
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
ICLR 2024
44
citations
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
ICCV 2025
43
citations
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
ECCV 2024
14
citations
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
13
citations
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
ICCV 2025
12
citations
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
CVPR 2024
11
citations
LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement
AAAI 2024
10
citations
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields
ICLR 2024
9
citations
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
CVPR 2025
8
citations
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
CVPR 2025
8
citations
Effective Sparsification of Neural Networks With Global Sparsity Constraint
CVPR 2021
0
citations
Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation
CVPR 2021
0
citations
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
CVPR 2022
0
citations
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
CVPR 2022
0
citations
Point2Seq: Detecting 3D Objects As Sequences
CVPR 2022
0
citations
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation
CVPR 2022
0
citations
ONCE-3DLanes: Building Monocular 3D Lane Detection
CVPR 2022
0
citations
Mixed Autoencoder for Self-Supervised Visual Representation Learning
CVPR 2023
0
citations
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
CVPR 2023
0
citations
ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
CVPR 2023
0
citations
Gaussian Label Distribution Learning for Spherical Image Object Detection
CVPR 2023
0
citations
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
CVPR 2023
0
citations
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
CVPR 2023
0
citations
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
CVPR 2023
0
citations
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification
ICCV 2019
0
citations
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation
ICCV 2021
0
citations
DetCo: Unsupervised Contrastive Learning for Object Detection
ICCV 2021
0
citations
Voxel Transformer for 3D Object Detection
ICCV 2021
0
citations
Adversarial Robustness for Unsupervised Domain Adaptation
ICCV 2021
0
citations
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining
ICCV 2021
0
citations
MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving
ICCV 2021
0
citations
NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models
ICCV 2021
0
citations
Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
ICCV 2021
0
citations
C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing
ICCV 2021
0
citations
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
ICCV 2021
0
citations
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection
ICCV 2023
0
citations
Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach
ICCV 2023
0
citations
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
ICCV 2023
0
citations
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training
ICCV 2023
0
citations
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration
ICCV 2023
0
citations
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval
ICCV 2023
0
citations
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
ICCV 2023
0
citations
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
ICCV 2023
0
citations
AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling
ECCV 2020
0
citations
JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image
ECCV 2020
0
citations
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
ECCV 2020
0
citations
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
ECCV 2020
0
citations
PANDORA: A Panoramic Detection Dataset for Object with Orientation
ECCV 2022
0
citations
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection
ECCV 2022
0
citations
Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
ECCV 2022
0
citations
Learning Ego 3D Representation As Ray Tracing
ECCV 2022
0
citations
Generative Negative Text Replay for Continual Vision-Language Pretraining
ECCV 2022
0
citations
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
ECCV 2022
0
citations
RCLane: Relay Chain Prediction for Lane Detection
ECCV 2022
0
citations
DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction
ECCV 2022
0
citations
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
ICCV 2023
0
citations
Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution
CVPR 2025
0
citations
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
ICCV 2025
0
citations
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
ICCV 2025
0
citations
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
AAAI 2024
0
citations
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
CVPR 2024
0
citations
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
0
citations
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
CVPR 2024
0
citations
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
CVPR 2024
0
citations
Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection
CVPR 2019
0
citations
Spatial-Aware Graph Relation Network for Large-Scale Object Detection
CVPR 2019
0
citations
SP-NAS: Serial-to-Parallel Backbone Search for Object Detection
CVPR 2020
0
citations
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search
CVPR 2021
0
citations
Hybrid Knowledge Routed Modules for Large-scale Object Detection
NeurIPS 2018
0
citations
Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
NeurIPS 2020
0
citations
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation
NeurIPS 2020
0
citations
DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning
NeurIPS 2021
0
citations
SOFT: Softmax-free Transformer with Linear Complexity
NeurIPS 2021
0
citations
Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training
NeurIPS 2021
0
citations
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
NeurIPS 2022
0
citations
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving
NeurIPS 2022
0
citations
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
NeurIPS 2022
0
citations
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
NeurIPS 2023
0
citations
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
NeurIPS 2023
0
citations