Hang Xu

21 Papers
430 Total Citations

Papers (21)

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

ICLR 2025
169
citations

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

CVPR 2024
45
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

ICLR 2024
44
citations

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

ICCV 2025
43
citations

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

ECCV 2024
14
citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025
13
citations

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

ICCV 2025
12
citations

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

CVPR 2024
11
citations

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

AAAI 2024
10
citations

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

ICLR 2024
9
citations

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

CVPR 2025
8
citations

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

CVPR 2025
8
citations

Rethinking Boundary Discontinuity Problem for Oriented Object Detection

CVPR 2024
0
citations

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

ICCV 2025
0
citations

Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution

CVPR 2025
0
citations

FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment

ICCV 2025
0
citations

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

CVPR 2024
0
citations

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

CVPR 2024
0
citations

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

CVPR 2024
0
citations

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

AAAI 2024
0
citations