Hang Xu

Papers: 21

Total Citations: 430

Papers (21)

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Rethinking Boundary Discontinuity Problem for Oriented Object Detection

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution

FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images