XU

55 Papers · 1,511 Total Citations

Papers (55)

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

ICLR 2025 · arXiv · 351 citations

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

ECCV 2024 · arXiv · 163 citations

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

ICLR 2025 · arXiv · 121 citations

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

NeurIPS 2025 · arXiv · 118 citations

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

ECCV 2024 · arXiv · 112 citations

MoBA: Mixture of Block Attention for Long-Context LLMs

NeurIPS 2025 · arXiv · 94 citations

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

ECCV 2024 · arXiv · 92 citations

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

ICLR 2025 · arXiv · 65 citations

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

NeurIPS 2025 · arXiv · 52 citations

On the Role of Attention Heads in Large Language Model Safety

ICLR 2025 · arXiv · 40 citations

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

ICLR 2025 · arXiv · 34 citations

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

ICLR 2025 · arXiv · 33 citations

Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

ICLR 2025 · arXiv · 26 citations

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

ECCV 2024 · arXiv · 19 citations

ConGeo: Robust Cross-view Geo-localization across Ground View Variations

ECCV 2024 · arXiv · 19 citations

Implicit Concept Removal of Diffusion Models

ECCV 2024 · arXiv · 18 citations

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

ECCV 2024 · arXiv · 15 citations

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

ECCV 2024 · arXiv · 14 citations

ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

ECCV 2024 · arXiv · 13 citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

ICLR 2025 · arXiv · 12 citations

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models

ICLR 2025 · arXiv · 11 citations

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

ICLR 2025 · arXiv · 11 citations

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

ICLR 2025 · arXiv · 11 citations

Few-shot NeRF by Adaptive Rendering Loss Regularization

ECCV 2024 · arXiv · 10 citations

From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots

NeurIPS 2025 · arXiv · 8 citations

CTSyn: A Foundation Model for Cross Tabular Data Generation

ICLR 2025 · arXiv · 7 citations

Reinforcement learning with combinatorial actions for coupled restless bandits

ICLR 2025 · arXiv · 5 citations

MIRA: Medical Time Series Foundation Model for Real-World Health Data

NeurIPS 2025 · arXiv · 4 citations

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

NeurIPS 2025 · arXiv · 3 citations

Optimal Brain Apoptosis

ICLR 2025 · arXiv · 3 citations

ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials

ICLR 2025 · 3 citations

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

NeurIPS 2025 · arXiv · 3 citations

Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients

NeurIPS 2025 · arXiv · 3 citations

Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search

NeurIPS 2025 · arXiv · 3 citations

HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models

NeurIPS 2025 · arXiv · 2 citations

Measuring And Improving Engagement of Text-to-Image Generation Models

ICLR 2025 · 2 citations

Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance

ICLR 2025 · 2 citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

ICLR 2025 · arXiv · 2 citations

OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps

NeurIPS 2025 · arXiv · 1 citation

4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming

NeurIPS 2025 · arXiv · 1 citation

Self-Verifying Reflection Helps Transformers with CoT Reasoning

NeurIPS 2025 · arXiv · 1 citation

Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization

NeurIPS 2025 · arXiv · 1 citation

High-order Interactions Modeling for Interpretable Multi-Agent Q-Learning

NeurIPS 2025 · arXiv · 1 citation

NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

NeurIPS 2025 · arXiv · 1 citation

VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree

NeurIPS 2025 · arXiv · 1 citation

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments

NeurIPS 2025 · arXiv · 0 citations

MuSLR: Multimodal Symbolic Logical Reasoning

NeurIPS 2025 · arXiv · 0 citations

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

NeurIPS 2025 · arXiv · 0 citations

Spiking Neural Networks Need High-Frequency Information

NeurIPS 2025 · arXiv · 0 citations

Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions

NeurIPS 2025 · arXiv · 0 citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

NeurIPS 2025 · arXiv · 0 citations

OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework

ICLR 2025 · 0 citations

Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism

NeurIPS 2025 · arXiv · 0 citations

HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis

NeurIPS 2025 · arXiv · 0 citations

EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

NeurIPS 2025 · arXiv · 0 citations