XU
55 papers
1,511 total citations
Papers (55)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
ICLR 2025, arXiv
351
citations
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
ECCV 2024, arXiv
163
citations
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
ICLR 2025, arXiv
121
citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025, arXiv
118
citations
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
ECCV 2024, arXiv
112
citations
MoBA: Mixture of Block Attention for Long-Context LLMs
NeurIPS 2025, arXiv
94
citations
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
ECCV 2024, arXiv
92
citations
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
ICLR 2025, arXiv
65
citations
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
NeurIPS 2025, arXiv
52
citations
On the Role of Attention Heads in Large Language Model Safety
ICLR 2025, arXiv
40
citations
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
ICLR 2025, arXiv
34
citations
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
ICLR 2025, arXiv
33
citations
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
ICLR 2025, arXiv
26
citations
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
ECCV 2024, arXiv
19
citations
ConGeo: Robust Cross-view Geo-localization across Ground View Variations
ECCV 2024, arXiv
19
citations
Implicit Concept Removal of Diffusion Models
ECCV 2024, arXiv
18
citations
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
ECCV 2024, arXiv
15
citations
EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
ECCV 2024, arXiv
14
citations
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
ECCV 2024, arXiv
13
citations
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
ICLR 2025, arXiv
12
citations
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
ICLR 2025, arXiv
11
citations
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
ICLR 2025, arXiv
11
citations
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
ICLR 2025, arXiv
11
citations
Few-shot NeRF by Adaptive Rendering Loss Regularization
ECCV 2024, arXiv
10
citations
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
NeurIPS 2025, arXiv
8
citations
CTSyn: A Foundation Model for Cross Tabular Data Generation
ICLR 2025, arXiv
7
citations
Reinforcement learning with combinatorial actions for coupled restless bandits
ICLR 2025, arXiv
5
citations
MIRA: Medical Time Series Foundation Model for Real-World Health Data
NeurIPS 2025, arXiv
4
citations
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
NeurIPS 2025, arXiv
3
citations
Optimal Brain Apoptosis
ICLR 2025, arXiv
3
citations
ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials
ICLR 2025
3
citations
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
NeurIPS 2025, arXiv
3
citations
Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients
NeurIPS 2025, arXiv
3
citations
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
NeurIPS 2025, arXiv
3
citations
HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
NeurIPS 2025, arXiv
2
citations
Measuring And Improving Engagement of Text-to-Image Generation Models
ICLR 2025
2
citations
Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance
ICLR 2025
2
citations
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
ICLR 2025, arXiv
2
citations
OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
NeurIPS 2025, arXiv
1
citation
4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming
NeurIPS 2025, arXiv
1
citation
Self-Verifying Reflection Helps Transformers with CoT Reasoning
NeurIPS 2025, arXiv
1
citation
Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
NeurIPS 2025, arXiv
1
citation
High-order Interactions Modeling for Interpretable Multi-Agent Q-Learning
NeurIPS 2025, arXiv
1
citation
NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding
NeurIPS 2025, arXiv
1
citation
VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
NeurIPS 2025, arXiv
1
citation
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
NeurIPS 2025, arXiv
0
citations
MuSLR: Multimodal Symbolic Logical Reasoning
NeurIPS 2025, arXiv
0
citations
Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models
NeurIPS 2025, arXiv
0
citations
Spiking Neural Networks Need High-Frequency Information
NeurIPS 2025, arXiv
0
citations
Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions
NeurIPS 2025, arXiv
0
citations
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
NeurIPS 2025, arXiv
0
citations
OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework
ICLR 2025
0
citations
Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism
NeurIPS 2025, arXiv
0
citations
HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis
NeurIPS 2025, arXiv
0
citations
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
NeurIPS 2025, arXiv
0
citations