LU ZHANG

28 papers
4,473 total citations

Papers (28)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

ECCV 2024 · arXiv
3,368
citations

Training Language Models to Self-Correct via Reinforcement Learning

ICLR 2025 · arXiv
305
citations

Segment and Recognize Anything at Any Granularity

ECCV 2024 · arXiv
226
citations

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

ICLR 2025 · arXiv
134
citations

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

ECCV 2024 · arXiv
114
citations

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

ECCV 2024 · arXiv
83
citations

Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

ECCV 2024 · arXiv
33
citations

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

NeurIPS 2025 · arXiv
31
citations

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

NeurIPS 2025 · arXiv
30
citations

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

NeurIPS 2025 · arXiv
29
citations

SWE-bench Goes Live!

NeurIPS 2025 · arXiv
22
citations

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

ICLR 2025 · arXiv
20
citations

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

ECCV 2024 · arXiv
18
citations

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

ECCV 2024 · arXiv
11
citations

Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

ECCV 2024 · arXiv
10
citations

Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding

ICLR 2025 · arXiv
8
citations

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching

NeurIPS 2025 · arXiv
6
citations

Toward Generalizing Visual Brain Decoding to Unseen Subjects

ICLR 2025 · arXiv
5
citations

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

NeurIPS 2025 · arXiv
5
citations

General Geometry-aware Weakly Supervised 3D Object Detection

ECCV 2024 · arXiv
5
citations

ScImage: How good are multimodal large language models at scientific text-to-image generation?

ICLR 2025 · arXiv
4
citations

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

NeurIPS 2025 · arXiv
2
citations

S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning

NeurIPS 2025 · arXiv
2
citations

Catastrophic Overfitting: A Potential Blessing in Disguise

ECCV 2024 · arXiv
1
citation

Risk-aware Direct Preference Optimization under Nested Risk Measure

NeurIPS 2025 · arXiv
1
citation

S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation

NeurIPS 2025 · arXiv
0
citations

D2SA: Dual-Stage Distribution and Slice Adaptation for Efficient Test-Time Adaptation in MRI Reconstruction

NeurIPS 2025 · arXiv
0
citations

Analyzing the Power of Chain of Thought through Memorization Capabilities

NeurIPS 2025 · arXiv
0
citations