LU ZHANG
28 Papers
4,473 Total Citations
Papers (28)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
ECCV 2024 (arXiv)
3,368
citations
Training Language Models to Self-Correct via Reinforcement Learning
ICLR 2025 (arXiv)
305
citations
Segment and Recognize Anything at Any Granularity
ECCV 2024 (arXiv)
226
citations
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
ICLR 2025 (arXiv)
134
citations
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
ECCV 2024 (arXiv)
114
citations
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
ECCV 2024 (arXiv)
83
citations
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
ECCV 2024 (arXiv)
33
citations
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
NeurIPS 2025 (arXiv)
31
citations
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
NeurIPS 2025 (arXiv)
30
citations
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
NeurIPS 2025 (arXiv)
29
citations
SWE-bench Goes Live!
NeurIPS 2025 (arXiv)
22
citations
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
ICLR 2025 (arXiv)
20
citations
Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
ECCV 2024 (arXiv)
18
citations
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
ECCV 2024 (arXiv)
11
citations
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
ECCV 2024 (arXiv)
10
citations
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
ICLR 2025 (arXiv)
8
citations
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
NeurIPS 2025 (arXiv)
6
citations
Toward Generalizing Visual Brain Decoding to Unseen Subjects
ICLR 2025 (arXiv)
5
citations
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
NeurIPS 2025 (arXiv)
5
citations
General Geometry-aware Weakly Supervised 3D Object Detection
ECCV 2024 (arXiv)
5
citations
ScImage: How good are multimodal large language models at scientific text-to-image generation?
ICLR 2025 (arXiv)
4
citations
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
NeurIPS 2025 (arXiv)
2
citations
S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning
NeurIPS 2025 (arXiv)
2
citations
Catastrophic Overfitting: A Potential Blessing in Disguise
ECCV 2024 (arXiv)
1
citations
Risk-aware Direct Preference Optimization under Nested Risk Measure
NeurIPS 2025 (arXiv)
1
citations
S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation
NeurIPS 2025 (arXiv)
0
citations
D2SA: Dual-Stage Distribution and Slice Adaptation for Efficient Test-Time Adaptation in MRI Reconstruction
NeurIPS 2025 (arXiv)
0
citations
Analyzing the Power of Chain of Thought through Memorization Capabilities
NeurIPS 2025 (arXiv)
0
citations