Zhen Li

21
Papers
171
Total Citations

Papers (21)

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

ICCV 2025
52
citations

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

AAAI 2024, arXiv
31
citations

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

ICLR 2025
23
citations

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

ICCV 2025
20
citations

DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

ICLR 2024
13
citations

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer

AAAI 2024, arXiv
11
citations

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

CVPR 2025
10
citations

Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning

AAAI 2025
6
citations

Empowering Large Language Models with 3D Situation Awareness

CVPR 2025
3
citations

SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving

NeurIPS 2025
1
citation

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

NeurIPS 2025
1
citation

Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

ICML 2024
0
citations

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

CVPR 2025
0
citations

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

CVPR 2025
0
citations

AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction

ICCV 2025
0
citations

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

AAAI 2025
0
citations

Consistency of Compositional Generalization Across Multiple Levels

AAAI 2025
0
citations

CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues

AAAI 2024
0
citations

WeakPCSOD: Overcoming the Bias of Box Annotations for Weakly Supervised Point Cloud Salient Object Detection

AAAI 2024
0
citations

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

CVPR 2024
0
citations

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

CVPR 2024
0
citations