Zhen Li
21
Papers
171
Total Citations
Papers (21)
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
52
citations
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation
AAAI 2024, arXiv
31
citations
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
ICLR 2025
23
citations
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
20
citations
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation
ICLR 2024
13
citations
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
AAAI 2024, arXiv
11
citations
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
CVPR 2025
10
citations
Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning
AAAI 2025
6
citations
Empowering Large Language Models with 3D Situation Awareness
CVPR 2025
3
citations
SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving
NeurIPS 2025
1
citations
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
NeurIPS 2025
1
citations
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding
ICML 2024
0
citations
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
CVPR 2025
0
citations
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
CVPR 2025
0
citations
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
ICCV 2025
0
citations
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
0
citations
Consistency of Compositional Generalization Across Multiple Levels
AAAI 2025
0
citations
CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues
AAAI 2024
0
citations
WeakPCSOD: Overcoming the Bias of Box Annotations for Weakly Supervised Point Cloud Salient Object Detection
AAAI 2024
0
citations
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
CVPR 2024
0
citations
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024
0
citations