Zhen Li
57
Papers
285
Total Citations
Papers (57)
Learning Semantic Relationships for Better Action Retrieval in Images
CVPR 2015
114
citations
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
52
citations
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation
AAAI 2024
31
citations
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
ICLR 2025
23
citations
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
20
citations
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation
ICLR 2024
13
citations
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
AAAI 2024
11
citations
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
CVPR 2025
10
citations
Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning
AAAI 2025
6
citations
Empowering Large Language Models with 3D Situation Awareness
CVPR 2025
3
citations
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
NeurIPS 2025
1
citations
SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving
NeurIPS 2025
1
citations
Feedback Network for Image Super-Resolution
CVPR 2019
0
citations
PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling
CVPR 2020
0
citations
Exemplar Normalization for Learning Deep Representation
CVPR 2020
0
citations
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
CVPR 2021
0
citations
Shallow Feature Matters for Weakly Supervised Object Localization
CVPR 2021
0
citations
X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning
CVPR 2022
0
citations
PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images
CVPR 2022
0
citations
Towards an End-to-End Framework for Flow-Guided Video Inpainting
CVPR 2022
0
citations
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
CVPR 2022
0
citations
Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains
CVPR 2023
0
citations
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
CVPR 2023
0
citations
BEV@DC: Bird's-Eye View Assisted Training for Depth Completion
CVPR 2023
0
citations
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
CVPR 2023
0
citations
Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes
CVPR 2023
0
citations
DNF: Decouple and Feedback Network for Seeing in the Dark
CVPR 2023
0
citations
Learning Transformation-Predictive Representations for Detection and Description of Local Features
CVPR 2023
0
citations
High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference
ICCV 2017
0
citations
Semi-Supervised Video Salient Object Detection Using Pseudo-Labels
ICCV 2019
0
citations
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
ICCV 2021
0
citations
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds Through Instance Multi-Level Contextual Referring
ICCV 2021
0
citations
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
ICCV 2023
0
citations
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection
ICCV 2023
0
citations
LATR: 3D Lane Detection from Monocular Images with Transformer
ICCV 2023
0
citations
RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels
ICCV 2023
0
citations
SRFormer: Permuted Self-Attention for Single Image Super-Resolution
ICCV 2023
0
citations
Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation
ECCV 2020
0
citations
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
ECCV 2022
0
citations
Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-Class Appearance Consistency
ECCV 2022
0
citations
Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud
ICCV 2021
0
citations
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
CVPR 2025
0
citations
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
CVPR 2025
0
citations
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
ICCV 2025
0
citations
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
0
citations
Consistency of Compositional Generalization Across Multiple Levels
AAAI 2025
0
citations
CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues
AAAI 2024
0
citations
WeakPCSOD: Overcoming the Bias of Box Annotations for Weakly Supervised Point Cloud Salient Object Detection
AAAI 2024
0
citations
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
CVPR 2024
0
citations
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024
0
citations
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding
ICML 2024
0
citations
Blockout: Dynamic Model Selection for Hierarchical Deep Networks
CVPR 2016
0
citations
Deep Neural Nets with Interpolating Function as Output Activation
NeurIPS 2018
0
citations
Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning
NeurIPS 2022
0
citations
Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis
NeurIPS 2022
0
citations
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
NeurIPS 2022
0
citations
Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation
NeurIPS 2023
0
citations