Zhen Li

57 Papers
285 Total Citations

Papers (57)

Learning Semantic Relationships for Better Action Retrieval in Images

CVPR 2015
114 citations

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

ICCV 2025
52 citations

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

AAAI 2024 (arXiv)
31 citations

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

ICLR 2025
23 citations

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

ICCV 2025
20 citations

DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

ICLR 2024
13 citations

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer

AAAI 2024 (arXiv)
11 citations

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

CVPR 2025
10 citations

Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning

AAAI 2025
6 citations

Empowering Large Language Models with 3D Situation Awareness

CVPR 2025
3 citations

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

NeurIPS 2025
1 citation

SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving

NeurIPS 2025
1 citation

Feedback Network for Image Super-Resolution

CVPR 2019
0 citations

PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling

CVPR 2020 (arXiv)
0 citations

Exemplar Normalization for Learning Deep Representation

CVPR 2020 (arXiv)
0 citations

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

CVPR 2021 (arXiv)
0 citations

Shallow Feature Matters for Weakly Supervised Object Localization

CVPR 2021
0 citations

X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning

CVPR 2022
0 citations

PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images

CVPR 2022
0 citations

Towards an End-to-End Framework for Flow-Guided Video Inpainting

CVPR 2022 (arXiv)
0 citations

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

CVPR 2022 (arXiv)
0 citations

Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains

CVPR 2023 (arXiv)
0 citations

Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language

CVPR 2023
0 citations

BEV@DC: Bird's-Eye View Assisted Training for Depth Completion

CVPR 2023
0 citations

AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

CVPR 2023 (arXiv)
0 citations

Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes

CVPR 2023 (arXiv)
0 citations

DNF: Decouple and Feedback Network for Seeing in the Dark

CVPR 2023
0 citations

Learning Transformation-Predictive Representations for Detection and Description of Local Features

CVPR 2023
0 citations

High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference

ICCV 2017 (arXiv)
0 citations

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

ICCV 2019
0 citations

Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

ICCV 2021 (arXiv)
0 citations

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds Through Instance Multi-Level Contextual Referring

ICCV 2021 (arXiv)
0 citations

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

ICCV 2023 (arXiv)
0 citations

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

ICCV 2023 (arXiv)
0 citations

LATR: 3D Lane Detection from Monocular Images with Transformer

ICCV 2023 (arXiv)
0 citations

RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels

ICCV 2023
0 citations

SRFormer: Permuted Self-Attention for Single Image Super-Resolution

ICCV 2023 (arXiv)
0 citations

Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation

ECCV 2020
0 citations

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

ECCV 2022
0 citations

Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-Class Appearance Consistency

ECCV 2022
0 citations

Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

ICCV 2021 (arXiv)
0 citations

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

CVPR 2025
0 citations

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

CVPR 2025
0 citations

AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction

ICCV 2025
0 citations

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

AAAI 2025
0 citations

Consistency of Compositional Generalization Across Multiple Levels

AAAI 2025
0 citations

CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues

AAAI 2024
0 citations

WeakPCSOD: Overcoming the Bias of Box Annotations for Weakly Supervised Point Cloud Salient Object Detection

AAAI 2024
0 citations

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

CVPR 2024
0 citations

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

CVPR 2024
0 citations

Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

ICML 2024
0 citations

Blockout: Dynamic Model Selection for Hierarchical Deep Networks

CVPR 2016
0 citations

Deep Neural Nets with Interpolating Function as Output Activation

NeurIPS 2018
0 citations

Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

NeurIPS 2022
0 citations

Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis

NeurIPS 2022
0 citations

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

NeurIPS 2022
0 citations

Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation

NeurIPS 2023
0 citations