Deli Zhao

41 Papers
204 Total Citations

Papers (41)

Space Group Constrained Crystal Generation

ICLR 2024
60 citations

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025
40 citations

Latent Space Editing in Transformer-Based Flow Matching

AAAI 2024 · arXiv
38 citations

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

NeurIPS 2025
26 citations

Lipschitz Singularities in Diffusion Models

ICLR 2024
21 citations

Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner

CVPR 2024
9 citations

MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

ICLR 2025
9 citations

Universally Invariant Learning in Equivariant GNNs

NeurIPS 2025
1 citation

RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training

CVPR 2023
0 citations

Neural Dependencies Emerging From Learning Massive Categories

CVPR 2023 · arXiv
0 citations

Dimensionality-Varying Diffusion Process

CVPR 2023 · arXiv
0 citations

LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook

CVPR 2023
0 citations

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

ICCV 2023 · arXiv
0 citations

Space-time Prompting for Video Class-incremental Learning

ICCV 2023
0 citations

ViM: Vision Middleware for Unified Downstream Transferring

ICCV 2023 · arXiv
0 citations

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-Trained Vision-Language Models

ICCV 2023 · arXiv
0 citations

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

ICCV 2023 · arXiv
0 citations

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

CVPR 2025
0 citations

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval

ICCV 2023
0 citations

Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

ICCV 2023
0 citations

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

ICCV 2023 · arXiv
0 citations

RLIPv2: Fast Scaling of Relational Language-Image Pre-Training

ICCV 2023 · arXiv
0 citations

In-Domain GAN Inversion for Real Image Editing

ECCV 2020
0 citations

Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning

ICCV 2023
0 citations

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

ICCV 2025
0 citations

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting

ICCV 2025
0 citations

AnyDoor: Zero-shot Object-level Image Customization

CVPR 2024
0 citations

A New Retraction for Accelerating the Riemannian Three-Factor Low-Rank Matrix Completion Algorithm

CVPR 2015
0 citations

Sparse Coding and Dictionary Learning With Linear Dynamical Systems

CVPR 2016
0 citations

Weakly Supervised High-Fidelity Clothing Model Generation

CVPR 2022 · arXiv
0 citations

MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition

CVPR 2023 · arXiv
0 citations

DeepExposure: Learning to Expose Photos with Asynchronously Reinforced Adversarial Learning

NeurIPS 2018
0 citations

Low-Rank Subspaces in GANs

NeurIPS 2021
0 citations

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

NeurIPS 2022
0 citations

Improving GANs with A Dynamic Discriminator

NeurIPS 2022
0 citations

Rank Diminishing in Deep Neural Networks

NeurIPS 2022
0 citations

VideoComposer: Compositional Video Synthesis with Motion Controllability

NeurIPS 2023
0 citations

FaceComposer: A Unified Model for Versatile Facial Content Creation

NeurIPS 2023
0 citations

Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

NeurIPS 2023
0 citations

Customizable Image Synthesis with Multiple Subjects

NeurIPS 2023
0 citations

MomentDiff: Generative Video Moment Retrieval from Random to Real

NeurIPS 2023
0 citations