Deli Zhao
41
Papers
204
Total Citations
Papers (41)
Space Group Constrained Crystal Generation
ICLR 2024
60
citations
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025
40
citations
Latent Space Editing in Transformer-Based Flow Matching
AAAI 2024, arXiv
38
citations
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
NeurIPS 2025
26
citations
Lipschitz Singularities in Diffusion Models
ICLR 2024
21
citations
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
CVPR 2024
9
citations
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
ICLR 2025
9
citations
Universally Invariant Learning in Equivariant GNNs
NeurIPS 2025
1
citation
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training
CVPR 2023
0
citations
Neural Dependencies Emerging From Learning Massive Categories
CVPR 2023, arXiv
0
citations
Dimensionality-Varying Diffusion Process
CVPR 2023, arXiv
0
citations
LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
CVPR 2023
0
citations
LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis
ICCV 2023, arXiv
0
citations
Space-time Prompting for Video Class-incremental Learning
ICCV 2023
0
citations
ViM: Vision Middleware for Unified Downstream Transferring
ICCV 2023, arXiv
0
citations
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-Trained Vision-Language Models
ICCV 2023, arXiv
0
citations
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
ICCV 2023, arXiv
0
citations
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
CVPR 2025
0
citations
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval
ICCV 2023
0
citations
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
ICCV 2023
0
citations
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
ICCV 2023, arXiv
0
citations
RLIPv2: Fast Scaling of Relational Language-Image Pre-Training
ICCV 2023, arXiv
0
citations
In-Domain GAN Inversion for Real Image Editing
ECCV 2020
0
citations
Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning
ICCV 2023
0
citations
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
ICCV 2025
0
citations
Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting
ICCV 2025
0
citations
AnyDoor: Zero-shot Object-level Image Customization
CVPR 2024
0
citations
A New Retraction for Accelerating the Riemannian Three-Factor Low-Rank Matrix Completion Algorithm
CVPR 2015
0
citations
Sparse Coding and Dictionary Learning With Linear Dynamical Systems
CVPR 2016
0
citations
Weakly Supervised High-Fidelity Clothing Model Generation
CVPR 2022, arXiv
0
citations
MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition
CVPR 2023, arXiv
0
citations
DeepExposure: Learning to Expose Photos with Asynchronously Reinforced Adversarial Learning
NeurIPS 2018
0
citations
Low-Rank Subspaces in GANs
NeurIPS 2021
0
citations
Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator
NeurIPS 2022
0
citations
Improving GANs with A Dynamic Discriminator
NeurIPS 2022
0
citations
Rank Diminishing in Deep Neural Networks
NeurIPS 2022
0
citations
VideoComposer: Compositional Video Synthesis with Motion Controllability
NeurIPS 2023
0
citations
FaceComposer: A Unified Model for Versatile Facial Content Creation
NeurIPS 2023
0
citations
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
NeurIPS 2023
0
citations
Customizable Image Synthesis with Multiple Subjects
NeurIPS 2023
0
citations
MomentDiff: Generative Video Moment Retrieval from Random to Real
NeurIPS 2023
0
citations