Zicheng Liu
58
Papers
283
Total Citations
Papers (58)
MogaNet: Multi-order Gated Aggregation Network
ICLR 2024
125
citations
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
CVPR 2024
49
citations
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
CVPR 2025
32
citations
SemiReward: A General Reward Model for Semi-supervised Learning
ICLR 2024
18
citations
PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction
AAAI 2024, arXiv
18
citations
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
ICLR 2025
16
citations
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
ICLR 2025
14
citations
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
CVPR 2025, arXiv
6
citations
DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing
CVPR 2025
4
citations
Exploring Invariance in Images through One-way Wave Equations
ICML 2025
1
citations
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
ICML 2024
0
citations
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
ICML 2024
0
citations
Large Scale Incremental Learning
CVPR 2019
0
citations
Rethinking Classification and Localization for Object Detection
CVPR 2020, arXiv
0
citations
Dynamic Convolution: Attention Over Convolution Kernels
CVPR 2020, arXiv
0
citations
Probabilistic Model Distillation for Semantic Correspondence
CVPR 2021
0
citations
End-to-End Human Pose and Mesh Reconstruction with Transformers
CVPR 2021, arXiv
0
citations
Mobile-Former: Bridging MobileNet and Transformer
CVPR 2022
0
citations
Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation
CVPR 2022, arXiv
0
citations
Cross-Modal Representation Learning for Zero-Shot Action Recognition
CVPR 2022, arXiv
0
citations
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
CVPR 2022, arXiv
0
citations
An Empirical Study of Training End-to-End Vision-and-Language Transformers
CVPR 2022, arXiv
0
citations
Injecting Semantic Concepts Into End-to-End Image Captioning
CVPR 2022, arXiv
0
citations
Scaling Up Vision-Language Pre-Training for Image Captioning
CVPR 2022, arXiv
0
citations
Deep Frequency Filtering for Domain Generalization
CVPR 2023, arXiv
0
citations
Adaptive Human Matting for Dynamic Videos
CVPR 2023, arXiv
0
citations
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
CVPR 2023, arXiv
0
citations
Binary Latent Diffusion
CVPR 2023, arXiv
0
citations
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
CVPR 2023, arXiv
0
citations
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
CVPR 2023, arXiv
0
citations
Compressing Visual-Linguistic Model via Knowledge Distillation
ICCV 2021, arXiv
0
citations
End-to-End Semi-Supervised Object Detection With Soft Teacher
ICCV 2021, arXiv
0
citations
Mesh Graphormer
ICCV 2021, arXiv
0
citations
MicroNet: Improving Image Recognition With Extremely Low FLOPs
ICCV 2021, arXiv
0
citations
Equivariant Similarity for Vision-Language Foundation Models
ICCV 2023, arXiv
0
citations
Dynamic ReLU
ECCV 2020
0
citations
A Simple Approach and Benchmark for 21,000-Category Object Detection
ECCV 2022
0
citations
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers
ECCV 2022
0
citations
Should All Proposals Be Treated Equally in Object Detection?
ECCV 2022
0
citations
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
ECCV 2022
0
citations
ReCo: Region-Controlled Text-to-Image Generation
CVPR 2023, arXiv
0
citations
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
ICCV 2025
0
citations
MyGO: Virtual Reality Locomotion Prediction using Multitask Learning
ISMAR 2025
0
citations
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
CVPR 2024
0
citations
DisCo: Disentangled Control for Realistic Human Dance Generation
CVPR 2024
0
citations
Segment and Caption Anything
CVPR 2024
0
citations
Completing Visual Objects via Bridging Generation and Segmentation
ICML 2024
0
citations
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
ICML 2024, arXiv
0
citations
PPFLOW: Target-Aware Peptide Design with Torsional Flow Matching
ICML 2024
0
citations
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
ICML 2024
0
citations
Stronger NAS with Weaker Predictors
NeurIPS 2021
0
citations
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
NeurIPS 2022
0
citations
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
NeurIPS 2022
0
citations
Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias
NeurIPS 2022
0
citations
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
NeurIPS 2022
0
citations
PaintSeg: Painting Pixels for Training-free Segmentation
NeurIPS 2023
0
citations
Harnessing Hard Mixed Samples with Decoupled Regularizer
NeurIPS 2023
0
citations
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
NeurIPS 2023
0
citations