Zicheng Liu

58
Papers
283
Total Citations

Papers (58)

MogaNet: Multi-order Gated Aggregation Network

ICLR 2024
125
citations

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

CVPR 2024
49
citations

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

CVPR 2025
32
citations

SemiReward: A General Reward Model for Semi-supervised Learning

ICLR 2024
18
citations

PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

AAAI 2024 (arXiv)
18
citations

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

ICLR 2025
16
citations

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

ICLR 2025
14
citations

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

CVPR 2025 (arXiv)
6
citations

DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing

CVPR 2025
4
citations

Exploring Invariance in Images through One-way Wave Equations

ICML 2025
1
citation

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

ICML 2024
0
citations

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

ICML 2024
0
citations

Large Scale Incremental Learning

CVPR 2019
0
citations

Rethinking Classification and Localization for Object Detection

CVPR 2020 (arXiv)
0
citations

Dynamic Convolution: Attention Over Convolution Kernels

CVPR 2020 (arXiv)
0
citations

Probabilistic Model Distillation for Semantic Correspondence

CVPR 2021
0
citations

End-to-End Human Pose and Mesh Reconstruction with Transformers

CVPR 2021 (arXiv)
0
citations

Mobile-Former: Bridging MobileNet and Transformer

CVPR 2022
0
citations

Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation

CVPR 2022 (arXiv)
0
citations

Cross-Modal Representation Learning for Zero-Shot Action Recognition

CVPR 2022 (arXiv)
0
citations

SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning

CVPR 2022 (arXiv)
0
citations

An Empirical Study of Training End-to-End Vision-and-Language Transformers

CVPR 2022 (arXiv)
0
citations

Injecting Semantic Concepts Into End-to-End Image Captioning

CVPR 2022 (arXiv)
0
citations

Scaling Up Vision-Language Pre-Training for Image Captioning

CVPR 2022 (arXiv)
0
citations

Deep Frequency Filtering for Domain Generalization

CVPR 2023 (arXiv)
0
citations

Adaptive Human Matting for Dynamic Videos

CVPR 2023 (arXiv)
0
citations

An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling

CVPR 2023 (arXiv)
0
citations

Binary Latent Diffusion

CVPR 2023 (arXiv)
0
citations

LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling

CVPR 2023 (arXiv)
0
citations

Neural Voting Field for Camera-Space 3D Hand Pose Estimation

CVPR 2023 (arXiv)
0
citations

Compressing Visual-Linguistic Model via Knowledge Distillation

ICCV 2021 (arXiv)
0
citations

End-to-End Semi-Supervised Object Detection With Soft Teacher

ICCV 2021 (arXiv)
0
citations

Mesh Graphormer

ICCV 2021 (arXiv)
0
citations

MicroNet: Improving Image Recognition With Extremely Low FLOPs

ICCV 2021 (arXiv)
0
citations

Equivariant Similarity for Vision-Language Foundation Models

ICCV 2023 (arXiv)
0
citations

Dynamic ReLU

ECCV 2020
0
citations

A Simple Approach and Benchmark for 21,000-Category Object Detection

ECCV 2022
0
citations

AutoMix: Unveiling the Power of Mixup for Stronger Classifiers

ECCV 2022
0
citations

Should All Proposals Be Treated Equally in Object Detection?

ECCV 2022
0
citations

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

ECCV 2022
0
citations

ReCo: Region-Controlled Text-to-Image Generation

CVPR 2023 (arXiv)
0
citations

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens

ICCV 2025
0
citations

MyGO: Virtual Reality Locomotion Prediction using Multitask Learning

ISMAR 2025
0
citations

Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

CVPR 2024
0
citations

DisCo: Disentangled Control for Realistic Human Dance Generation

CVPR 2024
0
citations

Segment and Caption Anything

CVPR 2024
0
citations

Completing Visual Objects via Bridging Generation and Segmentation

ICML 2024
0
citations

VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

ICML 2024 (arXiv)
0
citations

PPFLOW: Target-Aware Peptide Design with Torsional Flow Matching

ICML 2024
0
citations

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

ICML 2024
0
citations

Stronger NAS with Weaker Predictors

NeurIPS 2021
0
citations

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

NeurIPS 2022
0
citations

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

NeurIPS 2022
0
citations

Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias

NeurIPS 2022
0
citations

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

NeurIPS 2022
0
citations

PaintSeg: Painting Pixels for Training-free Segmentation

NeurIPS 2023
0
citations

Harnessing Hard Mixed Samples with Decoupled Regularizer

NeurIPS 2023
0
citations

OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

NeurIPS 2023
0
citations