Zhenyu Zhang

34 Papers

82 Total Citations

Papers (34)

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors

Pattern-Structure Diffusion for Multi-Task Learning

Online Depth Learning Against Forgetting in Monocular Videos

Learning To Restore 3D Face From In-the-Wild Degraded Images

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling

Graph Transformer GANs for Graph-Constrained House Generation

ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts

Learning To Measure the Point Cloud Reconstruction Loss in a Representation Space

Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild

Regularizing Nighttime Weirdness: Efficient Self-Supervised Monocular Depth Estimation in the Dark

Learning Versatile 3D Shape Generation with Improved Auto-regressive Models

Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion

RigNet: Repetitive Image Guided Network for Depth Completion

Learning To Aggregate and Personalize 3D Face From In-the-Wild Photo Collection

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

CaM: Cache Merging for Memory-efficient LLMs Inference

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation

You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership

Sparse Winning Tickets are Data-Efficient Image Recognizers

Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models