Zhenyu Zhang

Papers: 14

Total Citations: 82

Papers (14)

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

CaM: Cache Merging for Memory-efficient LLMs Inference

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference