Yang Zhang

Papers: 57

Total Citations: 433

Papers (57)

Dilated Recurrent Neural Networks

NeurIPS 2017 (arXiv)

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Online Preference Alignment for Language Models via Count-based Exploration

Correcting Diffusion Generation through Resampling

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner

Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation

Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis

Polyper: Boundary Sensitive Polyp Segmentation

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Speech Self-Supervised Learning Using Diffusion Model Synthetic Data

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

Fast Zero-Shot Image Tagging

PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

Copy and Paste GAN: Face Hallucination From Shaded Thumbnails

Panoptic-PolarNet: Proposal-Free LiDAR Point Cloud Panoptic Segmentation

The Lottery Tickets Hypothesis for Supervised and Self-Supervised Pre-Training in Computer Vision Models

SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders

Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data

SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-Powered Intelligent PhlatCam

A General Recurrent Tracking Framework Without Real Data

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

TempFormer: Temporally Consistent Transformer for Video Denoising

Semi-Leak: Membership Inference Attacks against Semi-Supervised Learning

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs

LDIP: Long Distance Information Propagation for Video Super-Resolution

Event-guided HDR Reconstruction with Diffusion Priors

Anti-Tamper Protection for Unauthorized Individual Image Generation

LOTA: Bit-Planes Guided AI-Generated Image Detection

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions

EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression

VIoTGPT: Learning to Schedule Vision Tools Towards Intelligent Video Internet of Things

Behavior Importance-Aware Graph Neural Architecture Search for Cross-Domain Recommendation

A Game Theoretic Approach to Class-wise Selective Rationalization

The Lottery Ticket Hypothesis for Pre-trained BERT Networks

Understanding Interlocking Dynamics of Cooperative Rationalization

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

BCORLE($\lambda$): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Amplifying Membership Exposure via Data Poisoning

Fairness Reprogramming

A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss