Yao Zhao

Papers: 28

Total Citations: 126

Papers (28)

Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation

Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation

Lyapunov-Stable Deep Equilibrium Models

EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events

Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?

ODDN: Addressing Unpaired Data Challenges in Open-World Deepfake Detection on Online Social Networks

Collapsed Language Models Promote Fairness

Transferable and Principled Efficiency for Open-Vocabulary Segmentation

NTClick: Achieving Precise Interactive Segmentation With Noise-tolerant Clicks

Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

Unsupervised Region-Based Image Editing of Denoising Diffusion Models

Visual Relation Diffusion for Human-Object Interaction Detection

Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models

CharaConsist: Fine-Grained Consistent Character Generation

PixelStitch: Structure-Preserving Pixel-Wise Bidirectional Warps for Unsupervised Image Stitching

Memory Efficient Matting with Adaptive Token Routing

C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection

CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation

On the Unstable Convergence Regime of Gradient Descent

Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Domain Learning

Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection

Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection

PixelLM: Pixel Reasoning with Large Multimodal Model

Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection