Zheng-Jun Zha

Papers: 101

Total Citations: 115

Papers (101)

Revisiting Single Image Reflection Removal In the Wild

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Improved Video VAE for Latent Video Diffusion Model

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement

EVDM: Event-based Real-world Video Deblurring with Mamba

Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion

HERO: Human Reaction Generation from Videos

MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking

EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation

Enhanced Pansharpening via Quaternion Spatial-Spectral Interactions

EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

Boosting Image De-Raining via Central-Surrounding Synergistic Convolution

DCTMamba: Advancing JPEG Image Restoration Through Long-Sequence Modeling and Adaptive Frequency Strategy

HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection

A Lottery Ticket Hypothesis Approach with Sparse Fine-tuning and MAE for Image Forgery Detection and Localization

Fusion-Vital: Video-RF Fusion Transformer for Advanced Remote Physiological Measurement

Learning Discriminative Noise Guidance for Image Forgery Detection and Localization

HomoFormer: Homogenized Transformer for Image Shadow Removal

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models

Comparative Deep Learning of Hybrid Representations for Image Recommendations

MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition

Camera Lens Super-Resolution

Context-Reinforced Semantic Segmentation

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition

Adaptive Transfer Network for Cross-Domain Person Re-Identification

State-Relabeling Adversarial Active Learning

Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification

Deep Structure-Revealed Network for Texture Recognition

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Deep Degradation Prior for Low-Quality Image Classification

Real-World Person Re-Identification via Degradation Invariance Learning

Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning

Iterative Context-Aware Graph Inference for Visual Dialog

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning

Object Relational Graph With Teacher-Recommended Learning for Video Captioning

Image De-Raining via Continual Learning

Structured Multi-Level Interaction Network for Video Moment Localization via Language Query

Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning

Light Field Super-Resolution With Zero-Shot Learning

Group-aware Label Transfer for Domain Adaptive Person Re-identification

Rethinking Graph Neural Architecture Search From Message-Passing

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Weakly Supervised High-Fidelity Clothing Model Generation

Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment

Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation

Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning

Automatic Relation-Aware Graph Network Proliferation

Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading

Bijective Mapping Network for Shadow Removal

Degradation-Agnostic Correspondence From Resolution-Asymmetric Stereo

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Decoupling-and-Aggregating for Image Exposure Correction

Edge-Aware Regional Message Passing Controller for Image Forgery Localization

Neural Dependencies Emerging From Learning Massive Categories

Learning To Dub Movies via Hierarchical Prosody Models

Generalized UAV Object Detection via Frequency Domain Disentanglement

Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning

Streaming Video Model

JPEG Artifacts Reduction via Deep Convolutional Sparse Coding

Making History Matter: History-Advantage Sequence Training for Visual Dialog

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition

Learning to Assemble Neural Module Tree Networks for Visual Grounding

Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning

Learning Dual Priors for JPEG Compression Artifacts Removal

Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

Attack-Guided Perceptual Data Generation for Real-World Re-Identification

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

Cross-Patch Graph Convolutional Network for Image Denoising

Self-supervised Cross-view Representation Reconstruction for Change Captioning

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-Trained Vision-Language Models

Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning

Adaptive Frequency Filters As Efficient Global Token Mixers

Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization

Spatial-Aware Token for Weakly Supervised Object Localization

Grounding 3D Object Affordance from 2D Interactions in Images

Learning Cross-Representation Affinity Consistency for Sparsely Supervised Biomedical Instance Segmentation

S2N: Suppression-Strengthen Network for Event-Based Recognition under Variant Illuminations

JPEG Artifacts Removal via Contrastive Representation Learning

Improving De-Raining Generalization via Neural Reorganization

UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts

Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets

Learning Deep Bilinear Transformation for Fine-grained Image Representation

Abstract Reasoning with Distracting Features

Hierarchical Granularity Transfer Learning

Learning Semantic-aware Normalization for Generative Adversarial Networks

Low-Rank Subspaces in GANs

Stochastic Window Transformer for Image Restoration

Exploring Figure-Ground Assignment Mechanism in Perceptual Organization

Rank Diminishing in Deep Neural Networks

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars