Feng Zheng

37 Papers

116 Total Citations

Papers (37)

Enabling Deep Residual Networks for Weakly Supervised Object Detection

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning

OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization

NeurIPS 2025 (arXiv)

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

NeurIPS 2025 (arXiv)

A₀: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Block Image Compressive Sensing with Local and Global Information Interaction

Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt

Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes

Salience-Guided Cascaded Suppression Network for Person Re-Identification

One-Shot Adversarial Attacks on Visual Tracking With Dual Attention

Noise-Aware Fully Webly Supervised Object Detection

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification

Brain Image Synthesis With Unsupervised Multivariate Canonical CSCℓ4Net

Class-Aware Contrastive Semi-Supervised Learning

Meta Distribution Alignment for Generalizable Person Re-Identification

Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression

Accelerating Vision-Language Pretraining With Free Language Modeling

Resource-Efficient RGBD Aerial Tracking

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

Saliency-Associated Object Tracking

Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation

FREE: Feature Refinement for Generalized Zero-Shot Learning

DepthTrack: Unveiling the Power of RGBD Tracking

End-to-End Dense Video Captioning With Parallel Decoding

Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

S2Contact: Graph-Based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning

Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline

Generalized Brain Image Synthesis with Transferable Convolutional Sparse Coding Networks

Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery

SoftPatch: Unsupervised Anomaly Detection with Noisy Data

NeurIPS 2022 (arXiv)

Real3D-AD: A Dataset of Point Cloud Anomaly Detection

NeurIPS 2023 (arXiv)