Yan Wang

88 Papers

307 Total Citations

Papers (88)

A Powerful Generative Model Using Random Weights for the Deep Image Representation

NeurIPS 2016 (arXiv)

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Enabling Deep Residual Networks for Weakly Supervised Object Detection

Language-Image Models with 3D Understanding

MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

MambaIC: State Space Models for High-Performance Learned Image Compression

Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior

Task-Aware Encoder Control for Deep Video Compression

Partial Label Learning with a Partner

Probability-Polarized Optimal Transport for Unsupervised Domain Adaptation

Spatially-Variant Degradation Model for Dataset-free Super-resolution

LLM4RSR: Large Language Models as Data Correctors for Robust Sequential Recommendation

Physical-aware Neural Radiance Fields for Efficient Exposure Correction

Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering

Object Attribute Matters in Visual Question Answering

Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration

CAMixerSR: Only Details Need More "Attention"

Boosting Neural Representations for Videos with a Conditional Decoder

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

CogAgent: A Visual Language Model for GUI Agents

RepAn: Enhanced Annealing through Re-parameterization

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

An Embodied Generalist Agent in 3D World

DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection

Object Skeleton Extraction in Natural Images by Fusing Scale-Associated Deep Side Outputs

Deep Regression Forests for Age Estimation

Generative Adversarial Learning Towards Fast Weakly Supervised Detection

Resource Aware Person Re-Identification Across Multiple Resolutions

Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation

Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation

Fully Quantized Network for Object Detection

Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Deep Distance Transform for Tubular Structure Segmentation in CT Scans

HRank: Filter Pruning Using High-Rank Feature Map

End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

Train in Germany, Test in the USA: Making 3D Object Detectors Generalize

Checkerboard Context Model for Efficient Learned Image Compression

ContrastMask: Contrastive Learning To Segment Every Thing

ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding

Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions

Practical Learned Lossless JPEG Recompression With Multi-Level Cross-Channel Entropy Model in the DCT Domain

Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning

Privacy-Preserving Adversarial Facial Features

MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery

Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation

Meta Architecture for Point Cloud Analysis

SORT: Second-Order Response Transform for Visual Recognition

Multi-Stage Multi-Recursive-Input Fully Convolutional Networks for Neuronal Boundary Detection

Recognition of Action Units in the Wild With Deep Nets and a New Global-Local Loss

Deep Co-Training With Task Decomposition for Semi-Supervised Domain Adaptation

AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception

Efficient Decision-based Black-box Patch Attacks on Video Recognition

Rethinking Safe Semi-supervised Learning: Transferring the Open-set Problem to A Close-set One

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN

Black-Box Dissector: Towards Erasing-Based Hard-Label Model Stealing Attack

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

PICD: Versatile Perceptual Image Compression with Diffusion Rendering

D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition

Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering

Extrapolated Urban View Synthesis Benchmark

MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model

OUS: Bridging Scene Context and Facial Features to Overcome the Rigid Cognitive Problem

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

GapMatch: Bridging Instance and Model Perturbations for Enhanced Semi-Supervised Medical Image Segmentation

Variable Importance in High-Dimensional Settings Requires Grouping

Fine-Tuning Large Language Model Based Explainable Recommendation with Explainable Quality Reward

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning

Variational Structured Semantic Inference for Diverse Image Captioning

Rotated Binary Neural Network

Wasserstein Distances for Stereo Disparity Estimation

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera

Multi-Sample Training for Neural Image Compression

Flexible Neural Image Compression via Code Editing

A Contrastive Framework for Neural Text Generation

Theoretically Guaranteed Bidirectional Data Rectification for Robust Sequential Recommendation

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Idempotent Learned Image Compression with Right-Inverse

Prompt-augmented Temporal Point Process for Streaming Event Sequence

Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

Exploiting Contextual Objects and Relations for 3D Visual Grounding