Chunhua Shen

154 Papers

1,217 Total Citations

Papers (154)

Depth and Surface Normal Estimation From Monocular Images Using Regression on Deep Features and Hierarchical CRFs

Efficient Semantic Video Segmentation with Per-frame Inference

PointAttN: You Only Need Attention for Point Cloud Completion

Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections

NeurIPS 2016

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Aether: Geometric-Aware Unified World Modeling

Representative Graph Neural Network

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

On the Trajectory Regularity of ODE-based Diffusion Sampling

Supervised Discrete Hashing

Mid-Level Deep Pattern Mining

Learning to Rank in Person Re-Identification With Metric Ensembles

Efficient SDP Inference for Fully-Connected CRFs Based on Low-Rank Decomposition

Learning Graph Structure for Multi-Label Image Classification via Clique Generation

The Treasure Beneath Convolutional Layers: Cross-Convolutional-Layer Pooling for Image Classification

Deep Convolutional Neural Fields for Depth Estimation From a Single Image

What Value Do Explicit High Level Concepts Have in Vision to Language Problems?

What's Wrong With That Object? Identifying Images of Unusual Objects by Modelling the Detection Score Distribution

Less Is More: Zero-Shot Learning From Online Textual Documents With Noise Suppression

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources

Fast Training of Triplet-Based Deep Binary Embedding Networks

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

Sequential Person Recognition in Photo Albums With a Recurrent Network

Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning From Web Data

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

From Motion Blur to Motion Flow: A Deep Learning Solution for Removing Heterogeneous Motion Blur

Multi-Attention Network for One Shot Learning

Monocular Relative Depth Perception With Web Stereo Data Supervision

Bootstrapping the Performance of Webly Supervised Semantic Segmentation

FSRNet: End-to-End Learning Face Super-Resolution With Facial Priors

Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries

An End-to-End TextSpotter With Explicit Alignment and Attention

Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning

Visual Question Answering With Memory-Augmented Networks

Repulsion Loss: Detecting Pedestrians in a Crowd

Towards Effective Low-Bitwidth Convolutional Neural Networks

VITAL: VIsual Tracking via Adversarial Learning

Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation

Knowledge Adaptation for Efficient Semantic Segmentation

Attention-Guided Network for Ghost-Free High Dynamic Range Imaging

Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks

Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

Associatively Segmenting Instances and Semantics in Point Clouds

CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning

Visual Question Answering as Reading Comprehension

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

Training Quantized Neural Networks With a Full-Precision Auxiliary Module

Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover's Distance and Structured Classifiers

ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Context Prior for Scene Segmentation

Mask Encoding for Single Shot Instance Segmentation

NAS-FCOS: Fast Neural Architecture Search for Object Detection

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection

PolarMask: Single Shot Instance Segmentation With Polar Representation

DoDNet: Learning To Segment Multi-Organ and Tumors From Multiple Partially Labeled Datasets

Learning To Recover 3D Scene Shape From a Single Image

Graph Attention Tracking

AQD: Towards Accurate Quantized Object Detection

Generic Perceptual Loss for Modeling Structured Output Dependencies

DyCo3D: Robust Instance Segmentation of 3D Point Clouds Through Dynamic Convolution

Learning Spatial-Semantic Relationship for Facial Attribute Recognition With Limited Labeled Data

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

End-to-End Video Instance Segmentation With Transformers

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

FCPose: Fully Convolutional Multi-Person Pose Estimation With Dynamic Instance-Aware Convolutions

BoxInst: High-Performance Instance Segmentation With Box Annotations

HCRF-Flow: Scene Flow From Point Clouds With Continuous High-Order CRFs and Position-Aware Flow Embedding

Learning Affinity-Aware Upsampling for Deep Image Matting

FreeSOLO: Learning To Segment Objects Without Annotations

RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior

Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection

Retrieval Augmented Classification for Long-Tail Visual Recognition

Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Learning Conditional Attributes for Compositional Zero-Shot Learning

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

Hyperspectral Compressive Sensing Using Manifold-Structured Sparsity Prior

Towards Context-Aware Interaction Recognition for Visual Relationship Detection

When Unsupervised Domain Adaptation Meets Tensor Representations

Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation

Towards End-To-End Text Spotting With Convolutional Recurrent Neural Networks

Exploiting Temporal Consistency for Real-Time Video Depth Estimation

Indices Matter: Learning to Index for Deep Image Matting

Enforcing Geometric Constraints of Virtual Normal for Depth Prediction

Self-Training With Progressive Augmentation for Unsupervised Cross-Domain Person Re-Identification

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network

FCOS: Fully Convolutional One-Stage Object Detection

FATNN: Fast and Accurate Ternary Neural Networks

BV-Person: A Large-Scale Dataset for Bird-View Person Re-Identification

Channel-Wise Knowledge Distillation for Dense Prediction

A Simple Baseline for Semi-Supervised Semantic Segmentation With Strong Data Augmentation

Meta Navigator: Search for a Good Adaptation Policy for Few-Shot Learning

Occluded Person Re-Identification With Single-Scale Global Representations

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

SegGPT: Towards Segmenting Everything in Context

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

CTVIS: Consistent Training for Online Video Instance Segmentation

Generative Prompt Model for Weakly Supervised Object Localization

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models

SegPrompt: Boosting Open-World Segmentation via Category-Level Prompt Learning

Conditional Convolutions for Instance Segmentation

Soft Expert Reward Learning for Vision-and-Language Navigation

Scene Text Image Super-Resolution in the Wild

Segmenting Transparent Objects in the Wild

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation

SOLO: Segmenting Objects by Locations

Instance-Aware Embedding for Point Cloud Instance Segmentation

PointInst3D: Segmenting 3D Instances by Points

Poseur: Direct Human Pose Regression with Transformers

Efficient Decoder-Free Object Detection with Transformers

DisCo: Remedying Self-Supervised Learning on Lightweight Models with Distilled Contrastive Learning

Deeply Learning the Messages in Message Passing Inference

NeurIPS 2015

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction

Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking

SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting

Unified Open-World Segmentation with Multi-Modal Prompts

Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Traffic Scene Parsing through the TSP6K Dataset

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Floating Anchor Diffusion Model for Multi-motif Scaffolding

Generative Active Learning for Long-tailed Instance Segmentation

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

Multi-marginal Wasserstein GAN

SOLOv2: Dynamic and Fast Instance Segmentation

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

SegViT: Semantic Segmentation with Plain Vision Transformers

Hierarchical Normalization for Robust Monocular Depth Estimation

Multi-dataset Training of Transformers for Robust Action Recognition

DENSE: Data-Free One-Shot Federated Learning

Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition

Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Adversarial Learning with Local Coordinate Coding