Xiaohui Shen

53 papers, 732 total citations

Papers (53)

Learning Progressive Joint Propagation for Human Motion Prediction

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

SURGE: Surface Regularized Geometry Estimation from a Single Image

Predicting Scene Parsing and Motion Dynamics in the Future

NeurIPS 2017

MaskBit: Embedding-free Image Generation via Bit Tokens

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

COCONut: Modernizing COCO Segmentation

Reversible Recursive Instance-Level Object Segmentation

A Multi-Level Contextual Model For Person Recognition in Photo Albums

Shortlist Selection With Residual-Aware Distance Estimator for K-Nearest Neighbor Search

Automatic Content-Aware Color and Tone Stylization

Semantic Object Parsing With Local-Global Long Short-Term Memory

Event-Specific Image Importance

Unconstrained Salient Object Detection via Proposal Subset Optimization

Look Into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing

Interpretable Structure-Evolving LSTM

Deep Image Harmonization

Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition

MAttNet: Modular Attention Network for Referring Expression Comprehension

Good View Hunting: Learning Photo Composition From Dense View Pairs

Generative Image Inpainting With Contextual Attention

Learning to Understand Image Blur

Graphonomy: Universal Human Parsing via Graph Transfer Learning

Semantic Component Decomposition for Face Attribute Manipulation

Fashion Editing With Adversarial Parsing Learning

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

R2Former: Unified Retrieval and Reranking Transformer for Place Recognition

Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation

Human Parsing With Contextualized Convolutional Neural Network

Minimum Barrier Salient Object Detection at 80 FPS

Joint Object and Part Segmentation Using Deep Learned Potentials

Personalized Image Aesthetics

FoveaNet: Perspective-Aware Urban Scene Parsing

Recurrent Multimodal Interaction for Referring Image Segmentation

Scene Parsing With Global Context Embedding

D-Attn: Decomposed Attention for Large Vision-and-Language Model

FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On

Free-Form Image Inpainting With Gated Convolution

Towards Multi-Pose Guided Virtual Try-On Network

Towards Interpretable Face Recognition

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

Video Object Detection via Object-level Temporal Aggregation

Video Scene Parsing With Predictive Feature Learning

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Randomized Autoregressive Visual Generation

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Towards Unified Depth and Semantic Prediction From a Single Image

Salient Object Subitizing

A Convolutional Neural Network Cascade for Face Detection

Sequence-to-Segment Networks for Segment Detection

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP