Jiashi Feng

Papers: 127

Total Citations: 1,954

Papers (127)

Dual Path Networks

NeurIPS 2017

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis

Tree-Structured Reinforcement Learning for Sequential Object Localization

NeurIPS 2016

Predicting Scene Parsing and Motion Dynamics in the Future

NeurIPS 2017

Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition

Multimodal Learning and Reasoning for Visual Question Answering

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

MagicArticulate: Make Your 3D Models Articulation-Ready

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Deep Joint Rain Detection and Removal From a Single Image

Deep Self-Taught Learning for Weakly Supervised Object Localization

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Object Region Mining With Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

Outlier-Robust Tensor PCA

Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks

Learning Detection With Diverse Proposals

MoNet: Deep Motion Exploitation for Video Object Segmentation

Adversarial Complementary Learning for Weakly Supervised Object Localization

Deep Adversarial Subspace Clustering

Human Pose Estimation With Parsing Induced Learner

Towards Pose Invariant Face Recognition in the Wild

Left-Right Comparative Recurrent Model for Stereo Matching

Zigzag Learning for Weakly Supervised Object Detection

Weakly Supervised Phrase Localization With Multi-Scale Anchored Transformer Network

Learning Markov Clustering Networks for Scene Text Detection

Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation

Graph-Based Global Reasoning Networks

Frame-Consistent Recurrent Video Deraining With Dual-Level Flow

A Simple Pooling-Based Design for Real-Time Salient Object Detection

Distilling Object Detectors With Fine-Grained Feature Imitation

Few-Shot Adaptive Faster R-CNN

Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search

PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection

Central Similarity Quantization for Efficient Image and Video Retrieval

Revisiting Knowledge Distillation via Label Smoothing Regularization

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax

Boosting Few-Shot Learning With Adaptive Margin Loss

Improving Convolutional Networks With Self-Calibrated Convolutions

Body Meshes as Points

Coordinate Attention for Efficient Mobile Network Design

Domain Adaptation With Auxiliary Target Domain-Oriented Classifier

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Continual Learning via Bit-Level Information Preserving

DINE: Domain Adaptation From Single and Multiple Black-Box Predictors

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

MetaFormer Is Actually What You Need for Vision

Shunted Self-Attention via Multi-Scale Token Aggregation

PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision

Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring

TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision

Diffusion Probabilistic Model Made Slim

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

Clover: Towards a Unified Video-Language Alignment and Fusion Model

Learning the Structure of Deep Convolutional Networks

Neural Person Search Machines

FoveaNet: Perspective-Aware Urban Scene Parsing

Recurrent 3D-2D Dual Learning for Large-Pose Facial Landmark Detection

Regional Interactive Image Segmentation Networks

Video Scene Parsing With Predictive Feature Learning

MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution

Dynamic Kernel Distillation for Efficient Pose Estimation in Videos

Single-Stage Multi-Person Pose Machines

Few-Shot Object Detection via Feature Reweighting

Foreground-Aware Pyramid Reconstruction for Alignment-Free Occluded Person Re-Identification

PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment

PnP-DETR: Towards Efficient Visual Analysis With Transformers

Voxel Transformer for 3D Object Detection

Tokens-to-Token ViT: Training Vision Transformers From Scratch on ImageNet

AutoSpace: Neural Architecture Search With Less Human Interference

Global Knowledge Calibration for Fast Open-Vocabulary Segmentation

Dataset Quantization

GETAvatar: Generative Textured Meshes for Animatable Human Avatars

Rethinking Bottleneck Structure for Efficient Mobile Network Design

A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation

The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation

Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering

Slim Scissors: Segmenting Thin Object from Synthetic Background

Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search

Parallelized Autoregressive Visual Generation

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

PixelLM: Pixel Reasoning with Large Multimodal Model

Video Recognition in Portrait Mode

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Reversible Recursive Instance-Level Object Segmentation

Recurrently Target-Attending Tracking

Recurrent Face Aging

Highway Vehicle Counting in Compressed Domain

Semantic Object Parsing With Local-Global Long Short-Term Memory

Natural Language Object Retrieval

Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization

Interpretable Structure-Evolving LSTM

Perceptual Generative Adversarial Networks for Small Object Detection

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

A^2-Nets: Double Attention Networks

Efficient Stochastic Gradient Hard Thresholding

Efficient Meta Learning via Minibatch Proximal Update

Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation

Improving Generalization in Reinforcement Learning with Mixture Regularization

Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning

No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data

Direct Multi-view Multi-person 3D Pose Estimation

All Tokens Matter: Token Labeling for Training Better Vision Transformers

Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond

Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

Sharpness-Aware Training for Free

Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition

XAGen: 3D Expressive Human Avatars Generation

Expanding Small-Scale Datasets with Guided Imagination

WSNet: Compact and Efficient Networks Through Weight Sampling

Policy Optimization with Demonstrations

Understanding Generalization and Optimization Performance of Deep CNNs