Jue Wang

Papers: 59

Total Citations: 30

Papers (59)

FoldToken: Learning Protein Language via Vector Quantization and Beyond

Text-Guided Video Masked Autoencoder

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

Coherent Parametric Contours for Interactive Video Object Segmentation

Automatic Fence Segmentation in Videos of Dynamic Scenes

Deep Video Deblurring for Hand-Held Cameras

Video Representation Learning Using Discriminative Pooling

DocUNet: Document Image Unwarping via a Stacked U-Net

Scale-Recurrent Network for Deep Image Deblurring

GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images

GeoNet: Deep Geodesic Networks for Point Cloud Analysis

Audio Visual Scene-Aware Dialog

UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging

Long-Short Temporal Contrastive Learning of Video Transformers

FENeRF: Face Editing in Neural Radiance Fields

Deformable Video Transformer

Hallucinated Neural Radiance Fields in the Wild

Deblur-NeRF: Neural Radiance Fields From Blurry Images

Multi-Robot Active Mapping via Neural Bipartite Graph Matching

LAS-AT: Adversarial Training With Learnable Attack Strategy

Unsupervised Pre-Training for Temporal Action Localization Tasks

Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization

High-Fidelity GAN Inversion for Image Attribute Editing

Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

Patch-Based 3D Natural Scene Generation From a Single Example

CodeTalker: Speech-Driven 3D Facial Animation With Discrete Motion Prior

Fine-Grained Face Swapping via Regional GAN Inversion

ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction

Learning Anchor Transformations for 3D Garment Animation

Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry

UV Volumes for Real-Time Rendering of Editable Free-View Human Performance

Selective Structured State-Spaces for Long-Form Video Understanding

Zero-Order Reverse Filtering

Detail-Revealing Deep Video Super-Resolution

Semi-Supervised Skin Detection by Network With Mutual Guidance

Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts

GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection

Disentangled Image Matting

Motion-Guided Masking for Spatiotemporal Representation Learning

Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter

Practical Deep Raw Image Denoising on Mobile Devices

Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards

Prior-Guided Adversarial Initialization for Fast Adversarial Training

Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization

Towards Accurate Active Camera Localization

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

LocVTP: Video-Text Pre-training for Temporal Localization

Content-Aware Unsupervised Deep Homography Estimation

Soft Prompt Recovers Compressed LLMs, Transferably

Blind Optical Aberration Correction by Exploring Geometric and Visual Priors

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Stability Analysis and Generalization Bounds of Adversarial Training

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation