Jian Wang

74 Papers

530 Total Citations

Papers (74)

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

Premise Selection for Theorem Proving by Deep Graph Embedding

NeurIPS 2017, arXiv

RobustSAM: Segment Anything Robustly on Degraded Images

Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

Robust Communicative Multi-Agent Reinforcement Learning with Active Defense

Training-Free Text-Guided Image Editing with Visual Autoregressive Model

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input

POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation

Delving Deep into Engagement Prediction of Short Videos

EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching

SceneMI: Motion In-betweening for Modeling Human-Scene Interaction

Discrete Curvature Graph Information Bottleneck

Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation

Style Quantization for Data-Efficient GAN Training

FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video

DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

One Shot Face Swapping on Megapixels

Seeing in Extra Darkness Using a Deep-Red Flash

Human-Object Interaction Detection via Disentangled Transformer

MixFormer: Mixing Features Across Windows and Dimensions

Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision

Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer

Implicit Sample Extension for Unsupervised Person Re-Identification

3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image

Energy-Efficient Adaptive 3D Sensing

Scene-Aware Egocentric 3D Human Pose Estimation

PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers

Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation

Photometric Stereo With Small Angular Variations

Deep Metric Learning With Angular Loss

Reflectance Capture Using Univariate Sampling of BRDFs

Micro-Baseline Structured Light

Agile Depth Sensing Using Triangulation Light Curtains

Mining Contextual Information Beyond Image for Semantic Segmentation

MFNet: Multi-Filter Directive Network for Weakly Supervised Salient Object Detection

Estimating Egocentric 3D Human Pose in Global Space

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception

Group Pose: A Simple Baseline for End-to-End Multi-Person Pose Estimation

σ-Adaptive Decoupled Prototype for Few-Shot Object Detection

Unified Pre-Training with Pseudo Texts for Text-To-Image Person Re-Identification

Uncertainty-guided Learning for Improving Image Manipulation Detection

Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement

Action Quality Assessment with Temporal Parsing Transformer

UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture

Seeing Far in the Dark with Patterned Flash

UFO: Unified Feature Optimization

Hierarchical Memory Learning for Fine-Grained Scene Graph Generation

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

T2Bs: Text-to-Character Blendshapes via Video Generation

TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control

RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation

Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation

Similar Modality Enhancement and Action Consistency Learning for Weakly Supervised Temporal Action Localization

Federated Recommendation with Explicitly Encoding Item Bias

3D Human Pose Perception from Egocentric Stereo Videos

Towards Better Vision-Inspired Vision-Language Models

EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

Exponential Spectral Pursuit: An Effective Initialization Method for Sparse Phase Retrieval

Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers

MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data

Re-Identification Supervised Texture Generation

Watch out! Motion is Blurring the Vision of Your Deep Neural Networks

Group Contextual Encoding for 3D Point Clouds

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers

Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning

A Unified Conditional Framework for Diffusion-based Image Restoration

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception