Wei Liu

Papers: 148

Total Citations: 698

Affiliations: 1

Affiliations

The Hong Kong University of Science and Technology

Papers (148)

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Discrete Hyper-Graph Matching

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

MathAttack: Attacking Large Language Models towards Math Solving Ability

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

STIV: Scalable Text and Image Conditioned Video Generation

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Local Conditional Controlling for Text-to-Image Diffusion Models

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Auto-Regressive Diffusion for Generating 3D Human-Object Interactions

Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

EBMDock: Neural Probabilistic Protein-Protein Docking via a Differentiable Energy Model

Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

Supervised Discrete Hashing

Saliency Propagation From Simple to Difficult

Towards 3D Object Detection With Bimodal Deep Boltzmann Machines Over RGBD Imagery

Understanding Image Structure via Hierarchical Shape Parsing

Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization

Real-Time Neural Style Transfer for Videos

Deep Self-Taught Learning for Weakly Supervised Object Localization

Diverse Image Annotation

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

Frustum PointNets for 3D Object Detection From RGB-D Data

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks

Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks

Gated Fusion Network for Single Image Dehazing

Left-Right Comparative Recurrent Model for Stereo Matching

Dual Skipping Networks

Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

CosFace: Large Margin Cosine Loss for Deep Face Recognition

CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatio-Temporal MRF

Bidirectional Attentive Fusion With Context Gating for Dense Video Captioning

Reconstruction Network for Video Captioning

Tagging Like Humans: Diverse and Distinct Image Annotation

Regularizing RNNs for Caption Generation by Reconstructing the Past With the Present

MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation

MVF-Net: Multi-View 3D Face Morphable Model Regression

Spatio-Temporal Video Re-Localization by Warp LSTM

Unsupervised Deep Tracking

DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs

NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction

Joint Representation and Estimator Learning for Facial Action Unit Intensity Estimation

Face Anti-Spoofing: Model Matters, so Does Data

Decorrelated Adversarial Learning for Age-Invariant Face Recognition

Multi-Granularity Generator for Temporal Action Proposal

Compressing Convolutional Neural Networks via Factorized Convolutional Filters

Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

Residual Regression With Semantic Prior for Crowd Counting

Deep Spectral Clustering Using Dual Autoencoder Network

Unsupervised Image Captioning

Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables

Learning Joint Gait Representation via Quintuplet Loss Minimization

High-Level Semantic Feature Detection: A New Perspective for Pedestrian Detection

Learning to Compose Dynamic Tree Structures for Visual Contexts

Efficient Decision-Based Black-Box Adversarial Attacks on Face Recognition

Image Deformation Meta-Networks for One-Shot Learning

A Sufficient Condition for Convergences of Adam and RMSProp

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Central Similarity Quantization for Efficient Image and Video Retrieval

Deblurring by Realistic Blurring

Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content

MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning

Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles

VideoMoCo: Contrastive Video Representation Learning With Temporally Adversarial Examples

Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On

ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

Parser-Free Virtual Try-On via Distilling Appearance Flows

DeFLOCNet: Deep Image Editing via Flexible Low-Level Controls

Generalizing Face Forgery Detection With High-Frequency Features

Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration

SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization

Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning

XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation

Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning

Top Rank Supervised Binary Coding for Visual Search

Learning Binary Codes for Maximum Inner Product Search

Detecting Faces Using Inside Cascaded Contextual CNN

Semi-Global Weighted Least Squares in Image Filtering

Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network

Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network

Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion

Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization

Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning

Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection

Benchmarking Ultra-High-Definition Image Super-Resolution

SynFace: Face Recognition With Synthetic Data

Pyramid Architecture Search for Real-Time Image Deblurring

Adversarial Attack on Deep Cross-Modal Hamming Retrieval

Heterogeneous Diversity Driven Active Learning for Multi-Object Tracking

Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos

Face Super-Resolution Guided by 3D Facial Priors

PointPWC-Net: Cost Volume on Point Clouds for (Self-)Supervised Scene Flow Estimation

Context-Gated Convolution

Masked Autoencoders for Point Cloud Self-Supervised Learning

Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips

Triangle Attack: A Query-Efficient Decision-Based Adversarial Attack

Towards Efficient Adversarial Training on Vision Transformers

Improving Vision Transformers by Revisiting High-Frequency Components

Mixture-Rank Matrix Approximation for Collaborative Filtering

Geometric Descent Method for Convex Composite Minimization (NeurIPS 2017)

RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion

Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild

WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions

GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions

HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Towards More Discriminative Feature Learning in SNNs with Temporal-Self-Erasing Supervision

Infinite-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

Follow-Your-Click: Open-domain Regional Image Animation via Motion Prompts

Just a Few Glances: Open-Set Visual Perception with Image Prompt Paradigm

Modeling All Response Surfaces in One for Conditional Search Spaces

Enhancing Multi-View Classification Reliability with Adaptive Rejection

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation

SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation

Going Deeper With Convolutions

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation

Distilled Wasserstein Learning for Word Embedding and Topic Modeling

Generalizing Graph Matching beyond Quadratic Assignment Model

Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning

Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

Cross-Modal Learning with Adversarial Samples

Towards Playing Full MOBA Games with Deep Reinforcement Learning

Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

Adversarial Learning for Robust Deep Clustering

Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies

Generalized and Discriminative Few-Shot Object Detection via SVD-Dictionary Enhancement

Neural Routing by Memory

FR: Folded Rationalization with a Unified Encoder

Egocentric Video-Language Pretraining

D-Separation for Causal Self-Explanation

Punctuation-level Attack: Single-shot and Single Punctuation Can Fool Text Models

Exploiting Contextual Objects and Relations for 3D Visual Grounding

Evaluating Post-hoc Explanations for Graph Neural Networks via Robustness Analysis

GSOS: Gauss-Seidel Operator Splitting Algorithm for Multi-Term Nonsmooth Convex Composite Optimization

Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

End-to-end Active Object Tracking via Reinforcement Learning

An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method

Safe Element Screening for Submodular Function Minimization