Sifei Liu

47 Papers

374 Total Citations

Papers (47)

Learning Affinity via Spatial Propagation Networks

NeurIPS 2017, arXiv

Describe Anything: Detailed Localized Image and Video Captioning

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Parallel Sequence Modeling via Generalized Spatial Propagation Network

3D-Spatial Multimodal Memory

A Unified Approach for Text- and Image-guided 4D Scene Generation

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Communication-Efficient Collaborative Perception via Information Filling with Codebook

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

Compositional Text-to-Image Generation with Dense Blob Representations

Multi-Objective Convolutional Learning for Face Labeling

Generative Face Completion

Learning Dual Convolutional Neural Networks for Low-Level Vision

SCOPS: Self-Supervised Co-Part Segmentation

Learning Linear Transformations for Fast Image and Video Style Transfer

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

Self-Supervised Viewpoint Learning From Image Collections

Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

Learning to Track Instances without Video Annotations

Learning Continuous Image Representation With Local Implicit Image Function

CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs

GroupViT: Semantic Segmentation Emerges From Text Supervision

Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters

Self-Supervised Super-Plane for Neural 3D Reconstruction

Affordance Diffusion: Synthesizing Hand-Object Interactions

Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Learning Propagation for Arbitrarily-Structured Data

Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion

Self-Supervised Object Detection via Generative Image Synthesis

Video Matting via Consistency-Regularized Graph Neural Networks

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

Autoregressive 3D Shape Generation via Canonical Mapping

Scraping Textures from Natural Images for Synthesis and Editing

Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models

Scaling Vision Pre-Training to 4K Resolution

NVILA: Efficient Frontier Visual Language Models

Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal

COLMAP-Free 3D Gaussian Splatting

RegionGPT: Towards Region Understanding Vision Language Model

Context-aware Synthesis and Placement of Object Instances

Joint-task Self-supervised Learning for Temporal Correspondence

Online Adaptation for Consistent Mesh Reconstruction in the Wild

Coupled Segmentation and Edge Learning via Dynamic Graph Propagation

Learning 3D Dense Correspondence via Canonical Point Autoencoder

Generalizable One-shot 3D Neural Head Avatar