Jianfei Cai

Papers: 49

Total Citations: 277

Papers (49)

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

DrVideo: Document Retrieval Based Long Video Understanding

How Far Can We Compress Instant-NGP-Based NeRF?

Diversified and Personalized Multi-rater Medical Image Segmentation

Efficient Stitchable Task Adaptation

McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction

Differentiable Convex Polyhedra Optimization from Multi-view Images

Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis

PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Stitched ViTs are Flexible Vision Backbones

Generative Region-Language Pretraining for Open-Ended Object Detection

Taming Stable Diffusion for Text to 360 Panorama Image Generation

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Sharpness-Aware Data Generation for Zero-shot Quantization

Exploring Bottom-Up and Top-Down Cues With Attentive Learning for Webly Supervised Object Detection

End-to-End 3D Point Cloud Instance Segmentation Without Detection

The Spatially-Correlative Loss for Various Image Translation Tasks

RSG: A Simple but Effective Module for Learning Imbalanced Datasets

Causal Attention for Vision-Language Tasks

GMFlow: Learning Optical Flow via Global Matching

Bridging Global Context Interactions for High-Fidelity Image Completion

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

Dynamic Focus-Aware Positional Queries for Semantic Segmentation

MARLIN: Masked Autoencoder for Facial Video Representation LearnINg

Transformer Scale Gate for Semantic Segmentation

Stitchable Neural Networks

JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking

CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

Domain-Invariant Disentangled Network for Generalizable Object Detection

High-Resolution Optical Flow From 1D Attention and Correlation

Learning Meta-Class Memory for Few-Shot Semantic Segmentation

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

Scalable Vision Transformers With Hierarchical Pooling

Auto-Parsing Network for Image Captioning and Visual Question Answering

ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

Learning Progressive Joint Propagation for Human Motion Prediction

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation

ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

Object-Compositional Neural Implicit Surfaces

Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation

Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation

Self-Supervised Relationship Probing

EcoFormer: Energy-Saving Attention with Linear Complexity

NeurIPS 2022

Fast Vision Transformers with HiLo Attention

NeurIPS 2022

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

NeurIPS 2022