Long Mai

Total Citations: 1

Papers (19)

TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder

Kernel Fusion for Better Image Deblurring

Composition-Preserving Deep Photo Aesthetics Assessment

Video Frame Interpolation via Adaptive Convolution

Spatial-Semantic Image Search by Visual Feature Synthesis

Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects

Structure-Guided Ranking Loss for Single Image Depth Prediction

Context-Aware Group Captioning via Self-Attention and Contrastive Features

Active Speakers in Context

Learning To Recover 3D Scene Shape From a Single Image

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Motion-Adjustable Neural Implicit Video Representation

Video Frame Interpolation via Adaptive Separable Convolution

MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input

An Internal Learning Approach to Video Inpainting

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting

BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images