Boqing Gong

Papers: 51

Total Citations: 630

Papers (51)

Language Model Beats Diffusion - Tokenizer is key to visual generation

Improved Dropout for Shallow and Deep Learning

NeurIPS 2016 (arXiv)

Distilling Vision-Language Models on Millions of Videos

HypDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-shot Image Generation

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning

VideoPrism: A Foundational Visual Encoder for Video Understanding

Learning Attributes Equals Multi-Source Domain Generalization

Synthesized Classifiers for Zero-Shot Learning

Fast Zero-Shot Image Tagging

Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach

Improving Facial Attribute Prediction Using Semantic Segmentation

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning

Deep Face Detector Adaptation Without Negative Transfer or Catastrophic Forgetting

End-to-End Learning of Motion Representation for Video Understanding

Large-Scale Long-Tailed Recognition in an Open World

Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model

Adversarial Examples Improve Image Recognition

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective

PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

Open Compound Domain Adaptation

Ranking Neural Checkpoints

Robust and Accurate Object Detection via Adversarial Learning

Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds

Adversarially Adaptive Normalization for Single Domain Generalization

Spatiotemporal Contrastive Video Representation Learning

MoViNets: Mobile Video Networks for Efficient Video Recognition

Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision

On Calibrating Semantic Segmentation Models: Analyses and an Algorithm

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data

A Fast and Accurate One-Stage Approach to Visual Grounding

Constructing Self-Motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach

A Lazy Approach to Long-Horizon Gradient-Based Meta-Learning

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

Unified Visual Relationship Detection with Vision and Language Models

Improving Object Detection with Selective Self-Supervised Self-Training

Anti-Neuron Watermarking: Protecting Personal Data against Unauthorized Neural Networks

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

Contrastive Learning for Label Efficient Semantic Segmentation

Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!

VideoAds for Fast-Paced Video Understanding

SITE: towards Spatial Intelligence Thorough Evaluation

On Discrete Prompt Optimization for Diffusion Models

Synthesized Policies for Transfer and Adaptation across Tasks and Environments

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Video Timeline Modeling For News Story Understanding

Module-wise Adaptive Distillation for Multimodality Foundation Models

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks