Song Bai

46
Papers
463
Total Citations

Papers (46)

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

CVPR 2024
308
citations

General Object Foundation Model for Images and Videos at Scale

CVPR 2024
79
citations

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

ECCV 2020
58
citations

DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

CVPR 2024
17
citations

Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval

ICCV 2025
1
citations

Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification

CVPR 2019
0
citations

Learning Attraction Field Representation for Robust Line Segment Detection

CVPR 2019
0
citations

Improving Transferability of Adversarial Examples With Input Diversity

CVPR 2019
0
citations

Holistically-Attracted Wireframe Parsing

CVPR 2020
0
citations

Neural Architecture Search for Lightweight Non-Local Networks

CVPR 2020
0
citations

Multi-Shot Temporal Event Localization: A Benchmark

CVPR 2021
0
citations

SwiftNet: Real-Time Video Object Segmentation

CVPR 2021
0
citations

Anchor-Free Person Search

CVPR 2021
0
citations

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

CVPR 2022
0
citations

An Empirical Study of End-to-End Temporal Action Detection

CVPR 2022
0
citations

Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability

CVPR 2022
0
citations

Fourier Document Restoration for Robust Document Dewarping and Recognition

CVPR 2022
0
citations

TransMix: Attend To Mix for Vision Transformers

CVPR 2022
0
citations

YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset

CVPR 2022
0
citations

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

CVPR 2022
0
citations

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

CVPR 2023
0
citations

InstMove: Instance Motion for Object-Centric Video Segmentation

CVPR 2023
0
citations

Ensemble Diffusion for Retrieval

ICCV 2017
0
citations

Asymmetric Non-Local Neural Networks for Semantic Segmentation

ICCV 2019
0
citations

Anchor Diffusion for Unsupervised Video Object Segmentation

ICCV 2019
0
citations

CenterNet: Keypoint Triplets for Object Detection

ICCV 2019
0
citations

View N-Gram Network for 3D Object Retrieval

ICCV 2019
0
citations

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

ICCV 2019
0
citations

Symmetry-Constrained Rectification Network for Scene Text Recognition

ICCV 2019
0
citations

Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation

ICCV 2019
0
citations

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

ICCV 2021
0
citations

Versatile Transition Generation with Image-to-Video Diffusion

ICCV 2025
0
citations

SRFormer: Permuted Self-Attention for Single Image Super-Resolution

ICCV 2023
0
citations

Corner Proposal Network for Anchor-free, Two-stage Object Detection

ECCV 2020
0
citations

XingGAN for Person Image Generation

ECCV 2020
0
citations

Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation

ECCV 2022
0
citations

Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

ECCV 2022
0
citations

Contextual Text Block Detection towards Scene Text Understanding

ECCV 2022
0
citations

SeqFormer: Sequential Transformer for Video Instance Segmentation

ECCV 2022
0
citations

In Defense of Online Models for Video Instance Segmentation

ECCV 2022
0
citations

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

ICCV 2023
0
citations

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding

ICCV 2025
0
citations

GIFT: A Real-Time and Scalable 3D Shape Search Engine

CVPR 2016
0
citations

Scalable Person Re-Identification on Supervised Smoothed Manifold

CVPR 2017
0
citations

Triplet-Center Loss for Multi-View 3D Object Retrieval

CVPR 2018
0
citations

Mixed Samples as Probes for Unsupervised Model Selection in Domain Adaptation

NeurIPS 2023
0
citations