Xiang Bai

81
Papers
1,047
Total Citations

Papers (81)

EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection

ECCV 2020
440
citations

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

CVPR 2024
384
citations

General Object Foundation Model for Images and Videos at Scale

CVPR 2024
79
citations

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

ICCV 2025 (arXiv)
62
citations

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

ICCV 2025
22
citations

SEED: A Simple and Effective 3D DETR in Point Clouds

ECCV 2024
19
citations

OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

ECCV 2024
12
citations

Bridging the Gap Between End-to-End and Two-Step Text Spotting

CVPR 2024
11
citations

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation

ICCV 2025
10
citations

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

ICCV 2025 (arXiv)
4
citations

PlayerOne: Egocentric World Simulator

NeurIPS 2025
3
citations

Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval

ICCV 2025
1
citations

DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection

CVPR 2015
0
citations

Object Skeleton Extraction in Natural Images by Fusing Scale-Associated Deep Side Outputs

CVPR 2016
0
citations

Multi-Oriented Text Detection With Fully Convolutional Networks

CVPR 2016
0
citations

Robust Scene Text Recognition With Automatic Rectification

CVPR 2016
0
citations

GIFT: A Real-Time and Scalable 3D Shape Search Engine

CVPR 2016
0
citations

Scalable Person Re-Identification on Supervised Smoothed Manifold

CVPR 2017 (arXiv)
0
citations

Detecting Oriented Text in Natural Images by Linking Segments

CVPR 2017 (arXiv)
0
citations

Multiple Instance Detection Network With Online Instance Classifier Refinement

CVPR 2017 (arXiv)
0
citations

Richer Convolutional Features for Edge Detection

CVPR 2017 (arXiv)
0
citations

Triplet-Center Loss for Multi-View 3D Object Retrieval

CVPR 2018 (arXiv)
0
citations

DOTA: A Large-Scale Dataset for Object Detection in Aerial Images

CVPR 2018 (arXiv)
0
citations

Rotation-Sensitive Regression for Oriented Scene Text Detection

CVPR 2018 (arXiv)
0
citations

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

CVPR 2018 (arXiv)
0
citations

Progressive Pose Attention Transfer for Person Image Generation

CVPR 2019
0
citations

DeepFlux for Skeletons in the Wild

CVPR 2019
0
citations

Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

CVPR 2020
0
citations

Semantically Multi-Modal Image Synthesis

CVPR 2020 (arXiv)
0
citations

Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship

CVPR 2021
0
citations

Scene Text Retrieval via Joint Text Detection and Similarity Learning

CVPR 2021 (arXiv)
0
citations

Multi-Shot Temporal Event Localization: A Benchmark

CVPR 2021 (arXiv)
0
citations

MOST: A Multi-Oriented Scene Text Detector With Localization Refinement

CVPR 2021 (arXiv)
0
citations

Knowledge Mining With Scene Text for Fine-Grained Recognition

CVPR 2022 (arXiv)
0
citations

An Empirical Study of End-to-End Temporal Action Detection

CVPR 2022 (arXiv)
0
citations

Vision-Language Pre-Training for Boosting Scene Text Detectors

CVPR 2022 (arXiv)
0
citations

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection

CVPR 2022 (arXiv)
0
citations

Syntax-Aware Network for Handwritten Mathematical Expression Recognition

CVPR 2022 (arXiv)
0
citations

InstMove: Instance Motion for Object-Centric Video Segmentation

CVPR 2023 (arXiv)
0
citations

Turning a CLIP Model Into a Scene Text Detector

CVPR 2023 (arXiv)
0
citations

CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

CVPR 2023 (arXiv)
0
citations

Side Adapter Network for Open-Vocabulary Semantic Segmentation

CVPR 2023 (arXiv)
0
citations

SOOD: Towards Semi-Supervised Oriented Object Detection

CVPR 2023 (arXiv)
0
citations

CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model

CVPR 2023 (arXiv)
0
citations

Modeling Entities As Semantic Points for Visual Information Extraction in the Wild

CVPR 2023 (arXiv)
0
citations

Relaxed Multiple-Instance SVM With Application to Object Discovery

ICCV 2015
0
citations

Ensemble Diffusion for Retrieval

ICCV 2017
0
citations

Asymmetric Non-Local Neural Networks for Semantic Segmentation

ICCV 2019
0
citations

View N-Gram Network for 3D Object Retrieval

ICCV 2019
0
citations

MINIMA: Modality Invariant Image Matching

CVPR 2025
0
citations

Symmetry-Constrained Rectification Network for Scene Text Recognition

ICCV 2019
0
citations

End-to-End Semi-Supervised Object Detection With Soft Teacher

ICCV 2021 (arXiv)
0
citations

A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection

ICCV 2023
0
citations

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

ICCV 2023 (arXiv)
0
citations

Intra-class Feature Variation Distillation for Semantic Segmentation

ECCV 2020
0
citations

Scene Text Image Super-Resolution in the Wild

ECCV 2020
0
citations

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

ECCV 2020
0
citations

AutoSTR: Efficient Backbone Search for Scene Text Recognition

ECCV 2020
0
citations

An End-to-End Transformer Model for Crowd Localization

ECCV 2022
0
citations

GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation

ECCV 2022
0
citations

CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer

ECCV 2022
0
citations

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

ECCV 2022
0
citations

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

ECCV 2022
0
citations

Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

ECCV 2022
0
citations

SeqFormer: Sequential Transformer for Video Instance Segmentation

ECCV 2022
0
citations

In Defense of Online Models for Video Instance Segmentation

ECCV 2022
0
citations

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model

ECCV 2022
0
citations

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

ICCV 2019
0
citations

A Unified Image-Dense Annotation Generation Model for Underwater Scenes

CVPR 2025
0
citations

SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting

CVPR 2025
0
citations

LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance

ICCV 2025
0
citations

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

ICCV 2025
0
citations

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

ICCV 2025
0
citations

Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method

ICCV 2025
0
citations

Multi-scenario Overlapping Text Segmentation with Depth Awareness

ICCV 2025
0
citations

Training-free Geometric Image Editing on Diffusion Models

ICCV 2025 (arXiv)
0
citations

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

CVPR 2024
0
citations

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

CVPR 2024
0
citations

Symmetry-Based Text Line Detection in Natural Scenes

CVPR 2015
0
citations

Bootstrap Your Object Detector via Mixed Training

NeurIPS 2021
0
citations

Query-based Temporal Fusion with Explicit Motion for 3D Object Detection

NeurIPS 2023
0
citations