Xiang Bai
81
Papers
1,047
Total Citations
Papers (81)
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
ECCV 2020
440
citations
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
384
citations
General Object Foundation Model for Images and Videos at Scale
CVPR 2024
79
citations
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
ICCV 2025 (arXiv)
62
citations
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
ICCV 2025
22
citations
SEED: A Simple and Effective 3D DETR in Point Clouds
ECCV 2024
19
citations
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
ECCV 2024
12
citations
Bridging the Gap Between End-to-End and Two-Step Text Spotting
CVPR 2024
11
citations
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
ICCV 2025
10
citations
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
ICCV 2025 (arXiv)
4
citations
PlayerOne: Egocentric World Simulator
NeurIPS 2025
3
citations
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval
ICCV 2025
1
citations
DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection
CVPR 2015
0
citations
Object Skeleton Extraction in Natural Images by Fusing Scale-Associated Deep Side Outputs
CVPR 2016
0
citations
Multi-Oriented Text Detection With Fully Convolutional Networks
CVPR 2016
0
citations
Robust Scene Text Recognition With Automatic Rectification
CVPR 2016
0
citations
GIFT: A Real-Time and Scalable 3D Shape Search Engine
CVPR 2016
0
citations
Scalable Person Re-Identification on Supervised Smoothed Manifold
CVPR 2017 (arXiv)
0
citations
Detecting Oriented Text in Natural Images by Linking Segments
CVPR 2017 (arXiv)
0
citations
Multiple Instance Detection Network With Online Instance Classifier Refinement
CVPR 2017 (arXiv)
0
citations
Richer Convolutional Features for Edge Detection
CVPR 2017 (arXiv)
0
citations
Triplet-Center Loss for Multi-View 3D Object Retrieval
CVPR 2018 (arXiv)
0
citations
DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
CVPR 2018 (arXiv)
0
citations
Rotation-Sensitive Regression for Oriented Scene Text Detection
CVPR 2018 (arXiv)
0
citations
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
CVPR 2018 (arXiv)
0
citations
Progressive Pose Attention Transfer for Person Image Generation
CVPR 2019
0
citations
DeepFlux for Skeletons in the Wild
CVPR 2019
0
citations
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation
CVPR 2020
0
citations
Semantically Multi-Modal Image Synthesis
CVPR 2020 (arXiv)
0
citations
Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship
CVPR 2021
0
citations
Scene Text Retrieval via Joint Text Detection and Similarity Learning
CVPR 2021 (arXiv)
0
citations
Multi-Shot Temporal Event Localization: A Benchmark
CVPR 2021 (arXiv)
0
citations
MOST: A Multi-Oriented Scene Text Detector With Localization Refinement
CVPR 2021 (arXiv)
0
citations
Knowledge Mining With Scene Text for Fine-Grained Recognition
CVPR 2022 (arXiv)
0
citations
An Empirical Study of End-to-End Temporal Action Detection
CVPR 2022 (arXiv)
0
citations
Vision-Language Pre-Training for Boosting Scene Text Detectors
CVPR 2022 (arXiv)
0
citations
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection
CVPR 2022 (arXiv)
0
citations
Syntax-Aware Network for Handwritten Mathematical Expression Recognition
CVPR 2022 (arXiv)
0
citations
InstMove: Instance Motion for Object-Centric Video Segmentation
CVPR 2023 (arXiv)
0
citations
Turning a CLIP Model Into a Scene Text Detector
CVPR 2023 (arXiv)
0
citations
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
CVPR 2023 (arXiv)
0
citations
Side Adapter Network for Open-Vocabulary Semantic Segmentation
CVPR 2023 (arXiv)
0
citations
SOOD: Towards Semi-Supervised Oriented Object Detection
CVPR 2023 (arXiv)
0
citations
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
CVPR 2023 (arXiv)
0
citations
Modeling Entities As Semantic Points for Visual Information Extraction in the Wild
CVPR 2023 (arXiv)
0
citations
Relaxed Multiple-Instance SVM With Application to Object Discovery
ICCV 2015
0
citations
Ensemble Diffusion for Retrieval
ICCV 2017
0
citations
Asymmetric Non-Local Neural Networks for Semantic Segmentation
ICCV 2019
0
citations
View N-Gram Network for 3D Object Retrieval
ICCV 2019
0
citations
MINIMA: Modality Invariant Image Matching
CVPR 2025
0
citations
Symmetry-Constrained Rectification Network for Scene Text Recognition
ICCV 2019
0
citations
End-to-End Semi-Supervised Object Detection With Soft Teacher
ICCV 2021 (arXiv)
0
citations
A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
ICCV 2023
0
citations
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
ICCV 2023 (arXiv)
0
citations
Intra-class Feature Variation Distillation for Semantic Segmentation
ECCV 2020
0
citations
Scene Text Image Super-Resolution in the Wild
ECCV 2020
0
citations
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
ECCV 2020
0
citations
AutoSTR: Efficient Backbone Search for Scene Text Recognition
ECCV 2020
0
citations
An End-to-End Transformer Model for Crowd Localization
ECCV 2022
0
citations
GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation
ECCV 2022
0
citations
CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer
ECCV 2022
0
citations
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition
ECCV 2022
0
citations
Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
ECCV 2022
0
citations
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
ECCV 2022
0
citations
SeqFormer: Sequential Transformer for Video Instance Segmentation
ECCV 2022
0
citations
In Defense of Online Models for Video Instance Segmentation
ECCV 2022
0
citations
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model
ECCV 2022
0
citations
Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting
ICCV 2019
0
citations
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
CVPR 2025
0
citations
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
CVPR 2025
0
citations
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
ICCV 2025
0
citations
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
0
citations
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
0
citations
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
ICCV 2025
0
citations
Multi-scenario Overlapping Text Segmentation with Depth Awareness
ICCV 2025
0
citations
Training-free Geometric Image Editing on Diffusion Models
ICCV 2025 (arXiv)
0
citations
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
CVPR 2024
0
citations
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
CVPR 2024
0
citations
Symmetry-Based Text Line Detection in Natural Scenes
CVPR 2015
0
citations
Bootstrap Your Object Detector via Mixed Training
NeurIPS 2021
0
citations
Query-based Temporal Fusion with Explicit Motion for 3D Object Detection
NeurIPS 2023
0
citations