Xiang Bai
22
Papers
607
Total Citations
Papers (22)
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
384
citations
General Object Foundation Model for Images and Videos at Scale
CVPR 2024
79
citations
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
ICCV 2025 (arXiv)
62
citations
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
ICCV 2025
22
citations
SEED: A Simple and Effective 3D DETR in Point Clouds
ECCV 2024
19
citations
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
ECCV 2024
12
citations
Bridging the Gap Between End-to-End and Two-Step Text Spotting
CVPR 2024
11
citations
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
ICCV 2025
10
citations
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
ICCV 2025 (arXiv)
4
citations
PlayerOne: Egocentric World Simulator
NeurIPS 2025
3
citations
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval
ICCV 2025
1
citations
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
ICCV 2025
0
citations
Training-free Geometric Image Editing on Diffusion Models
ICCV 2025 (arXiv)
0
citations
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
CVPR 2024
0
citations
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
CVPR 2025
0
citations
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
CVPR 2025
0
citations
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
CVPR 2024
0
citations
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
0
citations
MINIMA: Modality Invariant Image Matching
CVPR 2025
0
citations
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
0
citations
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
ICCV 2025
0
citations
Multi-scenario Overlapping Text Segmentation with Depth Awareness
ICCV 2025
0
citations