Xiangtai Li
25
Papers
618
Total Citations
Papers (25)
OMG-Seg: Is One Model Good Enough For All Segmentation?
CVPR 2024
106
citations
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
ICLR 2024
104
citations
Point Cloud Mamba: Point Cloud Learning via State Space Model
AAAI 2025
81
citations
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025
57
citations
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
CVPR 2024
53
citations
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
ICLR 2025
43
citations
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
AAAI 2025
30
citations
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
CVPR 2024
30
citations
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
CVPR 2024
26
citations
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
20
citations
Improving Video Segmentation via Dynamic Anchor Queries
ECCV 2024
19
citations
Explore In-Context Segmentation via Latent Diffusion Models
AAAI 2025
14
citations
DreamRelation: Bridging Customization and Relation Generation
CVPR 2025
10
citations
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
ICCV 2025
10
citations
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer
ICCV 2025
6
citations
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
4
citations
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
NeurIPS 2025
4
citations
PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model
AAAI 2025
1
citation
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
CVPR 2025
0
citations
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
CVPR 2025
0
citations
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
ICCV 2025
0
citations
Referring Image Editing: Object-level Image Editing via Referring Expressions
CVPR 2024
0
citations
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
CVPR 2024
0
citations
Unified Dense Prediction of Video Diffusion
CVPR 2025
0
citations
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
CVPR 2025
0
citations