Xiaoshuai Sun
34 Papers
182 Total Citations
Papers (34)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
CVPR 2024
89 citations
Towards General Visual-Linguistic Face Forgery Detection
CVPR 2025
34 citations
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
AAAI 2024 (arXiv)
19 citations
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
ICCV 2025 (arXiv)
13 citations
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
ECCV 2024 (arXiv)
9 citations
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization
AAAI 2025
6 citations
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
NeurIPS 2025 (arXiv)
5 citations
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
CVPR 2025
4 citations
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation
AAAI 2025
3 citations
DIFNet: Boosting Visual Information Flow for Image Captioning
CVPR 2022
0 citations
Active Teacher for Semi-Supervised Object Detection
CVPR 2022
0 citations
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
CVPR 2023
0 citations
Clover: Towards a Unified Video-Language Alignment and Fusion Model
CVPR 2023 (arXiv)
0 citations
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
CVPR 2023
0 citations
Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images
ICCV 2019
0 citations
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering
ICCV 2021
0 citations
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
ICCV 2023
0 citations
An Information Theoretic Approach for Attention-Driven Face Forgery Detection
ECCV 2022
0 citations
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
ECCV 2022 (arXiv)
0 citations
SeqTR: A Simple Yet Universal Network for Visual Grounding
ECCV 2022
0 citations
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words
CVPR 2021
0 citations
ACL: Activating Capability of Linear Attention for Image Restoration
CVPR 2025
0 citations
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks
AAAI 2024
0 citations
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
ICML 2024
0 citations
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
ICML 2024
0 citations
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
ICML 2024
0 citations
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
ICML 2024
0 citations
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints
CVPR 2018
0 citations
Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
CVPR 2020 (arXiv)
0 citations
Information Competing Process for Learning Diversified Representations
NeurIPS 2019
0 citations
Variational Structured Semantic Inference for Diverse Image Captioning
NeurIPS 2019
0 citations
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
NeurIPS 2022
0 citations
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
NeurIPS 2023
0 citations
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
NeurIPS 2023
0 citations