Xiaoshuai Sun

34 Papers
182 Total Citations

Papers (34)

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

CVPR 2024
89 citations

Towards General Visual-Linguistic Face Forgery Detection

CVPR 2025
34 citations

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks

AAAI 2024 · arXiv
19 citations

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

ICCV 2025 · arXiv
13 citations

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

ECCV 2024 · arXiv
9 citations

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

AAAI 2025
6 citations

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

NeurIPS 2025 · arXiv
5 citations

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

CVPR 2025
4 citations

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

AAAI 2025
3 citations

DIFNet: Boosting Visual Information Flow for Image Captioning

CVPR 2022
0 citations

Active Teacher for Semi-Supervised Object Detection

CVPR 2022
0 citations

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension

CVPR 2023
0 citations

Clover: Towards a Unified Video-Language Alignment and Fusion Model

CVPR 2023 · arXiv
0 citations

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension

CVPR 2023
0 citations

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images

ICCV 2019
0 citations

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering

ICCV 2021
0 citations

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

ICCV 2023
0 citations

An Information Theoretic Approach for Attention-Driven Face Forgery Detection

ECCV 2022
0 citations

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

ECCV 2022 · arXiv
0 citations

SeqTR: A Simple Yet Universal Network for Visual Grounding

ECCV 2022
0 citations

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words

CVPR 2021
0 citations

ACL: Activating Capability of Linear Attention for Image Restoration

CVPR 2025
0 citations

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks

AAAI 2024
0 citations

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

ICML 2024
0 citations

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

ICML 2024
0 citations

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

ICML 2024
0 citations

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

ICML 2024
0 citations

GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints

CVPR 2018
0 citations

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

CVPR 2020 · arXiv
0 citations

Information Competing Process for Learning Diversified Representations

NeurIPS 2019
0 citations

Variational Structured Semantic Inference for Diverse Image Captioning

NeurIPS 2019
0 citations

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

NeurIPS 2022
0 citations

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

NeurIPS 2023
0 citations

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models

NeurIPS 2023
0 citations