Xi Li

Papers: 15

Total Citations: 92

Papers (15)

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

ScanFormer: Referring Expression Comprehension by Iteratively Scanning

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models

Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model

BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection

PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals

EDM: Efficient Deep Feature Matching

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Temporal-Distributed Backdoor Attack against Video Based Action Recognition

Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning