Xinyu Wei
Papers: 5
Total Citations: 101
Papers (5)
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
NeurIPS 2025 (arXiv)
29 citations
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025 (arXiv)
26 citations
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
NeurIPS 2025 (arXiv)
25 citations
Cloud-Device Collaborative Learning for Multimodal Large Language Models
CVPR 2024
18 citations
Event2Tracking: Reconstructing Multi-Agent Soccer Trajectories Using Long-Term Multimodal Context
AAAI 2025
3 citations