Saining Xie
15
Papers
1,615
Total Citations
Papers (15)
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
CVPR 2024
570
citations
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
CVPR 2025
342
citations
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
CVPR 2024
327
citations
Demystifying CLIP Data
ICLR 2024
205
citations
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
ICCV 2025
73
citations
Scaling Language-Free Visual Representation Learning
ICCV 2025
39
citations
MoDE: CLIP Data Experts via Clustering
CVPR 2024
25
citations
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
ICLR 2025
14
citations
Scaling Inference Time Compute for Diffusion Models
CVPR 2025
13
citations
Fast Encoding and Decoding for Implicit Video Representation
ECCV 2024
7
citations
Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy
NeurIPS 2025
0
citations
Science-T2I: Addressing Scientific Illusions in Image Synthesis
CVPR 2025
0
citations
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
ICCV 2025
0
citations
Image Sculpting: Precise Object Editing with 3D Geometry Control
CVPR 2024
0
citations
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
CVPR 2025
0
citations