Yongshuo Zong
4 Papers
3 Total Citations
Papers (4)
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
CVPR 2025
3 citations
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
CVPR 2024
0 citations
Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
ICML 2024
0 citations
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
ICML 2024
0 citations