Bin Wen
3
Papers
21
Total Citations
Papers (3)
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CVPR 2025
12
citations
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
ICLR 2025arXiv
6
citations
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
NeurIPS 2025
3
citations