Bin Wen

Papers: 3

Total Citations: 21

Papers (3)

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness