Zhaokai Wang

Papers: 4

Total Citations: 102

Papers (4)

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Video Background Music Generation: Dataset, Method and Evaluation