Wanrong Zhu

6 Papers

34 Total Citations

1 Affiliation

Affiliations

University of California, Santa Barbara

Papers (6)

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models