Yifan Xu

Papers: 5

Total Citations: 261

Papers (5)

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning

MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

Libra: Building Decoupled Vision System on Large Language Models