Zhenfei Yin

Papers: 6

Total Citations: 179

Papers (6)

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens