Yiming Zhang

Papers: 9

Total Citations: 102

Papers (9)

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Persistent Pre-training Poisoning of LLMs

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

RANKCLIP: Ranking-Consistent Language-Image Pretraining

InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion

Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds

Table as a Modality for Large Language Models

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models