Yuxiao Dong

Papers: 14

Total Citations: 1,689

Papers (14)

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

LVBench: An Extreme Long Video Understanding Benchmark

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

TriSampler: A Better Negative Sampling Principle for Dense Retrieval

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

CogAgent: A Visual Language Model for GUI Agents

Graph Random Neural Networks for Semi-Supervised Learning on Graphs

Open Graph Benchmark: Datasets for Machine Learning on Graphs

Adaptive Diffusion in Graph Neural Networks

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation