Shiyu Huang

7 Papers

1,535 Total Citations

Papers (7)

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

LVBench: An Extreme Long Video Understanding Benchmark

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Expecting the Unexpected: Training Detectors for Unusual Pedestrians With Adversarial Imposters

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks