Shoubin Yu

Papers: 5

Total Citations: 24

Papers (5)

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation