Zhenheng Yang

Papers: 14

Total Citations: 169

Papers (14)

Show-o2: Improved Native Unified Multimodal Models

Long Context Tuning for Video Generation

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Activity Driven Weakly Supervised Object Detection

UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos

Weakly Supervised Instance Segmentation for Videos With Temporal Mask Consistency

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

TALL: Temporal Activity Localization via Language Query

Parallelized Autoregressive Visual Generation

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

LEGO: Learning Edge With Geometry All at Once by Watching Videos

Occlusion Aware Unsupervised Learning of Optical Flow