Bo He

Papers: 8

Total Citations: 29

Papers (8)

OmniViD: A Generative Framework for Universal Video Understanding

ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Towards Scalable Neural Representation for Diverse Videos

Align and Attend: Multimodal Summarization With Dual Contrastive Losses

Chop & Learn: Recognizing and Generating Object-State Compositions

Learning Semantic Correspondence with Sparse Annotations

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

NeRV: Neural Representations for Videos