Papers (10)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
2,210
citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024
84
citations
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025
39
citations
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
11
citations
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
NeurIPS 2025
10
citations
Retrieval-Augmented Egocentric Video Captioning
CVPR 2024
0
citations
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024
0
citations
NeuralIndicator: Implicit Surface Reconstruction from Neural Indicator Priors
ICML 2024
0
citations
Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023
0
citations