Lorenzo Torresani

41 Papers
150 Total Citations

Papers (41)

Video ReCap: Recursive Captioning of Hour-Long Videos

CVPR 2024
82 citations

Learning to Inpaint for Image Compression

NeurIPS 2017 (arXiv)
58 citations

Step Differences in Instructional Video

CVPR 2024
10 citations

Learning to Segment Referred Objects from Narrated Egocentric Videos

CVPR 2024
0 citations

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

CVPR 2024
0 citations

DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection

CVPR 2015
0 citations

Semantic Segmentation With Boundary Neural Fields

CVPR 2016
0 citations

Convolutional Random Walk Networks for Semantic Image Segmentation

CVPR 2017 (arXiv)
0 citations

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

CVPR 2025
0 citations

A Closer Look at Spatiotemporal Convolutions for Action Recognition

CVPR 2018 (arXiv)
0 citations

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

CVPR 2018
0 citations

Video Modeling With Correlation Networks

CVPR 2020 (arXiv)
0 citations

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

CVPR 2020 (arXiv)
0 citations

Listen to Look: Action Recognition by Previewing Audio

CVPR 2020 (arXiv)
0 citations

Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories

CVPR 2021 (arXiv)
0 citations

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

CVPR 2021 (arXiv)
0 citations

Long-Short Temporal Contrastive Learning of Video Transformers

CVPR 2022 (arXiv)
0 citations

Learning To Recognize Procedural Activities With Distant Supervision

CVPR 2022 (arXiv)
0 citations

Deformable Video Transformer

CVPR 2022 (arXiv)
0 citations

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022
0 citations

HierVL: Learning Hierarchical Video-Language Embeddings

CVPR 2023 (arXiv)
0 citations

Relational Space-Time Query in Long-Form Videos

CVPR 2023
0 citations

Egocentric Video Task Translation

CVPR 2023
0 citations

High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision

ICCV 2015
0 citations

Learning Spatiotemporal Features With 3D Convolutional Networks

ICCV 2015
0 citations

DistInit: Learning Video Representations Without a Single Labeled Video

ICCV 2019
0 citations

Video Classification With Channel-Separated Convolutional Networks

ICCV 2019
0 citations

SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition

ICCV 2019
0 citations

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

ICCV 2019
0 citations

Learning to Ground Instructional Articles in Videos through Narrations

ICCV 2023 (arXiv)
0 citations

Ego-Only: Egocentric Action Detection without Exocentric Transferring

ICCV 2023
0 citations

Detect-and-Track: Efficient Pose Estimation in Videos

CVPR 2018 (arXiv)
0 citations

VITED: Video Temporal Evidence Distillation

CVPR 2025
0 citations

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

ICCV 2025
0 citations

Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

NeurIPS 2018
0 citations

Learning Temporal Pose Estimation from Sparsely-Labeled Videos

NeurIPS 2019
0 citations

STAR-Caps: Capsule Networks with Straight-Through Attentive Routing

NeurIPS 2019
0 citations

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

NeurIPS 2020
0 citations

COBE: Contextualized Object Embeddings from Narrated Instructional Video

NeurIPS 2020
0 citations

Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities

NeurIPS 2023
0 citations

HT-Step: Aligning Instructional Articles with How-To Videos

NeurIPS 2023
0 citations