Lorenzo Torresani
Papers: 41
Total Citations: 150
Papers (41)
Video ReCap: Recursive Captioning of Hour-Long Videos
CVPR 2024
82
citations
Learning to Inpaint for Image Compression
NeurIPS 2017, arXiv
58
citations
Step Differences in Instructional Video
CVPR 2024
10
citations
Learning to Segment Referred Objects from Narrated Egocentric Videos
CVPR 2024
0
citations
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
0
citations
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
CVPR 2015
0
citations
Semantic Segmentation With Boundary Neural Fields
CVPR 2016
0
citations
Convolutional Random Walk Networks for Semantic Image Segmentation
CVPR 2017, arXiv
0
citations
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
CVPR 2025
0
citations
A Closer Look at Spatiotemporal Convolutions for Action Recognition
CVPR 2018, arXiv
0
citations
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
CVPR 2018
0
citations
Video Modeling With Correlation Networks
CVPR 2020, arXiv
0
citations
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
CVPR 2020, arXiv
0
citations
Listen to Look: Action Recognition by Previewing Audio
CVPR 2020, arXiv
0
citations
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories
CVPR 2021, arXiv
0
citations
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
CVPR 2021, arXiv
0
citations
Long-Short Temporal Contrastive Learning of Video Transformers
CVPR 2022, arXiv
0
citations
Learning To Recognize Procedural Activities With Distant Supervision
CVPR 2022, arXiv
0
citations
Deformable Video Transformer
CVPR 2022, arXiv
0
citations
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
0
citations
HierVL: Learning Hierarchical Video-Language Embeddings
CVPR 2023, arXiv
0
citations
Relational Space-Time Query in Long-Form Videos
CVPR 2023
0
citations
Egocentric Video Task Translation
CVPR 2023
0
citations
High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision
ICCV 2015
0
citations
Learning Spatiotemporal Features With 3D Convolutional Networks
ICCV 2015
0
citations
DistInit: Learning Video Representations Without a Single Labeled Video
ICCV 2019
0
citations
Video Classification With Channel-Separated Convolutional Networks
ICCV 2019
0
citations
SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition
ICCV 2019
0
citations
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
ICCV 2019
0
citations
Learning to Ground Instructional Articles in Videos through Narrations
ICCV 2023, arXiv
0
citations
Ego-Only: Egocentric Action Detection without Exocentric Transferring
ICCV 2023
0
citations
Detect-and-Track: Efficient Pose Estimation in Videos
CVPR 2018, arXiv
0
citations
VITED: Video Temporal Evidence Distillation
CVPR 2025
0
citations
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
ICCV 2025
0
citations
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
NeurIPS 2018
0
citations
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
NeurIPS 2019
0
citations
STAR-Caps: Capsule Networks with Straight-Through Attentive Routing
NeurIPS 2019
0
citations
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
NeurIPS 2020
0
citations
COBE: Contextualized Object Embeddings from Narrated Instructional Video
NeurIPS 2020
0
citations
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities
NeurIPS 2023
0
citations
HT-Step: Aligning Instructional Articles with How-To Videos
NeurIPS 2023
0
citations