Silvio Savarese

Papers: 50

Total Citations: 637

Papers (50)

Learning Transferrable Representations for Unsupervised Domain Adaptation

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Data-Driven 3D Voxel Patterns for Object Category Recognition

Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation

Watch-n-Patch: Unsupervised Understanding of Actions and Relations

DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes

Social LSTM: Human Trajectory Prediction in Crowded Spaces

3D Semantic Parsing of Large-Scale Indoor Spaces

Deep Metric Learning via Lifted Structured Feature Embedding

Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Feedback Networks

Deep View Morphing

Social Scene Understanding: End-To-End Multi-Person Action Localization and Collective Activity Recognition

Demo2Vec: Reasoning Object Affordances From Online Videos

Social GAN: Socially Acceptable Trajectories With Generative Adversarial Networks

Taskonomy: Disentangling Task Transfer Learning

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

Adversarial Feature Augmentation for Unsupervised Domain Adaptation

Deep Learning Under Privileged Information Using Heteroscedastic Dropout

Gibson Env: Real-World Perception for Embodied Agents

TopNet: Structural Point Cloud Decoder

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression

SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints

4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration

Topological Planning With Transformers for Vision-and-Language Navigation

JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Procedure-Aware Pretraining for Instructional Video Understanding

Unsupervised Semantic Parsing of Video Collections

Action Recognition by Hierarchical Mid-Level Action Elements

Learning to Track: Online Multi-Object Tracking by Decision Making

Text2Data: Low-Resource Data Generation with Textual Control

Lattice Long Short-Term Memory for Human Action Recognition

Situational Fusion of Visual Representation for Visual Navigation

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

Generative Sparse Detection Networks for 3D Single-shot Object Detection

Universal Correspondence Network

NeurIPS 2016

Tracking the Untrackable: Learning to Track Multiple Cues With Long-Term Dependencies

Unified Training of Universal Time Series Forecasting Transformers

A Coarse-to-Fine Model for 3D Pose Estimation and Sub-Category Recognition

Generalizing to Unseen Domains via Adversarial Data Augmentation

Regression Planning Networks

Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild