Stefano Soatto

Papers: 79

Total Citations: 66

Papers (79)

CPR: Retrieval Augmented Generation for Copyright Protection

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

Enhancing Vision-Language Pre-training with Rich Supervisions

Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding

Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation

Multi-Modal Hallucination Control by Visual Information Grounding

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

Fewer Truncations Improve Language Modeling

Sub-token ViT Embedding via Stochastic Resonance Transformers

Efficient Minimal-Surface Regularization of Perspective Depth Maps in Variational Stereo

Texture Representations for Image and Video Synthesis

Multi-View Feature Engineering and Learning

Causal Video Object Segmentation From Persistence of Occlusions

Domain-Size Pooling in Local Descriptors: DSP-SIFT

Scaling up Image Segmentation across Data and Tasks

Visual-Inertial-Semantic Scene Representation for 3D Object Detection

S2F: Slow-To-Fast Interpolator Flow

Zero Shot Learning via Multi-Scale Manifold Regularization

OATM: Occlusion Aware Template Matching by Consensus Set Maximization

Empirical Study of the Topology and Geometry of Deep Networks

Unsupervised Moving Object Detection via Contextual Information Separation

Dense Depth Posterior (DDP) From Single Image and Sparse Range

Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction

GeoNet: Deep Geodesic Networks for Point Cloud Analysis

Meta-Learning With Differentiable Convex Optimization

FDA: Fourier Domain Adaptation for Semantic Segmentation

Towards Backward-Compatible Representation Learning

Learning to Manipulate Individual Objects in an Image

Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

Phase Consistent Ecological Domain Adaptation

Exponential Moving Average Normalization for Self-Supervised and Semi-Supervised Learning

Mixed-Privacy Forgetting in Deep Networks

Compatibility-Aware Heterogeneous Visual Search

LQF: Linear Quadratic Fine-Tuning

Positive-Congruent Training: Towards Regression-Free Model Updates

DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping

Learning Semantic-Aware Dynamics for Video Prediction

Mixed Differential Privacy in Computer Vision

Class-Incremental Learning With Strong Pre-Trained Models

Task Adaptive Parameter Sharing for Multi-Task Learning

MeMOT: Multi-Object Tracking With Memory

Omni-DETR: Omni-Supervised Object Detection With Transformers

Stereoscopic Universal Perturbations Across Different Architectures and Datasets

Train/Test-Time Adaptation With Retrieval

Critical Learning Periods for Multisensory Integration in Deep Networks

Depth Estimation From Camera Image and mmWave Radar Point Cloud

A-La-Carte Prompt Tuning (APT): Combining Distinct Data via Composable Prompting

A Meta-Learning Approach to Predicting Performance and Data Requirements

Guided Recommendation for Model Fine-Tuning

Self-Occlusions and Disocclusions in Causal Video Object Segmentation

Few-Shot Learning With Embedded Class Models and Shot-Free Meta Training

Unsupervised Domain Adaptation via Regularized Conditional Alignment

Task2Vec: Task Embedding for Meta-Learning

Learning Hierarchical Graph Neural Networks for Image Clustering

ARCH++: Animation-Ready Clothed Human Reconstruction Revisited

Unsupervised Depth Completion With Calibrated Backprojection Layers

Visual Relationship Detection Using Part-and-Sum Transformers With Composite Queries

SAFE: Machine Unlearning With Shard Graphs

Linear Spaces of Meanings: Compositional Structures in Vision-Language Models

Tangent Model Composition for Ensembling and Continual Fine-tuning

Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment

Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations

Not Just Streaks: Towards Ground Truth for Single Image Deraining

X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks

An Empirical Evaluation of Current Convolutional Architectures' Ability to Manage Nuisance Location and Scale Variability

WorDepth: Variational Language Prior for Monocular Depth Estimation

Non-autoregressive Sequence-to-Sequence Vision-Language Models

On the Scalability of Diffusion-based Text-to-Image Generation

Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

Predicting Training Time Without Training

Targeted Adversarial Perturbations for Monocular Depth Prediction

Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction

Long Short-Term Transformer for Online Action Detection

Uniform Sampling over Episode Difficulty

On Leave-One-Out Conditional Mutual Information For Generalization

Semi-supervised Vision Transformers at Scale

Leveraging sparse and shared feature activations for disentangled representation learning

Your representations are in the network: composable and parallel adaptation for large scale models

Gacs-Korner Common Information Variational Autoencoder