Jan Kautz

116 Papers

5,222 Total Citations

Papers (116)

Unsupervised Image-to-Image Translation Networks

NeurIPS 2017

VILA: On Pre-training for Visual Language Models

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Learning Affinity via Spatial Propagation Networks

NeurIPS 2017

A Variational Perspective on Solving Inverse Problems with Diffusion Models

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Gated Delta Networks: Improving Mamba2 with Delta Rule

FoundationStereo: Zero-Shot Stereo Matching

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

One-Minute Video Generation with Test-Time Training

Hymba: A Hybrid-head Architecture for Small Language Models

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

Parallel Sequence Modeling via Generalized Spatial Propagation Network

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

Learning Superpixels With Segmentation-Aware Affinity Loss

MoCoGAN: Decomposing Motion and Content for Video Generation

Improving Landmark Localization With Semi-Supervised Learning

SPLATNet: Sparse Lattice Networks for Point Cloud Processing

Geometry-Aware Learning of Maps for Camera Localization

Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals

Making Convolutional Networks Recurrent for Visual Sequence Learning

Deep Semantic Face Deblurring

High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

STEP: Spatio-Temporal Progressive Learning for Video Action Detection

SCOPS: Self-Supervised Co-Part Segmentation

Joint Discriminative and Generative Learning for Person Re-Identification

Learning Linear Transformations for Fast Image and Video Style Transfer

PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image

Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera

Pixel-Adaptive Convolutional Neural Networks

Importance Estimation for Neural Network Pruning

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

Bi3D: Stereo Depth Estimation via Binary Classifications

Meshlet Priors for 3D Mesh Reconstruction

Self-Supervised Viewpoint Learning From Image Collections

Two-Shot Spatially-Varying BRDF and Shape Estimation

Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera

Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild

Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion

UNAS: Differentiable Architecture Search Meets Reinforcement Learning

Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection

Binary TTC: A Temporal Geofence for Autonomous Navigation

Learning to Track Instances without Video Annotations

Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models

Weakly-Supervised Physically Unconstrained Gaze Estimation

See Through Gradients: Image Batch Recovery via GradInversion

DexYCB: A Benchmark for Capturing Hand Grasping of Objects

FreeSOLO: Learning To Segment Objects Without Annotations

CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs

GradViT: Gradient Inversion of Vision Transformers

GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras

GroupViT: Semantic Segmentation Emerges From Text Supervision

A-ViT: Adaptive Tokens for Efficient Vision Transformer

Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters

Global Vision Transformer Pruning With Hessian-Aware Saliency

Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models

BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

Heterogeneous Continual Learning

The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks

Robust Model-Based 3D Head Pose Estimation

A Lightweight Approach for On-The-Fly Reflectance Estimation

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization With Spatially-Varying Lighting

Learning Propagation for Arbitrarily-Structured Data

Unsupervised Video Interpolation Using Cycle Consistency

SENSE: A Shared Encoder Network for Scene-Flow Estimation

Extreme View Synthesis

Neural Inverse Rendering of an Indoor Scene From a Single Image

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Few-Shot Unsupervised Image-to-Image Translation

Learning Indoor Inverse Rendering With 3D Spatially-Varying Lighting

Self-Supervised Object Detection via Generative Image Synthesis

RANA: Relightable Articulated Neural Avatars

PhysDiff: Physics-Guided Human Motion Diffusion Model

Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification

Contrastive Learning for Weakly Supervised Phrase Grounding

DeepGMR: Learning Latent Gaussian Mixture Models for Registration

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints

UFO²: A Unified Framework towards Omni-supervised Object Detection

Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion

LANA: Latency Aware Network Acceleration

Few-Shot Adaptive Gaze Estimation

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

Scaling Vision Pre-Training to 4K Resolution

NVILA: Efficient Frontier Visual Language Models

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation

GENMO: A GENeralist Model for Human MOtion

GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion

COLMAP-Free 3D Gaussian Splatting

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

Flextron: Many-in-One Flexible Large Language Model

Modeling Object Appearance Using Context-Conditioned Component Analysis

Online Detection and Classification of Dynamic Hand Gestures With Recurrent 3D Convolutional Neural Network

Accelerated Generative Models for 3D Point Cloud Data

Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network

Polarimetric Multi-View Stereo

Context-aware Synthesis and Placement of Object Instances

Video-to-Video Synthesis

Joint-task Self-supervised Learning for Temporal Correspondence

Few-shot Video-to-Video Synthesis

Dancing to Music

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Online Adaptation for Consistent Mesh Reconstruction in the Wild

NVAE: A Deep Hierarchical Variational Autoencoder

A Contrastive Learning Approach for Training Variational Autoencoder Priors

Coupled Segmentation and Edge Learning via Dynamic Graph Propagation

Score-based Generative Modeling in Latent Space

Generalizable One-shot 3D Neural Head Avatar

Convolutional State Space Models for Long-Range Spatiotemporal Modeling