Jan Kautz

27
Papers
2,030
Total Citations

Papers (27)

VILA: On Pre-training for Visual Language Models

CVPR 2024
685
citations

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

CVPR 2024
412
citations

A Variational Perspective on Solving Inverse Problems with Diffusion Models

ICLR 2024
207
citations

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

CVPR 2024
169
citations

Gated Delta Networks: Improving Mamba2 with Delta Rule

ICLR 2025
141
citations

FoundationStereo: Zero-Shot Stereo Matching

CVPR 2025
98
citations

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

NeurIPS 2025
96
citations

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

CVPR 2024
69
citations

One-Minute Video Generation with Test-Time Training

CVPR 2025
65
citations

Hymba: A Hybrid-head Architecture for Small Language Models

ICLR 2025
55
citations

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

CVPR 2025
19
citations

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

ICLR 2025
4
citations

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

ICCV 2025
4
citations

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

ICCV 2025
3
citations

Parallel Sequence Modeling via Generalized Spatial Propagation Network

CVPR 2025
3
citations

Flextron: Many-in-One Flexible Large Language Model

ICML 2024
0
citations

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

CVPR 2025
0
citations

Scaling Vision Pre-Training to 4K Resolution

CVPR 2025
0
citations

NVILA: Efficient Frontier Visual Language Models

CVPR 2025
0
citations

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

CVPR 2025
0
citations

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

CVPR 2025
0
citations

Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation

CVPR 2025
0
citations

GENMO: A GENeralist Model for Human MOtion

ICCV 2025
0
citations

GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion

ICCV 2025
0
citations

COLMAP-Free 3D Gaussian Splatting

CVPR 2024
0
citations

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

CVPR 2024
0
citations

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

CVPR 2025
0
citations