Sergey Tulyakov
66 papers · 982 total citations

Papers (66)
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers · CVPR 2024 · 341 citations
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling · CVPR 2024 · 168 citations
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control · ICLR 2025 · 114 citations
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers · CVPR 2025 · 78 citations
Wonderland: Navigating 3D Scenes from a Single Image · CVPR 2025 · 54 citations
Multi-subject Open-set Personalization in Video Generation · CVPR 2025 · 40 citations
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors · CVPR 2024 · 40 citations
Improving the Diffusability of Autoencoders · ICML 2025 · 34 citations
Scalable Ranked Preference Optimization for Text-to-Image Generation · ICCV 2025 · 21 citations
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device · CVPR 2025 · 20 citations
Video Motion Transfer with Diffusion Transformers · CVPR 2025 · 18 citations
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion · CVPR 2025 · 18 citations
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis · ICCV 2025 · 12 citations
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models · NeurIPS 2025 · 11 citations
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach · NeurIPS 2025 · 8 citations
Efficient Training with Denoised Neural Weights · ECCV 2024 · 5 citations
Flow Guided Transformable Bottleneck Networks for Motion Retargeting · CVPR 2021
Motion Representations for Articulated Animation · CVPR 2021
Playable Video Generation · CVPR 2021
Teachers Do More Than Teach: Compressing Image-to-Image Models · CVPR 2021
Playable Environments: Video Manipulation in Space and Time · CVPR 2022
InOut: Diverse Image Outpainting via GAN Inversion · CVPR 2022
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning · CVPR 2022
StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2 · CVPR 2022
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis · CVPR 2023
Make-a-Story: Visual Memory Conditioned Consistent Story Generation · CVPR 2023
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation · CVPR 2023
Invertible Neural Skinning · CVPR 2023
Affection: Learning Affective Explanations for Real-World Visual Data · CVPR 2023
Real-Time Neural Light Field on Mobile Devices · CVPR 2023
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars · CVPR 2023
Unsupervised Volumetric Animation · CVPR 2023
ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations · CVPR 2023
Can Text-to-Video Generation help Video-Language Alignment? · CVPR 2025
Transformable Bottleneck Networks · ICCV 2019
Laplace Landmark Localization · ICCV 2019
Rethinking Vision Transformers for MobileNet Size and Speed · ICCV 2023
Text2Tex: Text-driven Texture Synthesis via Diffusion Models · ICCV 2023
InfiniCity: Infinite-Scale City Synthesis · ICCV 2023
Neural Hair Rendering · ECCV 2020
Cross-Modal 3D Shape Generation and Manipulation · ECCV 2022
R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis · ECCV 2022
Quantized GAN for Complex Music Generation from Dance Videos · ECCV 2022
Regressing a 3D Face Shape From a Single Image · ICCV 2015
Mind the Time: Temporally-Controlled Multi-Event Video Generation · CVPR 2025
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training · CVPR 2025
Omni-ID: Holistic Identity Representation Designed for Generative Tasks · CVPR 2025
T2Bs: Text-to-Character Blendshapes via Video Generation · ICCV 2025
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation · ICCV 2025
SPAD: Spatially Aware Multi-View Diffusers · CVPR 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis · CVPR 2024
TextCraftor: Your Text Encoder Can be Image Quality Controller · CVPR 2024
Towards Text-guided 3D Scene Composition · CVPR 2024
Hierarchical Patch Diffusion Models for High-Resolution Video Generation · CVPR 2024
E$^2$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation · ICML 2024
Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions · CVPR 2016
MoCoGAN: Decomposing Motion and Content for Video Generation · CVPR 2018
Animating Arbitrary Objects via Deep Motion Transfer · CVPR 2019
3D Guided Fine-Grained Face Manipulation · CVPR 2019
First Order Motion Model for Image Animation · NeurIPS 2019
EfficientFormer: Vision Transformers at MobileNet Speed · NeurIPS 2022
Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training · NeurIPS 2022
EpiGRAF: Rethinking training of 3D GANs · NeurIPS 2022
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds · NeurIPS 2023
LightSpeed: Light and Fast Neural Light Fields on Mobile Devices · NeurIPS 2023
Autodecoding Latent 3D Diffusion Models · NeurIPS 2023