Sergey Tulyakov

66 Papers · 982 Total Citations

Papers (66)

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

CVPR 2024 · 341 citations

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

CVPR 2024 · 168 citations

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

ICLR 2025 (arXiv) · 114 citations

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

CVPR 2025 · 78 citations

Wonderland: Navigating 3D Scenes from a Single Image

CVPR 2025 · 54 citations

Multi-subject Open-set Personalization in Video Generation

CVPR 2025 (arXiv) · 40 citations

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

CVPR 2024 · 40 citations

Improving the Diffusability of Autoencoders

ICML 2025 · 34 citations

Scalable Ranked Preference Optimization for Text-to-Image Generation

ICCV 2025 · 21 citations

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

CVPR 2025 · 20 citations

Video Motion Transfer with Diffusion Transformers

CVPR 2025 · 18 citations

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

CVPR 2025 · 18 citations

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis

ICCV 2025 · 12 citations

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

NeurIPS 2025 · 11 citations

Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach

NeurIPS 2025 · 8 citations

Efficient Training with Denoised Neural Weights

ECCV 2024 · 5 citations

Flow Guided Transformable Bottleneck Networks for Motion Retargeting

CVPR 2021 (arXiv) · 0 citations

Motion Representations for Articulated Animation

CVPR 2021 (arXiv) · 0 citations

Playable Video Generation

CVPR 2021 (arXiv) · 0 citations

Teachers Do More Than Teach: Compressing Image-to-Image Models

CVPR 2021 (arXiv) · 0 citations

Playable Environments: Video Manipulation in Space and Time

CVPR 2022 · 0 citations

InOut: Diverse Image Outpainting via GAN Inversion

CVPR 2022 · 0 citations

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

CVPR 2022 (arXiv) · 0 citations

StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2

CVPR 2022 · 0 citations

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis

CVPR 2023 (arXiv) · 0 citations

Make-a-Story: Visual Memory Conditioned Consistent Story Generation

CVPR 2023 · 0 citations

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

CVPR 2023 (arXiv) · 0 citations

Invertible Neural Skinning

CVPR 2023 (arXiv) · 0 citations

Affection: Learning Affective Explanations for Real-World Visual Data

CVPR 2023 (arXiv) · 0 citations

Real-Time Neural Light Field on Mobile Devices

CVPR 2023 (arXiv) · 0 citations

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

CVPR 2023 (arXiv) · 0 citations

Unsupervised Volumetric Animation

CVPR 2023 (arXiv) · 0 citations

ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations

CVPR 2023 · 0 citations

Can Text-to-Video Generation help Video-Language Alignment?

CVPR 2025 · 0 citations

Transformable Bottleneck Networks

ICCV 2019 · 0 citations

Laplace Landmark Localization

ICCV 2019 · 0 citations

Rethinking Vision Transformers for MobileNet Size and Speed

ICCV 2023 (arXiv) · 0 citations

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

ICCV 2023 (arXiv) · 0 citations

InfiniCity: Infinite-Scale City Synthesis

ICCV 2023 (arXiv) · 0 citations

Neural Hair Rendering

ECCV 2020 · 0 citations

Cross-Modal 3D Shape Generation and Manipulation

ECCV 2022 · 0 citations

R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

ECCV 2022 · 0 citations

Quantized GAN for Complex Music Generation from Dance Videos

ECCV 2022 · 0 citations

Regressing a 3D Face Shape From a Single Image

ICCV 2015 · 0 citations

Mind the Time: Temporally-Controlled Multi-Event Video Generation

CVPR 2025 · 0 citations

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

CVPR 2025 · 0 citations

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

CVPR 2025 · 0 citations

T2Bs: Text-to-Character Blendshapes via Video Generation

ICCV 2025 · 0 citations

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

ICCV 2025 · 0 citations

SPAD: Spatially Aware Multi-View Diffusers

CVPR 2024 · 0 citations

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

CVPR 2024 · 0 citations

TextCraftor: Your Text Encoder Can be Image Quality Controller

CVPR 2024 · 0 citations

Towards Text-guided 3D Scene Composition

CVPR 2024 · 0 citations

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

CVPR 2024 · 0 citations

E$^2$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

ICML 2024 · 0 citations

Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions

CVPR 2016 · 0 citations

MoCoGAN: Decomposing Motion and Content for Video Generation

CVPR 2018 (arXiv) · 0 citations

Animating Arbitrary Objects via Deep Motion Transfer

CVPR 2019 · 0 citations

3D Guided Fine-Grained Face Manipulation

CVPR 2019 · 0 citations

First Order Motion Model for Image Animation

NeurIPS 2019 · 0 citations

EfficientFormer: Vision Transformers at MobileNet Speed

NeurIPS 2022 · 0 citations

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

NeurIPS 2022 · 0 citations

EpiGRAF: Rethinking training of 3D GANs

NeurIPS 2022 · 0 citations

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

NeurIPS 2023 · 0 citations

LightSpeed: Light and Fast Neural Light Fields on Mobile Devices

NeurIPS 2023 · 0 citations

Autodecoding Latent 3D Diffusion Models

NeurIPS 2023 · 0 citations