Yujun Shen

78 Papers

795 Total Citations

Papers (78)

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Language-Image Pre-training with Long Captions

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

SAM-guided Graph Cut for 3D Instance Segmentation

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

MagicQuill: An Intelligent Interactive Image Editing System

Lipschitz Singularities in Diffusion Models

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Rectified Diffusion Guidance for Conditional Generation

NEAT: Distilling 3D Wireframes from Neural Attraction Fields

Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

Contextual AD Narration with Interleaved Multimodal Sequence

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Learning Visual Generative Priors without Text

Neural Shell Texture Splatting: More Details and Fewer Primitives

ScaleLSD: Scalable Deep Line Segment Detection Streamlined

BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis

Neural Dependencies Emerging From Learning Massive Categories

GLeaD: Improving GANs With a Generator-Leading Task

Balancing Logit Variation for Long-Tailed Semantic Segmentation

Learning 3D-Aware Image Synthesis With Unknown Pose Distribution

Dimensionality-Varying Diffusion Process

LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

ViM: Vision Middleware for Unified Downstream Transferring

One-Shot Generative Domain Adaptation

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-Trained Vision-Language Models

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

In-Domain GAN Inversion for Real Image Editing

High-Fidelity GAN Inversion with Padding Space

3D-Aware Indoor Scene Synthesis with Depth Priors

ReTracker: Exploring Image Matching for Robust Online Any Point Tracking

AvatarArtist: Open-Domain 4D Avatarization

MangaNinja: Line Art Colorization with Precise Reference Following

AniDoc: Animation Creation Made Easier

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

Learning Temporally Consistent Video Depth from Video Diffusion Priors

SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Edicho: Consistent Image Editing in the Wild

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

AnyDoor: Zero-shot Object-level Image Customization

SpatialTracker: Tracking Any 2D Pixels in 3D Space

4K4D: Real-Time 4D View Synthesis at 4K Resolution

CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models

SMaRt: Improving GANs with Score Matching Regularity

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

Image Processing Using Multi-Code GAN Prior

Interpreting the Latent Space of GANs for Semantic Face Editing

Closed-Form Factorization of Latent Semantics in GANs

Glancing at the Patch: Anomaly Localization With Global and Local Feature Comparison

Generative Hierarchical Features From Synthesizing Images

3D-Aware Image Synthesis via Learning Structural and Textural Representations

Improving GAN Equilibrium by Raising Spatial Awareness

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

Data-Efficient Instance Generation from Instance Discrimination

Low-Rank Subspaces in GANs

A Unified Model for Multi-class Anomaly Detection

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

Improving GANs with A Dynamic Discriminator

Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase

Learning Modulated Transformation in GANs

VideoComposer: Compositional Video Synthesis with Motion Controllability

Revisiting the Evaluation of Image Synthesis with GANs

FaceComposer: A Unified Model for Versatile Facial Content Creation

Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

Customizable Image Synthesis with Multiple Subjects

Compact Neural Volumetric Video Representations with Dynamic Codebooks