Ying Shan
106
Papers
2,552
Total Citations
Papers (106)
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion
AAAI 2024, arXiv
1,423
citations
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
CVPR 2024
237
citations
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
CVPR 2024
139
citations
ST-LLM: Large Language Models Are Effective Temporal Learners
ECCV 2024
124
citations
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
ICLR 2024
110
citations
Taming Rectified Flow for Inversion and Editing
ICML 2025
110
citations
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
CVPR 2024
89
citations
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
ECCV 2024
50
citations
Image Conductor: Precision Control for Interactive Video Synthesis
AAAI 2025
46
citations
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
CVPR 2025
44
citations
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
ICCV 2025
35
citations
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
CVPR 2025, arXiv
23
citations
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
ICCV 2025
19
citations
Programmable Motion Generation for Open-Set Motion Control Tasks
CVPR 2024
16
citations
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
16
citations
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
CVPR 2024
15
citations
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
11
citations
SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views
AAAI 2024, arXiv
10
citations
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
ICCV 2025
9
citations
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
7
citations
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
ICML 2025
6
citations
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
CVPR 2025
6
citations
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
NeurIPS 2025, arXiv
4
citations
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
CVPR 2025, arXiv
3
citations
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024
0
citations
ViT-Lens: Towards Omni-modal Representations
CVPR 2024
0
citations
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
CVPR 2024
0
citations
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
CVPR 2021, arXiv
0
citations
Open-Book Video Captioning With Retrieve-Copy-Generate Network
CVPR 2021, arXiv
0
citations
Towards Real-World Blind Face Restoration With Generative Facial Prior
CVPR 2021, arXiv
0
citations
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022, arXiv
0
citations
Object-Aware Video-Language Pre-Training for Retrieval
CVPR 2022, arXiv
0
citations
BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild
CVPR 2022
0
citations
Temporally Efficient Vision Transformer for Video Instance Segmentation
CVPR 2022, arXiv
0
citations
UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection
CVPR 2022, arXiv
0
citations
Accelerating Vision-Language Pretraining With Free Language Modeling
CVPR 2023, arXiv
0
citations
3D GAN Inversion With Facial Symmetry Prior
CVPR 2023, arXiv
0
citations
Generating Human Motion From Textual Descriptions With Discrete Representations
CVPR 2023, arXiv
0
citations
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
CVPR 2023, arXiv
0
citations
DropMAE: Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks
CVPR 2023, arXiv
0
citations
Improved Test-Time Adaptation for Domain Generalization
CVPR 2023, arXiv
0
citations
HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth Distributions
CVPR 2023
0
citations
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors
CVPR 2023, arXiv
0
citations
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023, arXiv
0
citations
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
CVPR 2023, arXiv
0
citations
Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
CVPR 2023, arXiv
0
citations
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
CVPR 2023, arXiv
0
citations
OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer
CVPR 2023, arXiv
0
citations
Learning Anchor Transformations for 3D Garment Animation
CVPR 2023, arXiv
0
citations
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval
CVPR 2023
0
citations
RILS: Masked Visual Reconstruction in Language Semantic Space
CVPR 2023, arXiv
0
citations
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
CVPR 2023, arXiv
0
citations
Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry
CVPR 2023, arXiv
0
citations
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
CVPR 2023, arXiv
0
citations
Instances As Queries
ICCV 2021
0
citations
Crossover Learning for Fast Online Video Instance Segmentation
ICCV 2021, arXiv
0
citations
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
ICCV 2023, arXiv
0
citations
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
ICCV 2023
0
citations
Order-Prompted Tag Sequence Generation for Video Tagging
ICCV 2023
0
citations
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
ICCV 2023, arXiv
0
citations
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
ICCV 2023, arXiv
0
citations
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
ICCV 2023, arXiv
0
citations
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
ICCV 2023, arXiv
0
citations
OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution
ICCV 2023, arXiv
0
citations
Exploring Model Transferability through the Lens of Potential Energy
ICCV 2023, arXiv
0
citations
Fast Video Object Segmentation using the Global Context Module
ECCV 2020
0
citations
Metric Learning Based Interactive Modulation for Real-World Super-Resolution
ECCV 2022
0
citations
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
ECCV 2022
0
citations
Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training
ECCV 2022
0
citations
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
0
citations
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
0
citations
Towards Vivid and Diverse Image Colorization With Generative Color Prior
ICCV 2021, arXiv
0
citations
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
CVPR 2025
0
citations
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
CVPR 2025
0
citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
0
citations
VisionMath: Vision-Form Mathematical Problem-Solving
ICCV 2025
0
citations
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
ICCV 2025
0
citations
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
ICCV 2025
0
citations
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
ICCV 2025
0
citations
DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
ICCV 2025
0
citations
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
AAAI 2025
0
citations
A Pre-convolved Representation for Plug-and-Play Neural Illumination Fields
AAAI 2024
0
citations
SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images
AAAI 2024
0
citations
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
CVPR 2024
0
citations
GS-IR: 3D Gaussian Splatting for Inverse Rendering
CVPR 2024
0
citations
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
CVPR 2024
0
citations
YOLO-World: Real-Time Open-Vocabulary Object Detection
CVPR 2024
0
citations
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
CVPR 2024
0
citations
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
CVPR 2024
0
citations
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
CVPR 2024
0
citations
SEED-Bench: Benchmarking Multimodal Large Language Models
CVPR 2024
0
citations
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
CVPR 2024
0
citations
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
CVPR 2024
0
citations
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
CVPR 2024
0
citations
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
0
citations
Detecting Interactions from Neural Networks via Topological Analysis
NeurIPS 2020
0
citations
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
NeurIPS 2021
0
citations
AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos
NeurIPS 2022
0
citations
DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes
NeurIPS 2022
0
citations
PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas
NeurIPS 2023
0
citations
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
NeurIPS 2023
0
citations
CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation
NeurIPS 2023
0
citations
Exploiting Contextual Objects and Relations for 3D Visual Grounding
NeurIPS 2023
0
citations
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
NeurIPS 2023
0
citations
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
NeurIPS 2023
0
citations
Inserting Anybody in Diffusion Models via Celeb Basis
NeurIPS 2023
0
citations