Li Yuan

Papers: 24

Total Citations: 1,028

Papers (24)

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

MoH: Multi-Head Attention as Mixture-of-Head Attention

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Epona: Autoregressive Diffusion World Model for Autonomous Driving

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

LangBridge: Interpreting Image as a Combination of Language Embeddings

Parallel Vertex Diffusion for Unified Visual Grounding

RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing

GraCo: Granularity-Controllable Interactive Segmentation

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes