Li Yuan

24
Papers
1,028
Total Citations

Papers (24)

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

CVPR 2024
354
citations

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

ICCV 2025
338
citations

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

ICLR 2024
54
citations

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

ICCV 2025
53
citations

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

CVPR 2025
39
citations

MoH: Multi-Head Attention as Mixture-of-Head Attention

ICML 2025
37
citations

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

AAAI 2025
35
citations

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

ICLR 2025
31
citations

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

NeurIPS 2025
25
citations

Epona: Autoregressive Diffusion World Model for Autonomous Driving

ICCV 2025
23
citations

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

NeurIPS 2025
17
citations

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

ICCV 2025
12
citations

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting

ECCV 2024
7
citations

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

NeurIPS 2025
3
citations

LangBridge: Interpreting Image as a Combination of Language Embeddings

ICCV 2025
0
citations

Parallel Vertex Diffusion for Unified Visual Grounding

AAAI 2024
0
citations

RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing

CVPR 2025
0
citations

GraCo: Granularity-Controllable Interactive Segmentation

CVPR 2024
0
citations

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement

CVPR 2024
0
citations

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

CVPR 2024
0
citations

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

CVPR 2025
0
citations

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

CVPR 2025
0
citations

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

CVPR 2025
0
citations

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes

AAAI 2025
0
citations