Li Yuan
24
Papers
1,028
Total Citations
Papers (24)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
CVPR 2024
354
citations
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
ICCV 2025
338
citations
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
ICLR 2024
54
citations
EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images
ICCV 2025
53
citations
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
CVPR 2025
39
citations
MoH: Multi-Head Attention as Mixture-of-Head Attention
ICML 2025
37
citations
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
AAAI 2025
35
citations
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
ICLR 2025
31
citations
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
NeurIPS 2025
25
citations
Epona: Autoregressive Diffusion World Model for Autonomous Driving
ICCV 2025
23
citations
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
NeurIPS 2025
17
citations
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
ICCV 2025
12
citations
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
ECCV 2024
7
citations
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step
NeurIPS 2025
3
citations
LangBridge: Interpreting Image as a Combination of Language Embeddings
ICCV 2025
0
citations
Parallel Vertex Diffusion for Unified Visual Grounding
AAAI 2024
0
citations
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
CVPR 2025
0
citations
GraCo: Granularity-Controllable Interactive Segmentation
CVPR 2024
0
citations
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
CVPR 2024
0
citations
Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
CVPR 2024
0
citations
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
CVPR 2025
0
citations
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
CVPR 2025
0
citations
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
CVPR 2025
0
citations
AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes
AAAI 2025
0
citations