Peng Jin
12
Papers
781
Total Citations
Papers (12)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
CVPR 2024
354
citations
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
ICCV 2025
338
citations
MoH: Multi-Head Attention as Mixture-of-Head Attention
ICML 2025
37
citations
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
ICLR 2025
31
citations
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
ECCV 2024
13
citations
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
ECCV 2024
7
citations
VSNet: Focusing on the Linguistic Characteristics of Sign Language
CVPR 2025
1
citation
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
0
citations
Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation
AAAI 2025
0
citations
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
AAAI 2025
0
citations
Parallel Vertex Diffusion for Unified Visual Grounding
AAAI 2024
0
citations
Auto-Linear Phenomenon in Subsurface Imaging
ICML 2024
0
citations