Peng Jin

Papers: 12
Total Citations: 781

Papers (12)

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

MoH: Multi-Head Attention as Mixture-of-Head Attention

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting

VSNet: Focusing on the Linguistic Characteristics of Sign Language

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation

MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval

Parallel Vertex Diffusion for Unified Visual Grounding

Auto-Linear Phenomenon in Subsurface Imaging