HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

Citations: 63
Authors: 15
Data Points: 1

Abstract

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained Large Language Models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception (HVP) approach and a three-stage learning strategy (TLS). To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT in unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.

Citation History

Jan 28, 2026: 63 citations