Haoyuan Li

8 Papers

107 Total Citations

Papers (8)

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application

DATE: Domain Adaptive Product Seeker for E-Commerce

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization