Yunhao Gou

Papers: 3

Total Citations: 44

Papers (3)

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Leveraging per Image-Token Consistency for Vision-Language Pre-Training

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification