Hongyu Li
3 Papers
0 Total Citations
Papers (3)
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
CVPR 2025, arXiv
0 citations
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
CVPR 2025, arXiv
0 citations
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
AAAI 2025, arXiv
0 citations