Hang Hua
6 Papers
122 Total Citations
Papers (6)
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
AAAI 2025
47 citations
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
AAAI 2025
24 citations
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
CVPR 2025
17 citations
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
CVPR 2025
16 citations
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
ECCV 2024 (arXiv)
14 citations
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
NeurIPS 2025
4 citations