Yunlong Tang
7 Papers
107 Total Citations
Papers (7)
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
AAAI 2025
47 citations
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
AAAI 2025
24 citations
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
CVPR 2025
16 citations
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
AAAI 2025
13 citations
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
NeurIPS 2025
4 citations
ZeroSep: Separate Anything in Audio with Zero Training
NeurIPS 2025
3 citations
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
CVPR 2025
0 citations