Fengyun Rao
12
Papers
274
Total Citations
Papers (12)
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
ICCV 2025
247
citations
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
CVPR 2025
13
citations
Spatial-Semantic Collaborative Cropping for User Generated Content
AAAI 2024 (arXiv)
7
citations
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
CVPR 2025
7
citations
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
ICCV 2025
0
citations
Inter-X: Towards Versatile Human-Human Interaction Analysis
CVPR 2024
0
citations
ReGenNet: Towards Human Action-Reaction Synthesis
CVPR 2024
0
citations
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
CVPR 2022
0
citations
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
ECCV 2022
0
citations
Number it: Temporal Grounding Videos like Flipping Manga
CVPR 2025
0
citations
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
ICCV 2025
0
citations
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ICCV 2025
0
citations