Xiangyu Yue

19 Papers
345 Total Citations

Papers (19)

Video-R1: Reinforcing Video Reasoning in MLLMs

NeurIPS 2025 · arXiv
236 citations

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

CVPR 2025 · arXiv
44 citations

Unleashing Vecset Diffusion Model for Fast Shape Generation

ICCV 2025 · arXiv
14 citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

CVPR 2024
11 citations

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

CVPR 2025
8 citations

SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance

CVPR 2025
7 citations

From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision

ICCV 2025 · arXiv
6 citations

SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data

ICCV 2025 · arXiv
6 citations

Training Matting Models Without Alpha Labels

AAAI 2025 · arXiv
4 citations

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions

ICCV 2025 · arXiv
3 citations

Breaking the Encoder Barrier for Seamless Video-Language Understanding

ICCV 2025
3 citations

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

ICCV 2025
2 citations

HypDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-shot Image Generation

ICCV 2025 · arXiv
1 citation

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

CVPR 2024
0 citations

Chimera: Improving Generalist Model with Domain-Specific Experts

ICCV 2025
0 citations

OneLLM: One Framework to Align All Modalities with Language

CVPR 2024
0 citations

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

CVPR 2025
0 citations

Learning Beyond Still Frames: Scaling Vision-Language Models with Video

ICCV 2025
0 citations

Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities

ICCV 2025
0 citations