Yun Zheng

Papers: 9

Total Citations: 72

Papers (9)

Relevant Intrinsic Feature Enhancement Network for Few-Shot Semantic Segmentation

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

NeurIPS 2025, arXiv

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Aligned Better, Listen Better for Audio-Visual Large Language Models

CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness

ContextHOI: Spatial Context Learning for Human-Object Interaction Detection

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding

Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training