Yue Fan

Papers: 8

Total Citations: 97

Papers (8)

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

USB: A Unified Semi-supervised Learning Benchmark for Classification