Yi Wang

24
Papers
1,758
Total Citations
1
Affiliation

Affiliations

The Hong Kong Polytechnic University

Papers (24)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

CVPR 2024
864
citations

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

ICLR 2024
408
citations

VideoMamba: State Space Model for Efficient Video Understanding

ECCV 2024
396
citations

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

AAAI 2025
27
citations

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

CVPR 2025
19
citations

ViLLa: Video Reasoning Segmentation with Large Language Model

ICCV 2025
16
citations

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

ICLR 2025
9
citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

ICCV 2025
8
citations

Make Your Training Flexible: Towards Deployment-Efficient Video Models

ICCV 2025
5
citations

ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection

AAAI 2025
3
citations

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

NeurIPS 2025
3
citations

PointPatchMix: Point Cloud Mixing with Patch Scoring

AAAI 2024
0
citations

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

CVPR 2025
0
citations

CW Complex Hypothesis for Image Data

ICML 2024
0
citations

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

CVPR 2025
0
citations

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

CVPR 2025
0
citations

Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation

ICCV 2025
0
citations

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

ICCV 2025
0
citations

Towards a Unified Copernicus Foundation Model for Earth Vision

ICCV 2025
0
citations

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations

ICCV 2025
0
citations

OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework

ICLR 2025
0
citations

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding

AAAI 2025
0
citations

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

AAAI 2025
0
citations

Deep Hypergraph Neural Networks with Tight Framelets

AAAI 2025
0
citations