Yi Wang

24 Papers

1,758 Total Citations

1 Affiliation

Affiliations

The Hong Kong Polytechnic University

Papers (24)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

VideoMamba: State Space Model for Efficient Video Understanding

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

ViLLa: Video Reasoning Segmentation with Large Language Model

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Make Your Training Flexible: Towards Deployment-Efficient Video Models

ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

PointPatchMix: Point Cloud Mixing with Patch Scoring

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

CW Complex Hypothesis for Image Data

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

Towards a Unified Copernicus Foundation Model for Earth Vision

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations

OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Deep Hypergraph Neural Networks with Tight Framelets