Yi Wang
24 Papers
1,758 Total Citations
1 Affiliation

Affiliations
The Hong Kong Polytechnic University

Papers (24)
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024 · 864 citations

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024 · 408 citations

VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024 · 396 citations

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline
AAAI 2025 · 27 citations

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025, arXiv · 19 citations

ViLLa: Video Reasoning Segmentation with Large Language Model
ICCV 2025 · 16 citations

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025 · 9 citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025 · 8 citations

Make Your Training Flexible: Towards Deployment-Efficient Video Models
ICCV 2025 · 5 citations

ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection
AAAI 2025 · 3 citations

StreamForest: Efficient Online Video Understanding with Persistent Event Memory
NeurIPS 2025 · 3 citations

PointPatchMix: Point Cloud Mixing with Patch Scoring
AAAI 2024, arXiv · 0 citations

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering
CVPR 2025 · 0 citations

CW Complex Hypothesis for Image Data
ICML 2024 · 0 citations

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
CVPR 2025 · 0 citations

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
CVPR 2025 · 0 citations

Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation
ICCV 2025 · 0 citations

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
ICCV 2025 · 0 citations

Towards a Unified Copernicus Foundation Model for Earth Vision
ICCV 2025 · 0 citations

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
ICCV 2025 · 0 citations

OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework
ICLR 2025 · 0 citations

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
AAAI 2025 · 0 citations

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
AAAI 2025 · 0 citations

Deep Hypergraph Neural Networks with Tight Framelets
AAAI 2025 · 0 citations