Zihan Wang

17
Papers
963
Total Citations

Papers (17)

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

CVPR 2025
858
citations

Re-thinking Temporal Search for Long-Form Video Understanding

CVPR 2025
36
citations

Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank

ICLR 2024
23
citations

Reducing Tool Hallucination via Reliability Alignment

ICML 2025
19
citations

g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

CVPR 2025
8
citations

Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge

NeurIPS 2025
7
citations

Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action

NeurIPS 2025 (arXiv)
7
citations

MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion

ICCV 2025
4
citations

Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection

ICCV 2025
1
citations

Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

CVPR 2024
0
citations

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

CVPR 2024
0
citations

CogAgent: A Visual Language Model for GUI Agents

CVPR 2024
0
citations

KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

CVPR 2023 (arXiv)
0
citations

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

CVPR 2023 (arXiv)
0
citations

GridMM: Grid Memory Map for Vision-and-Language Navigation

ICCV 2023 (arXiv)
0
citations

M$^4$I: Multi-modal Models Membership Inference

NeurIPS 2022
0
citations

Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback

NeurIPS 2023
0
citations