Xiaohan Wang

Papers: 11

Total Citations: 77

Papers (11)

Describing Differences in Image Sets with Natural Language

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Interpretable3D: An Ad

A Category Agnostic Model for Visual Rearrangement

Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

An Interactive Navigation Method with Effect-oriented Affordance