Chen-Wei Xie
7 Papers
47 Total Citations
Papers (7)
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
CVPR 2025
14 citations
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
NeurIPS 2025, arXiv
14 citations
Aligned Better, Listen Better for Audio-Visual Large Language Models
ICLR 2025
8 citations
Learning Visual Generative Priors without Text
CVPR 2025
4 citations
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
CVPR 2025
3 citations
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
NeurIPS 2025
3 citations
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
ICCV 2025
1 citation