Cong Wei
8 Papers
240 Total Citations
Papers (8)
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
ECCV 2024
127 citations
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
ICLR 2025, arXiv
88 citations
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
ICCV 2025
12 citations
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
CVPR 2025
9 citations
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver
CVPR 2025
4 citations
Advancing Visual Large Language Model for Multi-granular Versatile Perception
ICCV 2025
0 citations
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
ICCV 2025
0 citations
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
0 citations