Boqiang Zhang

Papers: 5

Total Citations: 45

Papers (5)

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness

SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing