Papers by Suyuchen Wang
5 papers found
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.
NeurIPS 2025, poster, arXiv:2502.01341
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.
ICLR 2025, poster, arXiv:2412.04626, 5 citations
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li, Tianyu Zhang, Zhiqi Bu et al.
ICLR 2025, poster
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang, Suyuchen Wang, Yun Zhu et al.
NeurIPS 2025, poster
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang, Suyuchen Wang, Lu Li et al.
ICLR 2025, poster