NeurIPS 2025 "multimodal document understanding" Papers
2 papers found
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.
NeurIPS 2025, poster, arXiv:2502.01341
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
Zhentao He, Can Zhang, Ziheng Wu et al.
NeurIPS 2025, poster, arXiv:2506.20168
2 citations