Chang

16 Papers · 733 Total Citations

Papers (16)

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

ECCV 2024 · arXiv · 473 citations

LongVLM: Efficient Long Video Understanding via Large Language Models

ECCV 2024 · arXiv · 128 citations

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

ICLR 2025 · arXiv · 46 citations

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

ICLR 2025 · arXiv · 29 citations

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

ICLR 2025 · arXiv · 17 citations

The Hard Positive Truth about Vision-Language Compositionality

ECCV 2024 · arXiv · 15 citations

Controllable Generation via Locally Constrained Resampling

ICLR 2025 · arXiv · 9 citations

Space Group Equivariant Crystal Diffusion

NeurIPS 2025 · arXiv · 6 citations

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

ICLR 2025 · arXiv · 4 citations

SMMILE: An expert-driven benchmark for multimodal medical in-context learning

NeurIPS 2025 · arXiv · 3 citations

Neural-Driven Image Editing

NeurIPS 2025 · arXiv · 2 citations

KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge

NeurIPS 2025 · arXiv · 1 citation

Steering Information Utility in Key-Value Memory for Language Model Post-Training

NeurIPS 2025 · arXiv · 0 citations

Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection

NeurIPS 2025 · arXiv · 0 citations

WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios

NeurIPS 2025 · arXiv · 0 citations

Bayesian Regularization of Latent Representation

ICLR 2025 · 0 citations