Chang
16 Papers · 733 Total Citations

Papers (16)

- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? · ECCV 2024, arXiv · 473 citations
- LongVLM: Efficient Long Video Understanding via Large Language Models · ECCV 2024, arXiv · 128 citations
- Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos · ICLR 2025, arXiv · 46 citations
- MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models · ICLR 2025, arXiv · 29 citations
- SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation · ICLR 2025, arXiv · 17 citations
- The Hard Positive Truth about Vision-Language Compositionality · ECCV 2024, arXiv · 15 citations
- Controllable Generation via Locally Constrained Resampling · ICLR 2025, arXiv · 9 citations
- Space Group Equivariant Crystal Diffusion · NeurIPS 2025, arXiv · 6 citations
- How Does Vision-Language Adaptation Impact the Safety of Vision Language Models? · ICLR 2025, arXiv · 4 citations
- SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning · NeurIPS 2025, arXiv · 3 citations
- Neural-Driven Image Editing · NeurIPS 2025, arXiv · 2 citations
- KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge · NeurIPS 2025, arXiv · 1 citation
- Steering Information Utility in Key-Value Memory for Language Model Post-Training · NeurIPS 2025, arXiv · 0 citations
- Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection · NeurIPS 2025, arXiv · 0 citations
- WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios · NeurIPS 2025, arXiv · 0 citations
- Bayesian Regularization of Latent Representation · ICLR 2025 · 0 citations