Chang
16 Papers · 733 Total Citations

Papers (16)

- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? · ECCV 2024, arXiv · 473 citations
- LongVLM: Efficient Long Video Understanding via Large Language Models · ECCV 2024, arXiv · 128 citations
- Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos · ICLR 2025, arXiv · 46 citations
- MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models · ICLR 2025, arXiv · 29 citations
- SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation · ICLR 2025, arXiv · 17 citations
- The Hard Positive Truth about Vision-Language Compositionality · ECCV 2024, arXiv · 15 citations
- Controllable Generation via Locally Constrained Resampling · ICLR 2025, arXiv · 9 citations
- Space Group Equivariant Crystal Diffusion · NeurIPS 2025, arXiv · 6 citations
- How Does Vision-Language Adaptation Impact the Safety of Vision Language Models? · ICLR 2025, arXiv · 4 citations
- SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning · NeurIPS 2025, arXiv · 3 citations
- Neural-Driven Image Editing · NeurIPS 2025, arXiv · 2 citations
- KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge · NeurIPS 2025, arXiv · 1 citation
- Steering Information Utility in Key-Value Memory for Language Model Post-Training · NeurIPS 2025, arXiv · 0 citations
- Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection · NeurIPS 2025, arXiv · 0 citations
- WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios · NeurIPS 2025, arXiv · 0 citations
- Bayesian Regularization of Latent Representation · ICLR 2025 · 0 citations