Kaipeng Zhang

16
Papers
541
Total Citations

Papers (16)

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

ICLR 2024
320
citations

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

ICCV 2025
96
citations

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICML 2025
72
citations

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

CVPR 2025
18
citations

Neighboring Autoregressive Modeling for Efficient Visual Generation

ICCV 2025
16
citations

REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

NeurIPS 2025
8
citations

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

CVPR 2024
7
citations

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

AAAI 2024
4
citations

TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training

AAAI 2024
0
citations

OneLLM: One Framework to Align All Modalities with Language

CVPR 2024
0
citations

ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity

ICCV 2025
0
citations

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

ICCV 2025
0
citations

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

ICCV 2025
0
citations

Position: Towards Implicit Prompt For Text-To-Image Models

ICML 2024
0
citations

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

ICML 2024
0
citations

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

ICML 2024
0
citations