Kaipeng Zhang

20
Papers
541
Total Citations

Papers (20)

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

ICLR 2024
320
citations

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

ICCV 2025
96
citations

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICML 2025
72
citations

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

CVPR 2025
18
citations

Neighboring Autoregressive Modeling for Efficient Visual Generation

ICCV 2025
16
citations

REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

NeurIPS 2025
8
citations

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

CVPR 2024
7
citations

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

AAAI 2024
4
citations

OneLLM: One Framework to Align All Modalities with Language

CVPR 2024
0
citations

ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity

ICCV 2025
0
citations

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

ICCV 2025
0
citations

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

ICCV 2025
0
citations

Position: Towards Implicit Prompt For Text-To-Image Models

ICML 2024
0
citations

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

ICML 2024
0
citations

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

ICML 2024
0
citations

Detecting Faces Using Inside Cascaded Contextual CNN

ICCV 2017
0
citations

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

ICCV 2023
0
citations

TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training

AAAI 2024
0
citations

Neural Routing by Memory

NeurIPS 2021
0
citations

Foundation Model is Efficient Multimodal Multitask Model Selector

NeurIPS 2023
0
citations