Ping Luo

40
Papers
4,444
Total Citations

Papers (40)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

CVPR 2024
2,210
citations

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

CVPR 2024
864
citations

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

ICLR 2024
408
citations

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

ICLR 2024
320
citations

Generalized Predictive Model for Autonomous Driving

CVPR 2024
122
citations

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

ICCV 2025
96
citations

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

AAAI 2025
79
citations

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICML 2025
72
citations

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

CVPR 2024
64
citations

Goku: Flow Based Video Generative Foundation Models

CVPR 2025
53
citations

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

ICLR 2024
46
citations

End-to-End Autonomous Driving Through V2X Cooperation

AAAI 2025
44
citations

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

AAAI 2025
14
citations

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

ICCV 2025
10
citations

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

CVPR 2025
10
citations

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

ICLR 2025
7
citations

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

CVPR 2024
7
citations

Cached Transformers: Improving Transformers with Differentiable Memory Cache

AAAI 2024
5
citations

NADER: Neural Architecture Design via Multi-Agent Collaboration

CVPR 2025
3
citations

UniFS: Universal Few-shot Instance Perception with Point Representations

ECCV 2024
3
citations

BOOD: Boundary-based Out-Of-Distribution Data Generation

ICML 2025
2
citations

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

CVPR 2025
2
citations

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

NeurIPS 2025
2
citations

DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning

NeurIPS 2025
1
citation

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

ICCV 2025
0
citations

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

CVPR 2025
0
citations

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins

CVPR 2025
0
citations

Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling

CVPR 2025
0
citations

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

CVPR 2025
0
citations

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

CVPR 2025
0
citations

MangaNinja: Line Art Colorization with Precise Reference Following

CVPR 2025
0
citations

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

CVPR 2025
0
citations

Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary

ICML 2024
0
citations

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

ICML 2024
0
citations

Position: Towards Implicit Prompt For Text-To-Image Models

ICML 2024
0
citations

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

ICML 2024
0
citations

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

ICML 2024
0
citations

GenTron: Diffusion Transformers for Image and Video Generation

CVPR 2024
0
citations

RegionGPT: Towards Region Understanding Vision Language Model

CVPR 2024
0
citations

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

CVPR 2024
0
citations