Ping Luo
40
Papers
4,444
Total Citations
Papers (40)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
2,210
citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408
citations
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
320
citations
Generalized Predictive Model for Autonomous Driving
CVPR 2024
122
citations
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
96
citations
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
AAAI 2025
79
citations
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
72
citations
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
CVPR 2024
64
citations
Goku: Flow Based Video Generative Foundation Models
CVPR 2025
53
citations
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
46
citations
End-to-End Autonomous Driving Through V2X Cooperation
AAAI 2025
44
citations
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
AAAI 2025
14
citations
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
ICCV 2025
10
citations
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
CVPR 2025
10
citations
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
ICLR 2025
7
citations
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
7
citations
Cached Transformers: Improving Transformers with Differentiable Memory Cache
AAAI 2024
5
citations
NADER: Neural Architecture Design via Multi-Agent Collaboration
CVPR 2025
3
citations
UniFS: Universal Few-shot Instance Perception with Point Representations
ECCV 2024
3
citations
BOOD: Boundary-based Out-Of-Distribution Data Generation
ICML 2025
2
citations
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
CVPR 2025
2
citations
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
NeurIPS 2025
2
citations
DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
NeurIPS 2025
1
citation
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
0
citations
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
0
citations
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
CVPR 2025
0
citations
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
CVPR 2025
0
citations
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
CVPR 2025
0
citations
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
CVPR 2025
0
citations
MangaNinja: Line Art Colorization with Precise Reference Following
CVPR 2025
0
citations
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
CVPR 2025
0
citations
Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary
ICML 2024
0
citations
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
ICML 2024
0
citations
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
0
citations
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
0
citations
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
0
citations
GenTron: Diffusion Transformers for Image and Video Generation
CVPR 2024
0
citations
RegionGPT: Towards Region Understanding Vision Language Model
CVPR 2024
0
citations
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
0
citations