Ping Luo
149 papers, 4,598 total citations
Papers (149)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
2,210
citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408
citations
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
320
citations
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
ECCV 2020
138
citations
Generalized Predictive Model for Autonomous Driving
CVPR 2024
122
citations
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
96
citations
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
AAAI 2025
79
citations
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
72
citations
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
CVPR 2024
64
citations
Goku: Flow Based Video Generative Foundation Models
CVPR 2025 (arXiv)
53
citations
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
46
citations
End-to-End Autonomous Driving Through V2X Cooperation
AAAI 2025
44
citations
Webly Supervised Image Classification with Self-Contained Confidence
ECCV 2020
16
citations
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
AAAI 2025
14
citations
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
ICCV 2025
10
citations
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
CVPR 2025
10
citations
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
ICLR 2025
7
citations
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
7
citations
Cached Transformers: Improving Transformers with Differentiable Memory Cache
AAAI 2024 (arXiv)
5
citations
UniFS: Universal Few-shot Instance Perception with Point Representations
ECCV 2024
3
citations
NADER: Neural Architecture Design via Multi-Agent Collaboration
CVPR 2025
3
citations
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
CVPR 2025
2
citations
BOOD: Boundary-based Out-Of-Distribution Data Generation
ICML 2025
2
citations
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
NeurIPS 2025
2
citations
DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
NeurIPS 2025
1
citations
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
CVPR 2019
0
citations
Learning a Reinforced Agent for Flexible Exposure Bracketing Selection
CVPR 2020 (arXiv)
0
citations
MaskGAN: Towards Diverse and Interactive Facial Image Manipulation
CVPR 2020 (arXiv)
0
citations
3D Human Mesh Regression With Dense Correspondence
CVPR 2020 (arXiv)
0
citations
Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content
CVPR 2020
0
citations
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
CVPR 2020 (arXiv)
0
citations
Online Knowledge Distillation via Collaborative Learning
CVPR 2020
0
citations
Exemplar Normalization for Learning Deep Representation
CVPR 2020 (arXiv)
0
citations
PolarMask: Single Shot Instance Segmentation With Polar Representation
CVPR 2020 (arXiv)
0
citations
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
CVPR 2021 (arXiv)
0
citations
Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On
CVPR 2021 (arXiv)
0
citations
Sparse R-CNN: End-to-End Object Detection With Learnable Proposals
CVPR 2021
0
citations
Parser-Free Virtual Try-On via Distilling Appearance Flows
CVPR 2021 (arXiv)
0
citations
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
CVPR 2021 (arXiv)
0
citations
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers
CVPR 2021
0
citations
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022 (arXiv)
0
citations
RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
CVPR 2022 (arXiv)
0
citations
Language As Queries for Referring Video Object Segmentation
CVPR 2022 (arXiv)
0
citations
Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer
CVPR 2022 (arXiv)
0
citations
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
CVPR 2022 (arXiv)
0
citations
Scale-Equivalent Distillation for Semi-Supervised Object Detection
CVPR 2022 (arXiv)
0
citations
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
CVPR 2022 (arXiv)
0
citations
Accelerating Vision-Language Pretraining With Free Language Modeling
CVPR 2023 (arXiv)
0
citations
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention
CVPR 2023 (arXiv)
0
citations
Universal Instance Perception As Object Discovery and Retrieval
CVPR 2023 (arXiv)
0
citations
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
0
citations
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
CVPR 2023
0
citations
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023 (arXiv)
0
citations
EC2: Emergent Communication for Embodied Control
CVPR 2023
0
citations
Real-Time Controllable Denoising for Image and Video
CVPR 2023 (arXiv)
0
citations
Policy Adaptation From Foundation Model Feedback
CVPR 2023 (arXiv)
0
citations
Dense Distinct Query for End-to-End Object Detection
CVPR 2023 (arXiv)
0
citations
Semantic Image Segmentation via Deep Parsing Network
ICCV 2015
0
citations
Deep Learning Strong Parts for Pedestrian Detection
ICCV 2015
0
citations
Learning Social Relation Traits From Face Images
ICCV 2015
0
citations
From Facial Parts Responses to Face Detection: A Deep Learning Approach
ICCV 2015
0
citations
Deep Learning Face Attributes in the Wild
ICCV 2015
0
citations
Deep Dual Learning for Semantic Image Segmentation
ICCV 2017
0
citations
Vision-Infused Deep Audio Inpainting
ICCV 2019
0
citations
Switchable Whitening for Deep Representation Learning
ICCV 2019
0
citations
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization
ICCV 2019
0
citations
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
ICCV 2019
0
citations
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid
ICCV 2019
0
citations
Deep Self-Learning From Noisy Labels
ICCV 2019
0
citations
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions
ICCV 2021 (arXiv)
0
citations
DetCo: Unsupervised Contrastive Learning for Object Detection
ICCV 2021 (arXiv)
0
citations
Adversarial Robustness for Unsupervised Domain Adaptation
ICCV 2021 (arXiv)
0
citations
Watch Only Once: An End-to-End Video Action Detection Framework
ICCV 2021
0
citations
Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames
ICCV 2021
0
citations
STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement
ICCV 2021
0
citations
End-to-End Dense Video Captioning With Parallel Decoding
ICCV 2021 (arXiv)
0
citations
EGC: Image Generation and Classification via a Diffusion Energy-Based Model
ICCV 2023 (arXiv)
0
citations
MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation
ICCV 2023
0
citations
Scene as Occupancy
ICCV 2023 (arXiv)
0
citations
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023 (arXiv)
0
citations
Segment Every Reference Object in Spatial and Temporal Spaces
ICCV 2023
0
citations
Beyond One-to-One: Rethinking the Referring Image Segmentation
ICCV 2023
0
citations
Going Denser with Open-Vocabulary Part Segmentation
ICCV 2023 (arXiv)
0
citations
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos
ICCV 2023 (arXiv)
0
citations
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023 (arXiv)
0
citations
Exploring Transformers for Open-world Instance Segmentation
ICCV 2023 (arXiv)
0
citations
DiffusionDet: Diffusion Model for Object Detection
ICCV 2023 (arXiv)
0
citations
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
ECCV 2020
0
citations
Whole-Body Human Pose Estimation in the Wild
ECCV 2020
0
citations
Segmenting Transparent Objects in the Wild
ECCV 2020
0
citations
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
ECCV 2020
0
citations
Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction
ECCV 2020
0
citations
PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation
ECCV 2022
0
citations
3D Interacting Hand Pose Estimation by Hand De-Occlusion and Removal
ECCV 2022
0
citations
Pose for Everything: Towards Category-Agnostic Pose Estimation
ECCV 2022
0
citations
Towards Grand Unification of Object Tracking
ECCV 2022
0
citations
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
ECCV 2022
0
citations
DaViT: Dual Attention Vision Transformers
ECCV 2022
0
citations
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
0
citations
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
0
citations
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks
ICCV 2019
0
citations
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
CVPR 2025
0
citations
MangaNinja: Line Art Colorization with Precise Reference Following
CVPR 2025
0
citations
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
CVPR 2025
0
citations
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
CVPR 2025
0
citations
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
CVPR 2025
0
citations
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
CVPR 2025
0
citations
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
0
citations
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
0
citations
GenTron: Diffusion Transformers for Image and Video Generation
CVPR 2024
0
citations
RegionGPT: Towards Region Understanding Vision Language Model
CVPR 2024
0
citations
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
0
citations
Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary
ICML 2024
0
citations
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
ICML 2024
0
citations
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
0
citations
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
0
citations
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
0
citations
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
CVPR 2015
0
citations
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
CVPR 2015
0
citations
Pedestrian Detection Aided by Deep Learning Semantic Tasks
CVPR 2015
0
citations
DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations
CVPR 2016
0
citations
WIDER FACE: A Face Detection Benchmark
CVPR 2016
0
citations
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
CVPR 2017 (arXiv)
0
citations
Learning Object Interactions and Descriptions for Semantic Image Segmentation
CVPR 2017
0
citations
FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis
CVPR 2018
0
citations
SSN: Learning Sparse Switchable Normalization via SparsestMax
CVPR 2019
0
citations
Kalman Normalization: Normalizing Internal Representations Across Network Layers
NeurIPS 2018
0
citations
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NeurIPS 2021
0
citations
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
NeurIPS 2021
0
citations
Model-Based Reinforcement Learning via Imagination with Derived Memory
NeurIPS 2021
0
citations
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
NeurIPS 2021
0
citations
Compressed Video Contrastive Learning
NeurIPS 2021
0
citations
Rethinking the Pruning Criteria for Convolutional Neural Network
NeurIPS 2021
0
citations
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
NeurIPS 2022
0
citations
Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes
NeurIPS 2022
0
citations
MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning
NeurIPS 2022
0
citations
DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
NeurIPS 2022
0
citations
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
NeurIPS 2022
0
citations
Rethinking Resolution in the Context of Efficient Video Recognition
NeurIPS 2022
0
citations
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
NeurIPS 2023
0
citations
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
NeurIPS 2023
0
citations
Foundation Model is Efficient Multimodal Multitask Model Selector
NeurIPS 2023
0
citations
Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
NeurIPS 2023
0
citations
Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification
NeurIPS 2023
0
citations
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
NeurIPS 2023
0
citations
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023
0
citations
Learning Deep Architectures via Generalized Whitened Neural Networks
ICML 2017
0
citations
Differentiable Dynamic Normalization for Learning Deep Representation
ICML 2019
0
citations