Wanli Ouyang

155
Papers
1,488
Total Citations

Papers (155)

WorldSimBench: Towards Video Generation Models as World Simulators

ICML 2025
806
citations

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

ECCV 2020
138
citations

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

NeurIPS 2017
132
citations

Improving Video Generation with Human Feedback

NeurIPS 2025
106
citations

CRF-CNN: Modeling Structured Information in Human Pose Estimation

NeurIPS 2016
79
citations

Point Cloud Pre-training with Diffusion Models

CVPR 2024
59
citations

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

ICLR 2025
34
citations

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

CVPR 2025
30
citations

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

AAAI 2024
25
citations

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

CVPR 2025
15
citations

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

CVPR 2024
14
citations

Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

AAAI 2024
9
citations

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

ICLR 2025
9
citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

NeurIPS 2025
7
citations

PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling

ICLR 2025
7
citations

Boosting Residual Networks with Group Knowledge

AAAI 2024
6
citations

MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search

NeurIPS 2025
3
citations

scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration

NeurIPS 2025
2
citations

Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis

AAAI 2025
2
citations

SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning

NeurIPS 2025
2
citations

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

ICCV 2025
2
citations

GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction

AAAI 2025
1
citation

STCT: Sequentially Training Convolutional Networks for Visual Tracking

CVPR 2016
0
citations

End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

CVPR 2016
0
citations

Structured Feature Learning for Pose Estimation

CVPR 2016
0
citations

Object Detection in Videos With Tubelet Proposal Networks

CVPR 2017
0
citations

ViP-CNN: Visual Phrase Guided Convolutional Neural Network

CVPR 2017
0
citations

Multi-Context Attention for Human Pose Estimation

CVPR 2017
0
citations

Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation

CVPR 2017
0
citations

Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

CVPR 2017
0
citations

Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification

CVPR 2017
0
citations

Quality Aware Network for Set to Set Recognition

CVPR 2017
0
citations

Style Aggregated Network for Facial Landmark Detection

CVPR 2018
0
citations

PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

CVPR 2018
0
citations

Mask-Guided Contrastive Attention Model for Person Re-Identification

CVPR 2018
0
citations

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

CVPR 2018
0
citations

Attention-Aware Compositional Network for Person Re-Identification

CVPR 2018
0
citations

Collaborative and Adversarial Network for Unsupervised Domain Adaptation

CVPR 2018
0
citations

Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning

CVPR 2018
0
citations

3D Human Pose Estimation in the Wild by Adversarial Learning

CVPR 2018
0
citations

Visual Question Generation as Dual Task of Visual Question Answering

CVPR 2018
0
citations

Libra R-CNN: Towards Balanced Learning for Object Detection

CVPR 2019
0
citations

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving

CVPR 2019
0
citations

Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation

CVPR 2019
0
citations

Hybrid Task Cascade for Instance Segmentation

CVPR 2019
0
citations

Multi-Person Articulated Tracking With Spatial and Temporal Embeddings

CVPR 2019
0
citations

DVC: An End-To-End Deep Video Compression Framework

CVPR 2019
0
citations

Improving Action Localization by Progressive Cross-Stream Cooperation

CVPR 2019
0
citations

SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction

CVPR 2019
0
citations

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

CVPR 2020
0
citations

Multi-Dimensional Pruning: A Unified Framework for Model Compression

CVPR 2020
0
citations

3D Human Mesh Regression With Dense Correspondence

CVPR 2020
0
citations

EcoNAS: Finding Proxies for Economical Neural Architecture Search

CVPR 2020
0
citations

Improving One-Shot NAS by Suppressing the Posterior Fading

CVPR 2020
0
citations

Equalization Loss for Long-Tailed Object Recognition

CVPR 2020
0
citations

Mutual CRF-GNN for Few-Shot Learning

CVPR 2021
0
citations

Inception Convolution With Efficient Dilation Search

CVPR 2021
0
citations

Layerwise Optimization by Gradient Decomposition for Continual Learning

CVPR 2021
0
citations

Delving Into Localization Errors for Monocular 3D Object Detection

CVPR 2021
0
citations

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

CVPR 2021
0
citations

Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

CVPR 2022
0
citations

Accelerating Neural Network Optimization Through an Automated Control Theory Lens

CVPR 2022
0
citations

Unsupervised Learning of Accurate Siamese Tracking

CVPR 2022
0
citations

DR.VIC: Decomposition and Reasoning for Video Individual Counting

CVPR 2022
0
citations

Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer

CVPR 2022
0
citations

Revisiting the Transferability of Supervised Pretraining: An MLP Perspective

CVPR 2022
0
citations

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

CVPR 2022
0
citations

GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds

CVPR 2023
0
citations

PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer

CVPR 2023
0
citations

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models

CVPR 2023
0
citations

Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator

CVPR 2023
0
citations

UniHCP: A Unified Model for Human-Centric Perceptions

CVPR 2023
0
citations

Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization

CVPR 2023
0
citations

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

CVPR 2023
0
citations

HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining

CVPR 2023
0
citations

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency

CVPR 2023
0
citations

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

CVPR 2023
0
citations

Crossing the Gap: Domain Generalization for Image Captioning

CVPR 2023
0
citations

Learning Deep Representation With Large-Scale Attributes

ICCV 2015
0
citations

Visual Tracking With Fully Convolutional Networks

ICCV 2015
0
citations

Multi-Task Recurrent Neural Network for Immediacy Prediction

ICCV 2015
0
citations

Scene Graph Generation From Objects, Phrases and Region Captions

ICCV 2017
0
citations

Learning Feature Pyramids for Human Pose Estimation

ICCV 2017
0
citations

Chained Cascade Network for Object Detection

ICCV 2017
0
citations

Online Multi-Object Tracking Using CNN-Based Single Object Tracker With Spatial-Temporal Attention Mechanism

ICCV 2017
0
citations

Crowd Counting With Deep Structured Scale Integration Network

ICCV 2019
0
citations

LAP-Net: Level-Aware Progressive Network for Image Dehazing

ICCV 2019
0
citations

Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection

ICCV 2019
0
citations

Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM

ICCV 2019
0
citations

GradNet: Gradient-Guided Network for Visual Object Tracking

ICCV 2019
0
citations

Online Hyper-Parameter Learning for Auto-Augmentation Strategy

ICCV 2019
0
citations

Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

ICCV 2019
0
citations

AM-LFS: AutoML for Loss Function Search

ICCV 2019
0
citations

TRB: A Novel Triplet Representation for Understanding 2D Human Body

ICCV 2019
0
citations

GLiT: Neural Architecture Search for Global and Local Image Transformer

ICCV 2021
0
citations

BN-NAS: Neural Architecture Search With Batch Normalization

ICCV 2021
0
citations

Leveraging Auxiliary Tasks With Affinity Learning for Weakly Supervised Semantic Segmentation

ICCV 2021
0
citations

Geometry Uncertainty Projection Network for Monocular 3D Object Detection

ICCV 2021
0
citations

Evolving Search Space for Neural Architecture Search

ICCV 2021
0
citations

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

ICCV 2021
0
citations

PyMAF: 3D Human Pose and Shape Regression With Pyramidal Mesh Alignment Feedback Loop

ICCV 2021
0
citations

Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search

ICCV 2021
0
citations

Ponder: Point Cloud Pre-training via Neural Rendering

ICCV 2023
0
citations

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training

ICCV 2023
0
citations

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

ICCV 2023
0
citations

STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning

ICCV 2023
0
citations

Masked Motion Predictors are Strong 3D Action Representation Learners

ICCV 2023
0
citations

Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups

ICCV 2023
0
citations

What Can Simple Arithmetic Operations Do for Temporal Modeling?

ICCV 2023
0
citations

Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection

ICCV 2023
0
citations

Improving Deep Video Compression by Resolution-adaptive Flow Coding

ECCV 2020
0
citations

Content Adaptive and Error Propagation Aware Deep Video Compression

ECCV 2020
0
citations

Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection

ECCV 2020
0
citations

Whole-Body Human Pose Estimation in the Wild

ECCV 2020
0
citations

Rethinking Pseudo-LiDAR Representation

ECCV 2020
0
citations

3D Interacting Hand Pose Estimation by Hand De-Occlusion and Removal

ECCV 2022
0
citations

Pose for Everything: Towards Category-Agnostic Pose Estimation

ECCV 2022
0
citations

Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking

ECCV 2022
0
citations

Fast-MoCo: Boost Momentum-Based Contrastive Learning with Combinatorial Patches

ECCV 2022
0
citations

Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective

ECCV 2022
0
citations

Relative Contrastive Loss for Unsupervised Representation Learning

ECCV 2022
0
citations

Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains

ECCV 2022
0
citations

NSNet: Non-Saliency Suppression Sampler for Efficient Video Recognition

ECCV 2022
0
citations

Aggregation With Feature Detection

ICCV 2021
0
citations

Neuro-3D: Towards 3D Visual Decoding from EEG Signals

CVPR 2025
0
citations

Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution

CVPR 2025
0
citations

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

CVPR 2025
0
citations

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

ICCV 2025
0
citations

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

ICCV 2025
0
citations

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

ICCV 2025
0
citations

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

AAAI 2025
0
citations

Frozen CLIP Transformer Is an Efficient Point Cloud Encoder

AAAI 2024
0
citations

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

AAAI 2024
0
citations

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

CVPR 2024
0
citations

Point Transformer V3: Simpler Faster Stronger

CVPR 2024
0
citations

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

CVPR 2024
0
citations

Taming Stable Diffusion for Text to 360 Panorama Image Generation

CVPR 2024
0
citations

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

ICML 2024
0
citations

FiT: Flexible Vision Transformer for Diffusion Model

ICML 2024
0
citations

Towards a Self-contained Data-driven Global Weather Forecasting Framework

ICML 2024
0
citations

Saliency Detection by Multi-Context Deep Learning

CVPR 2015
0
citations

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

CVPR 2015
0
citations

Object Detection From Video Tubelets With Convolutional Neural Networks

CVPR 2016
0
citations

Factors in Finetuning Deep Model for Object Detection With Long-Tail Distribution

CVPR 2016
0
citations

Learning Deep Feature Representations With Domain Guided Dropout for Person Re-Identification

CVPR 2016
0
citations

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

NeurIPS 2018
0
citations

Improving Auto-Augment via Augmentation-Wise Weight Sharing

NeurIPS 2020
0
citations

A Continuous Mapping For Augmentation Design

NeurIPS 2021
0
citations

Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing

NeurIPS 2022
0
citations

Unsupervised Object Detection Pretraining with Joint Object Priors Generation and Detector Learning

NeurIPS 2022
0
citations

Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images

NeurIPS 2023
0
citations

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

NeurIPS 2023
0
citations

CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection

NeurIPS 2023
0
citations

Learning to Parameterize Visual Attributes for Open-set Fine-grained Retrieval

NeurIPS 2023
0
citations

Multi-Bias Non-linear Activation in Deep Neural Networks

ICML 2016
0
citations