Wanli Ouyang
155 papers
1,488 total citations
Papers (155)
WorldSimBench: Towards Video Generation Models as World Simulators
ICML 2025
806
citations
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
ECCV 2020
138
citations
Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction
NeurIPS 2017 (arXiv)
132
citations
Improving Video Generation with Human Feedback
NeurIPS 2025
106
citations
CRF-CNN: Modeling Structured Information in Human Pose Estimation
NeurIPS 2016 (arXiv)
79
citations
Point Cloud Pre-training with Diffusion Models
CVPR 2024
59
citations
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
ICLR 2025
34
citations
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
CVPR 2025 (arXiv)
30
citations
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
AAAI 2024 (arXiv)
25
citations
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
CVPR 2025
15
citations
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
CVPR 2024
14
citations
Semi-supervised 3D Object Detection with PatchTeacher and PillarMix
AAAI 2024 (arXiv)
9
citations
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
ICLR 2025
9
citations
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
NeurIPS 2025
7
citations
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
ICLR 2025 (arXiv)
7
citations
Boosting Residual Networks with Group Knowledge
AAAI 2024 (arXiv)
6
citations
MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search
NeurIPS 2025
3
citations
scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
NeurIPS 2025
2
citations
Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis
AAAI 2025
2
citations
SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning
NeurIPS 2025
2
citations
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
ICCV 2025
2
citations
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction
AAAI 2025
1
citations
STCT: Sequentially Training Convolutional Networks for Visual Tracking
CVPR 2016
0
citations
End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation
CVPR 2016
0
citations
Structured Feature Learning for Pose Estimation
CVPR 2016
0
citations
Object Detection in Videos With Tubelet Proposal Networks
CVPR 2017 (arXiv)
0
citations
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
CVPR 2017
0
citations
Multi-Context Attention for Human Pose Estimation
CVPR 2017 (arXiv)
0
citations
Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation
CVPR 2017 (arXiv)
0
citations
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
CVPR 2017 (arXiv)
0
citations
Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification
CVPR 2017 (arXiv)
0
citations
Quality Aware Network for Set to Set Recognition
CVPR 2017 (arXiv)
0
citations
Style Aggregated Network for Facial Landmark Detection
CVPR 2018 (arXiv)
0
citations
PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing
CVPR 2018 (arXiv)
0
citations
Mask-Guided Contrastive Attention Model for Person Re-Identification
CVPR 2018
0
citations
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
CVPR 2018 (arXiv)
0
citations
Attention-Aware Compositional Network for Person Re-Identification
CVPR 2018 (arXiv)
0
citations
Collaborative and Adversarial Network for Unsupervised Domain Adaptation
CVPR 2018
0
citations
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
CVPR 2018
0
citations
3D Human Pose Estimation in the Wild by Adversarial Learning
CVPR 2018 (arXiv)
0
citations
Visual Question Generation as Dual Task of Visual Question Answering
CVPR 2018 (arXiv)
0
citations
Libra R-CNN: Towards Balanced Learning for Object Detection
CVPR 2019
0
citations
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
CVPR 2019
0
citations
Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation
CVPR 2019
0
citations
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
0
citations
Multi-Person Articulated Tracking With Spatial and Temporal Embeddings
CVPR 2019
0
citations
DVC: An End-To-End Deep Video Compression Framework
CVPR 2019
0
citations
Improving Action Localization by Progressive Cross-Stream Cooperation
CVPR 2019
0
citations
SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction
CVPR 2019
0
citations
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
CVPR 2020 (arXiv)
0
citations
Multi-Dimensional Pruning: A Unified Framework for Model Compression
CVPR 2020
0
citations
3D Human Mesh Regression With Dense Correspondence
CVPR 2020 (arXiv)
0
citations
EcoNAS: Finding Proxies for Economical Neural Architecture Search
CVPR 2020 (arXiv)
0
citations
Improving One-Shot NAS by Suppressing the Posterior Fading
CVPR 2020 (arXiv)
0
citations
Equalization Loss for Long-Tailed Object Recognition
CVPR 2020 (arXiv)
0
citations
Mutual CRF-GNN for Few-Shot Learning
CVPR 2021
0
citations
Inception Convolution With Efficient Dilation Search
CVPR 2021 (arXiv)
0
citations
Layerwise Optimization by Gradient Decomposition for Continual Learning
CVPR 2021 (arXiv)
0
citations
Delving Into Localization Errors for Monocular 3D Object Detection
CVPR 2021 (arXiv)
0
citations
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
CVPR 2021 (arXiv)
0
citations
Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
CVPR 2022 (arXiv)
0
citations
Accelerating Neural Network Optimization Through an Automated Control Theory Lens
CVPR 2022
0
citations
Unsupervised Learning of Accurate Siamese Tracking
CVPR 2022 (arXiv)
0
citations
DR.VIC: Decomposition and Reasoning for Video Individual Counting
CVPR 2022
0
citations
Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer
CVPR 2022 (arXiv)
0
citations
Revisiting the Transferability of Supervised Pretraining: An MLP Perspective
CVPR 2022 (arXiv)
0
citations
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
CVPR 2022
0
citations
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
CVPR 2023
0
citations
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer
CVPR 2023
0
citations
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models
CVPR 2023 (arXiv)
0
citations
Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator
CVPR 2023
0
citations
UniHCP: A Unified Model for Human-Centric Perceptions
CVPR 2023 (arXiv)
0
citations
Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization
CVPR 2023
0
citations
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
CVPR 2023
0
citations
HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining
CVPR 2023 (arXiv)
0
citations
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
CVPR 2023
0
citations
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
CVPR 2023 (arXiv)
0
citations
Crossing the Gap: Domain Generalization for Image Captioning
CVPR 2023
0
citations
Learning Deep Representation With Large-Scale Attributes
ICCV 2015
0
citations
Visual Tracking With Fully Convolutional Networks
ICCV 2015
0
citations
Multi-Task Recurrent Neural Network for Immediacy Prediction
ICCV 2015
0
citations
Scene Graph Generation From Objects, Phrases and Region Captions
ICCV 2017 (arXiv)
0
citations
Learning Feature Pyramids for Human Pose Estimation
ICCV 2017 (arXiv)
0
citations
Chained Cascade Network for Object Detection
ICCV 2017
0
citations
Online Multi-Object Tracking Using CNN-Based Single Object Tracker With Spatial-Temporal Attention Mechanism
ICCV 2017 (arXiv)
0
citations
Crowd Counting With Deep Structured Scale Integration Network
ICCV 2019
0
citations
LAP-Net: Level-Aware Progressive Network for Image Dehazing
ICCV 2019
0
citations
Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection
ICCV 2019
0
citations
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM
ICCV 2019
0
citations
GradNet: Gradient-Guided Network for Visual Object Tracking
ICCV 2019
0
citations
Online Hyper-Parameter Learning for Auto-Augmentation Strategy
ICCV 2019
0
citations
Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
ICCV 2019
0
citations
AM-LFS: AutoML for Loss Function Search
ICCV 2019
0
citations
TRB: A Novel Triplet Representation for Understanding 2D Human Body
ICCV 2019
0
citations
GLiT: Neural Architecture Search for Global and Local Image Transformer
ICCV 2021 (arXiv)
0
citations
BN-NAS: Neural Architecture Search With Batch Normalization
ICCV 2021
0
citations
Leveraging Auxiliary Tasks With Affinity Learning for Weakly Supervised Semantic Segmentation
ICCV 2021 (arXiv)
0
citations
Geometry Uncertainty Projection Network for Monocular 3D Object Detection
ICCV 2021 (arXiv)
0
citations
Evolving Search Space for Neural Architecture Search
ICCV 2021 (arXiv)
0
citations
Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
ICCV 2021 (arXiv)
0
citations
PyMAF: 3D Human Pose and Shape Regression With Pyramidal Mesh Alignment Feedback Loop
ICCV 2021 (arXiv)
0
citations
Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search
ICCV 2021 (arXiv)
0
citations
Ponder: Point Cloud Pre-training via Neural Rendering
ICCV 2023 (arXiv)
0
citations
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training
ICCV 2023 (arXiv)
0
citations
NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space
ICCV 2023
0
citations
STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning
ICCV 2023 (arXiv)
0
citations
Masked Motion Predictors are Strong 3D Action Representation Learners
ICCV 2023 (arXiv)
0
citations
Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups
ICCV 2023
0
citations
What Can Simple Arithmetic Operations Do for Temporal Modeling?
ICCV 2023 (arXiv)
0
citations
Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection
ICCV 2023
0
citations
Improving Deep Video Compression by Resolution-adaptive Flow Coding
ECCV 2020
0
citations
Content Adaptive and Error Propagation Aware Deep Video Compression
ECCV 2020
0
citations
Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection
ECCV 2020
0
citations
Whole-Body Human Pose Estimation in the Wild
ECCV 2020
0
citations
Rethinking Pseudo-LiDAR Representation
ECCV 2020
0
citations
3D Interacting Hand Pose Estimation by Hand De-Occlusion and Removal
ECCV 2022
0
citations
Pose for Everything: Towards Category-Agnostic Pose Estimation
ECCV 2022
0
citations
Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking
ECCV 2022
0
citations
Fast-MoCo: Boost Momentum-Based Contrastive Learning with Combinatorial Patches
ECCV 2022
0
citations
Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective
ECCV 2022
0
citations
Relative Contrastive Loss for Unsupervised Representation Learning
ECCV 2022
0
citations
Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains
ECCV 2022
0
citations
NSNet: Non-Saliency Suppression Sampler for Efficient Video Recognition
ECCV 2022
0
citations
Aggregation With Feature Detection
ICCV 2021
0
citations
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
CVPR 2025
0
citations
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
CVPR 2025
0
citations
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
CVPR 2025
0
citations
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
ICCV 2025
0
citations
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
ICCV 2025
0
citations
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
ICCV 2025
0
citations
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
0
citations
Frozen CLIP Transformer Is an Efficient Point Cloud Encoder
AAAI 2024
0
citations
ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing
AAAI 2024
0
citations
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
CVPR 2024
0
citations
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
0
citations
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
CVPR 2024
0
citations
Taming Stable Diffusion for Text to 360 Panorama Image Generation
CVPR 2024
0
citations
CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling
ICML 2024
0
citations
FiT: Flexible Vision Transformer for Diffusion Model
ICML 2024
0
citations
Towards a Self-contained Data-driven Global Weather Forecasting Framework
ICML 2024
0
citations
Saliency Detection by Multi-Context Deep Learning
CVPR 2015
0
citations
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
CVPR 2015
0
citations
Object Detection From Video Tubelets With Convolutional Neural Networks
CVPR 2016
0
citations
Factors in Finetuning Deep Model for Object Detection With Long-Tail Distribution
CVPR 2016
0
citations
Learning Deep Feature Representations With Domain Guided Dropout for Person Re-Identification
CVPR 2016
0
citations
FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction
NeurIPS 2018
0
citations
Improving Auto-Augment via Augmentation-Wise Weight Sharing
NeurIPS 2020
0
citations
A Continuous Mapping For Augmentation Design
NeurIPS 2021
0
citations
Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing
NeurIPS 2022
0
citations
Unsupervised Object Detection Pretraining with Joint Object Priors Generation and Detector Learning
NeurIPS 2022
0
citations
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
NeurIPS 2023
0
citations
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NeurIPS 2023
0
citations
CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection
NeurIPS 2023
0
citations
Learning to Parameterize Visual Attributes for Open-set Fine-grained Retrieval
NeurIPS 2023
0
citations
Multi-Bias Non-linear Activation in Deep Neural Networks
ICML 2016 (arXiv)
0
citations