🧬Vision Recognition

Pose Estimation

Estimating human body poses

100 papers, 3,811 total citations

Feb '24 to Jan '26: 746 papers

Top Conferences

CVPR: 61, ECCV: 22, AAAI: 8, ICLR: 6, ICCV: 3

Top Papers

#1

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

Sapiens: Foundation for Human Vision Models

Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Peng Wang, Hao Tan, Sai Bi et al.

GART: Gaussian Articulated Template Models

Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.

Human Gaussian Splatting: Real-time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo et al.

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024, arXiv:2403.17346

human motion reconstruction, global trajectory estimation, slam robustification, video transformer model, +4 more

66 citations

#12

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Shuyuan Tu, Zhen Xing, Xintong Han et al.

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li et al.

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen, Yan Di, Guangyao Zhai et al.

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Jing Wen, Xiaoming Zhao, Jason Ren et al.

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.

ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.

ECCV 2024, arXiv:2311.17057

3d motion synthesis, human motion synthesis, denoising diffusion models, two-person interactions, +4 more

51 citations

#20

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

Yiming Huang, Weilin Wan, Yue Yang et al.

ECCV 2024, arXiv:2403.13900

text-to-motion generation, controllable motion editing, discrete pose codes, large language models, +4 more

48 citations

#22

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

Xuangeng Chu, Yu Li, Ailing Zeng et al.

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

NOPE: Novel Object Pose Estimation from a Single Image

Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Qingping Sun, Yanjun Wang, Ailing Zeng et al.

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Jinglin Xu, Yijie Guo, Yuxin Peng

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025, arXiv:2401.10232

human-object interaction, 3d generative modeling, motion capture, dexterous hand manipulation, +4 more

34 citations

#31

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

Yushuo Chen, Zerong Zheng, Zhe Li et al.

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

Chengfeng Zhao, Juze Zhang, Jiashen Du et al.

Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis

Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.

WHAC: World-grounded Humans and Cameras

Wanqi Yin, Zhongang Cai, Chen Wei et al.

Progressive Pretext Task Learning for Human Trajectory Prediction

Xiaotong Lin, Tianming Liang, Jian-Huang Lai et al.

Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.

Estimating Body and Hand Motion in an Ego‑sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

Navigating Open Set Scenarios for Skeleton-Based Action Recognition

Kunyu Peng, Cheng Yin, Junwei Zheng et al.

AAAI 2024, arXiv:2312.06330

skeleton-based action recognition, open set recognition, cross-modality alignment, distance-based classification, +3 more

26 citations

#39

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Xinshun Wang, Zhongbin Fang, Xia Li et al.

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation

Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

Garment Recovery with Shape and Deformation Priors

Ren Li, Corentin Dumery, Benoît Guillard et al.

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations

Yilan Dong, Chunlin Yu, Ruiyang Ha et al.

AAAI 2024, arXiv:2401.00271

gait recognition, cloth-changing recognition, spatial-temporal modeling, 3d human meshes, +4 more

23 citations

#47

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Li Xu, Haoxuan Qu, Yujun Cai et al.

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

Yiming Ren, Xiao Han, Chengfeng Zhao et al.

Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

Ben Eisner, Yi Yang, Todor Davchev et al.

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Keyu Wu, Lingchen Yang, Zhiyi Kuang et al.

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu et al.

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

Jia-Wang Bian, Wenjing Bian, Victor Prisacariu et al.

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

Shan Mengyi, Lu Dong, Yutao Han et al.

Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket

Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.

NICP: Neural ICP for 3D Human Registration at Scale

Riccardo Marin, Enric Corona, Gerard Pons-Moll

ECCV 2024, arXiv:2312.14024

3d human registration, neural fields, point cloud alignment, template registration, +4 more

19 citations

#59

The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.

HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Yuxuan Bian, Ailing Zeng, Xuan Ju et al.

iHuman: Instant Animatable Digital Humans From Monocular Videos

Pramish Paudel, Anubhav Khanal, Danda Pani Paudel et al.

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.

A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization

Hongwei Ren, Jiadong Zhu, Yue Zhou et al.

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Yongliang Lin, Yongzhi Su, Praveen Nathan et al.

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation

Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.

Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation

Feng Zhou, Jianqin Yin, Peiyang Li

AAAI 2024, arXiv:2312.15636

3d human pose estimation, depth ambiguity problem, image feature selection, pose-guided transformer, +4 more

18 citations

#68

FusionFormer: A Concise Unified Feature Fusion Transformer for 3D Pose Estimation

Yanlu Cai, Weizhong Zhang, Yuan Wu et al.

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Yukang Cao, Liang Pan, Kai Han et al.

Relightable and Animatable Neural Avatars from Videos

Wenbin Lin, Chengwei Zheng, Jun-hai Yong et al.

AAAI 2024, arXiv:2312.12877

neural avatars, inverse skinning problem, invertible deformation field, relightable 3d avatars, +4 more

18 citations

#71

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

Wencan Cheng, Hao Tang, Luc Van Gool et al.

HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images

Xihe Yang, Xingyu Chen, Daiheng Gao et al.

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

Jinglei Zhang, Jiankang Deng, Chao Ma et al.

SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

Kezheng Xiong, Maoji Zheng, Qingshan Xu et al.

AAAI 2024, arXiv:2312.08664

point cloud registration, cross-source point clouds, skeletal representations, unsupervised skeleton extraction, +4 more

17 citations

#75

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang et al.

Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.

CVPR 2025, arXiv:2504.17788

camera pose estimation, dynamic video analysis, structure-from-motion, point tracking, +4 more

15 citations

#79

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros et al.

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations

Shengeng Tang, Jiayi He, Lechao Cheng et al.

Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao, Jialiang Zhu, Chunyu Wang et al.

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Yifan Yang, Dong Liu, Shuhai Zhang et al.

Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos

Remy Sabathier, David Novotny, Niloy Mitra

LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation

Ruida Zhang, Ziqin Huang, Gu Wang et al.

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Guian Fang, Wenbiao Yan, Yuanfan Guo et al.

ECCV 2024, arXiv:2407.06937

text-to-image diffusion, human anomaly generation, anatomical anomaly detection, pose-reversible guidance, +3 more

14 citations

#88

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025, arXiv:2502.07785

multi-view generation, diffusion transformer, single image reconstruction, 3d consistent generation, +4 more

14 citations

#89

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

HUMOS: Human Motion Model Conditioned on Body Shape

Shashank Tripathi, Omid Taheri, Christoph Lassner et al.

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

Qianyun He, Xinya Ji, Yicheng Gong et al.

ECCV 2024, arXiv:2408.00297

3d talking head, free-view synthesis, multi-view consistency, emotional expressiveness, +4 more

14 citations

#92

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, Ruibing Hou, Hong Chang et al.

Exploring More from Multiple Gait Modalities for Human Identification

Dongyang Jin, Chao Fan, Weihua Chen et al.

UniHuman: A Unified Model For Editing Human Images in the Wild

Nannan Li, Qing Liu, Krishna Kumar Singh et al.

On the Utility of 3D Hand Poses for Action Recognition

Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener et al.

X-Dyna: Expressive Dynamic Human Image Animation

Di Chang, Hongyi Xu, You Xie et al.

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025, arXiv:2502.02358

human motion generation, motion editing, rectified flows, motion-condition-motion paradigm, +4 more

12 citations

#98

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Yatian Pang, Bin Zhu, Bin Lin et al.

ICCV 2025, arXiv:2412.00397

human image animation, diffusion models, 3d geometry cues, skeleton pose sequences, +3 more

12 citations

#99

Modeling and Driving Human Body Soundfields through Acoustic Primitives

Chao Huang, Dejan Markovic, Chenliang Xu et al.

RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark

Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan et al.

ECCV 2024

11 citations
