🧬Vision Recognition

Pose Estimation

Estimating human body poses

100 papers, 3,813 total citations


Feb '24 to Jan '26: 745 papers

Top Conferences

CVPR: 61, ECCV: 22, AAAI: 8, ICLR: 6, ICCV: 3

Top Papers

#1

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

Sapiens: Foundation for Human Vision Models

Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Peng Wang, Hao Tan, Sai Bi et al.

GART: Gaussian Articulated Template Models

Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.

Human Gaussian Splatting: Real-time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo et al.

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024, arXiv:2403.17346

Keywords: human motion reconstruction, global trajectory estimation, SLAM robustification, video transformer model (+4 more)

66 citations

#12

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Shuyuan Tu, Zhen Xing, Xintong Han et al.

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen, Yan Di, Guangyao Zhai et al.

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li et al.

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Jing Wen, Xiaoming Zhao, Jason Ren et al.

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.

ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.

ECCV 2024, arXiv:2311.17057

Keywords: 3D motion synthesis, human motion synthesis, denoising diffusion models, two-person interactions (+4 more)

51 citations

#20

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

Yiming Huang, Weilin Wan, Yue Yang et al.

ECCV 2024, arXiv:2403.13900

Keywords: text-to-motion generation, controllable motion editing, discrete pose codes, large language models (+4 more)

48 citations

#22

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

Xuangeng Chu, Yu Li, Ailing Zeng et al.

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

NOPE: Novel Object Pose Estimation from a Single Image

Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Qingping Sun, Yanjun Wang, Ailing Zeng et al.

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Jinglin Xu, Yijie Guo, Yuxin Peng

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025, arXiv:2401.10232

Keywords: human-object interaction, 3D generative modeling, motion capture, dexterous hand manipulation (+4 more)

34 citations

#31

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

Chengfeng Zhao, Juze Zhang, Jiashen Du et al.

Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis

Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.

WHAC: World-grounded Humans and Cameras

Wanqi Yin, Zhongang Cai, Chen Wei et al.

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

Yushuo Chen, Zerong Zheng, Zhe Li et al.

Progressive Pretext Task Learning for Human Trajectory Prediction

Xiaotong Lin, Tianming Liang, Jian-Huang Lai et al.

Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.

Estimating Body and Hand Motion in an Ego-sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Xinshun Wang, Zhongbin Fang, Xia Li et al.

Navigating Open Set Scenarios for Skeleton-Based Action Recognition

Kunyu Peng, Cheng Yin, Junwei Zheng et al.

AAAI 2024, arXiv:2312.06330

Keywords: skeleton-based action recognition, open set recognition, cross-modality alignment, distance-based classification (+3 more)

26 citations

#40

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation

Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Li Xu, Haoxuan Qu, Yujun Cai et al.

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

Garment Recovery with Shape and Deformation Priors

Ren Li, Corentin Dumery, Benoît Guillard et al.

HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations

Yilan Dong, Chunlin Yu, Ruiyang Ha et al.

AAAI 2024, arXiv:2401.00271

Keywords: gait recognition, cloth-changing recognition, spatial-temporal modeling, 3D human meshes (+4 more)

23 citations

#49

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

Yiming Ren, Xiao Han, Chengfeng Zhao et al.

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

Ben Eisner, Yi Yang, Todor Davchev et al.

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu et al.

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Keyu Wu, Lingchen Yang, Zhiyi Kuang et al.

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

Shan Mengyi, Lu Dong, Yutao Han et al.

Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket

Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.

PORF: Pose Residual Field for Accurate Neural Surface Reconstruction

Jia-Wang Bian, Wenjing Bian, Victor Prisacariu et al.

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Yuxuan Bian, Ailing Zeng, Xuan Ju et al.

NICP: Neural ICP for 3D Human Registration at Scale

Riccardo Marin, Enric Corona, Gerard Pons-Moll

ECCV 2024, arXiv:2312.14024

Keywords: 3D human registration, neural fields, point cloud alignment, template registration (+4 more)

19 citations

#60

HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.

The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.

iHuman: Instant Animatable Digital Humans From Monocular Videos

Pramish Paudel, Anubhav Khanal, Danda Pani Paudel et al.

A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization

Hongwei Ren, Jiadong Zhu, Yue Zhou et al.

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Yongliang Lin, Yongzhi Su, Praveen Nathan et al.

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.

Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation

Feng Zhou, Jianqin Yin, Peiyang Li

AAAI 2024, arXiv:2312.15636

Keywords: 3D human pose estimation, depth ambiguity problem, image feature selection, pose-guided transformer (+4 more)

18 citations

#67

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation

Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.

FusionFormer: A Concise Unified Feature Fusion Transformer for 3D Pose Estimation

Yanlu Cai, Weizhong Zhang, Yuan Wu et al.

Relightable and Animatable Neural Avatars from Videos

Wenbin Lin, Chengwei Zheng, Jun-hai Yong et al.

AAAI 2024, arXiv:2312.12877

Keywords: neural avatars, inverse skinning problem, invertible deformation field, relightable 3D avatars (+4 more)

18 citations

#70

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Yukang Cao, Liang Pan, Kai Han et al.

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

Wencan Cheng, Hao Tang, Luc Van Gool et al.

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

Jinglei Zhang, Jiankang Deng, Chao Ma et al.

HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images

Xihe Yang, Xingyu Chen, Daiheng Gao et al.

SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

Kezheng Xiong, Maoji Zheng, Qingshan Xu et al.

AAAI 2024, arXiv:2312.08664

Keywords: point cloud registration, cross-source point clouds, skeletal representations, unsupervised skeleton extraction (+4 more)

17 citations

#75

Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang et al.

Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos

Remy Sabathier, David Novotny, Niloy Mitra

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros et al.

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.

CVPR 2025, arXiv:2504.17788

Keywords: camera pose estimation, dynamic video analysis, structure-from-motion, point tracking (+4 more)

15 citations

#80

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Yifan Yang, Dong Liu, Shuhai Zhang et al.

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations

Shengeng Tang, Jiayi He, Lechao Cheng et al.

LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation

Ruida Zhang, Ziqin Huang, Gu Wang et al.

Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao, Jialiang Zhu, Chunyu Wang et al.

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Guian Fang, Wenbiao Yan, Yuanfan Guo et al.

ECCV 2024, arXiv:2407.06937

Keywords: text-to-image diffusion, human anomaly generation, anatomical anomaly detection, pose-reversible guidance (+3 more)

14 citations

#88

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025, arXiv:2502.07785

Keywords: multi-view generation, diffusion transformer, single image reconstruction, 3D consistent generation (+4 more)

14 citations

#89

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

Qianyun He, Xinya Ji, Yicheng Gong et al.

ECCV 2024, arXiv:2408.00297

Keywords: 3D talking head, free-view synthesis, multi-view consistency, emotional expressiveness (+4 more)

14 citations

#90

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

HUMOS: Human Motion Model Conditioned on Body Shape

Shashank Tripathi, Omid Taheri, Christoph Lassner et al.

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

Exploring More from Multiple Gait Modalities for Human Identification

Dongyang Jin, Chao Fan, Weihua Chen et al.

3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views

Evangelos Ververas, Polydefkis Gkagkos, Jiankang Deng et al.

ECCV 2024, arXiv:2212.02997

Keywords: gaze estimation, 3D coordinate prediction, dense 3D meshes, domain generalization (+4 more)

13 citations

#95

UniHuman: A Unified Model For Editing Human Images in the Wild

Nannan Li, Qing Liu, Krishna Kumar Singh et al.

On the Utility of 3D Hand Poses for Action Recognition

Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener et al.

X-Dyna: Expressive Dynamic Human Image Animation

Di Chang, Hongyi Xu, You Xie et al.

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025, arXiv:2502.02358

Keywords: human motion generation, motion editing, rectified flows, motion-condition-motion paradigm (+4 more)

12 citations

#99

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Yatian Pang, Bin Zhu, Bin Lin et al.

ICCV 2025, arXiv:2412.00397

Keywords: human image animation, diffusion models, 3D geometry cues, skeleton pose sequences (+3 more)

12 citations

#100

Modeling and Driving Human Body Soundfields through Acoustic Primitives

Chao Huang, Dejan Markovic, Chenliang Xu et al.

ECCV 2024

12 citations
