Detecting Human Body Parts Using Confidence Maps and Landmark Maps

Abstract

This article addresses the problem of locating key points on an object and identifying its components, using the human body as an example; it is among the most pressing problems in computer vision. Given an image that presumably contains the object (here, a person), the task is to find the positions of all important components or points; for a person these are joints: shoulders, elbows, hands, knees, and so on. To solve this problem, we propose neural networks that predict “heat maps” of two kinds: a confidence map and a landmark map. The confidence map is a matrix corresponding to the original image, each cell of which holds a value from 0 to 1: the probability that the desired joint (special point) is located at the corresponding pixel. The landmark map is a matrix in which each cell holds a two-dimensional vector giving the direction to the next joint. The key feature is that the two maps support each other: the confidence map gives the approximate locations of the landmarks, which makes it easier to determine the correct direction, and conversely, knowing the approximate direction to the next landmark makes it easier to predict that landmark's location. The maps are computed in several stages, and each stage uses information from the previous one, so accuracy improves with each subsequent map. In this work the optimal number of stages was 6 to 7, although this value may change depending on the final architecture. The approach is demonstrated on the COCO dataset, which includes 18 points for each human body.
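
To make the two map types concrete, the following minimal Python sketch builds a target confidence map and a target landmark map for one pair of joints. It is an illustration under stated assumptions, not the article's implementation: the Gaussian shape of the confidence peak, the `sigma` and `limb_width` parameters, and the function names `confidence_map`, `landmark_map`, and `argmax_2d` are all hypothetical choices made here. The abstract specifies only that confidence cells hold values from 0 to 1 and that landmark cells hold a 2D direction vector toward the next joint.

```python
import math


def confidence_map(height, width, joint_xy, sigma=2.0):
    """Grid of values in (0, 1] that peaks at the joint (assumed Gaussian shape)."""
    jx, jy = joint_xy
    return [
        [math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def landmark_map(height, width, joint_xy, next_joint_xy, limb_width=1.5):
    """Grid of 2D unit vectors pointing from a joint toward the next joint.

    Cells farther than limb_width from the segment between the two joints
    hold (0.0, 0.0); this cut-off is an assumption of the sketch.
    """
    (x1, y1), (x2, y2) = joint_xy, next_joint_xy
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return [[(0.0, 0.0)] * width for _ in range(height)]
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            along = (x - x1) * ux + (y - y1) * uy        # position along the limb
            across = abs((x - x1) * uy - (y - y1) * ux)  # distance from the limb axis
            on_limb = 0 <= along <= length and across <= limb_width
            row.append((ux, uy) if on_limb else (0.0, 0.0))
        grid.append(row)
    return grid


def argmax_2d(grid):
    """Return the (x, y) cell with the highest value, i.e. the most likely joint."""
    best = max((v, x, y) for y, row in enumerate(grid) for x, v in enumerate(row))
    return best[1], best[2]


# Hypothetical example: a shoulder at (4, 3) and an elbow at (10, 3) on an 8 x 16 grid.
shoulder, elbow = (4, 3), (10, 3)
conf = confidence_map(8, 16, shoulder)
field = landmark_map(8, 16, shoulder, elbow)
peak = argmax_2d(conf)
```

Here the peak of `conf` recovers the shoulder position, and the cells of `field` between the two joints point along the arm toward the elbow. In a multi-stage network of the kind described above, maps like these would serve as training targets, with each stage receiving the previous stage's predictions as additional input.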

General Information

Keywords: neural networks, convolutional neural networks, Human Pose Estimation, heat maps, image processing, keypoints, object detection, object segmentation

Journal rubric: Data Analysis

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2025150101

Received: 05.02.2025

Accepted:

For citation: Potenko M.A. Detecting human body parts using confidence maps and landmark maps. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2025. Vol. 15, no. 1, pp. 7–18. DOI: 10.17759/mda.2025150101. (In Russ., abstr. in Engl.)

Information About the Authors

Maxim A. Potenko, Graduate Student, Moscow Aviation Institute (national research university) (MAI), Moscow, Russian Federation, ORCID: https://orcid.org/0009-0008-5222-2664, e-mail: potenkog@gmail.com
