Using Pose estimation for privacy friendly age estimation


submitted in partial fulfillment for the degree of master of science

M.J.H Brink

10880372

master information studies

data science

faculty of science

university of amsterdam

2020-04-30

              Internal Supervisor    External Supervisor
Title, Name   Dr Frank Nack          Maarten Sukel
Affiliation   UvA, FNWI, IvI         Municipality of Amsterdam
Email         F.M.Nack@uva.nl        M.Sukel@amsterdam.nl


Figure 1: A mock-up showing how an application using the proposed method could look.

ABSTRACT

In this paper we present an alternative method for age estimation. While earlier work in age estimation focused on facial features, we propose using the lengths of body parts. This method is expected to perform better in surveillance scenarios than existing methods. It seems promising because it is supported by literature from the anthropometric field. The proposed method also seeks to deal with the different kinds of occlusion that occur in an in-the-wild setting. Our best performing model reported an accuracy of 0.552 and a recall of 63% for people with an age below 18. We speculate that these results could be improved by using more complex models than the ones tested in this research.

KEYWORDS

Age estimation, Anthropometrics, Pose estimation, Privacy, Explainable AI

1 INTRODUCTION

Age estimation from image data, either photo or video, is an important task in computer vision. The field of age estimation is closely related to other biometric estimation tasks such as gender, ethnicity and facial expression estimation. There is a growing demand for extracting such biometric features from images or videos, so that they can be used in intelligent applications. Rothe et al. [45] give four examples of applications to which age estimation can contribute:

• Access control, e.g., restricting the access of minors to products like cigarettes from vending machines, or preventing minors from accessing events with adult content.

• Human-computer interaction (HCI), e.g., an advertisement board that adapts its offering to young, adult, or elderly people.

• Law enforcement, e.g., automatic scanning of video records for suspects with age estimation can help during investigations.

• Surveillance, e.g., automatic detection of unattended children at unusual hours and places.

Rothe et al. [45] also state that a large amount of research has been devoted to age estimation from a face image in its best-known form: biological (or real) age estimation. This research area spans decades, as summarized in multiple large studies [4, 8, 18, 23, 38]. Several public standard datasets [4, 38, 43] for biological age estimation allow for performance comparison of proposed methods. One should note that earlier age estimation methods, as described in section 2.2, did not take privacy friendliness and explainability into consideration. This research aims to include privacy friendliness and explainability in the design of the approach and uses the 1.5 meter monitor [1], described in section 1.1, as a use case.

1.1 The 1.5 meter monitor

The municipality of Amsterdam is using the 1.5 meter monitor temporarily because of the COVID-19 Emergency Ordinance [33] and the need to keep a distance of 1.5 meters. The system is being tested in public spaces at a number of locations in the Amsterdam-Amstelland Safety Region. It is used in different locations and situations, for instance at office spaces, libraries and marketplaces, as seen in figure 2.

Figure 2: The 1.5 meter monitor in action

On the GitHub page of the system, the following summary of the application is given: “The 1.5 meter monitor is a system created to make users aware of social distancing measures. It uses a camera, computer with GPU (or edge devices), and a screen to create awareness for anybody within the field of view of the camera to keep a social distance. It does so by displaying a picture that is taken of passersby, that is overlaid with a smiley: a green smiley for more than 1.8 meters distance, an orange smiley for 1.5-1.8 meters distance, and a red one for less than 1.5 meters distance. An audio signal can also be played.” [1]
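The thresholds in the quoted summary can be written down as a small function. This is only a sketch: the function name and the handling of the exact boundary values (1.5 and 1.8 meters) are assumptions made for illustration and are not taken from the source code of the 1.5 meter monitor.

```python
def smiley_colour(distance_m: float) -> str:
    """Map a measured distance in meters to the smiley colour named in the summary.

    Assumption: distances of exactly 1.5 m and exactly 1.8 m count as orange.
    """
    if distance_m < 1.5:
        return "red"
    if distance_m <= 1.8:
        return "orange"
    return "green"


for distance in (1.2, 1.6, 2.4):
    print(distance, smiley_colour(distance))
```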

The 1.5 meter monitor has been created with a focus on being a positive and responsible [6] form of Artificial Intelligence in the public space. The system has been designed according to the Tada principles [50]. The 1.5 meter monitor is also fully transparent: its source code is public and it is registered in the algorithm register [37]. In order to prevent any intrusion on privacy, the data is processed locally and not stored. However, the system does record the number of passers-by within range of the camera. This is done to monitor whether it is too busy in a certain area.

The municipality of Amsterdam indicates that it is both enthusiastic and very cautious about the application of AI and asks itself what the positive and negative consequences of a digital city are. The municipality also states that they “are working to ensure full transparency with regard to our use of algorithms, including efforts to preclude any form of discrimination.” [36].

The Emergency Ordinance states that everyone aged 13 or over needs to keep at least 1.5 meters distance (as of 20-12-2020). However, the system described above does not consider age when measuring the mutual distance between persons. Existing age estimation methods, as described in section 2.2, are not designed with explainability and privacy in mind. They might also perform less well because the perspective from which the 1.5 meter monitor observes people differs from the perspective for which the existing methods were developed. An example of this difference in perspective can be seen in figure 3. These two flaws call for a new approach that takes into account both explainability and the different perspective that surveillance systems have.

1.2 An anthropometric approach

Figure 3: An example of the perspective from the 1.5 meter monitor (left) vs the perspective existing age estimation methods are tested on (right).

This research will explore whether the relative lengths of body parts can be used to estimate the age of a person through machine learning. Hence, the input variables for this model would be the relative lengths of the body parts, see figure 4. This approach has multiple benefits. First, the distance of a person to the camera should not matter, because the measurements are relative. Second, the measurements of body parts should not enable an algorithm to identify an individual, which raises the privacy friendliness of the model. Third, the model takes in understandable features, which raises the explainability. Fourth, this method should be able to perform in scenarios where the "face-based-method" cannot perform. These are cases where the face is altered (due to face filters), obfuscated or simply not visible due to occlusion. Lastly, this method should perform better in cases where the spatial resolution of the face is lower than the resolution that existing age estimation methods use.


be seen if the described approach yields precise enough results for age estimation.

1.3 Paper structure

This paper will investigate whether the approach of using bodily measures from images to estimate age, as described in section 1.2, is feasible. Section 2 gives an overview of existing age estimation datasets, age estimation methods and pose estimation methods. A pose estimation method is required to get the bodily measures from images. Section 3 will formalize both the method that will be used to estimate age and the approach to test whether this method is feasible. Section 4 shows the results of the experiments mentioned in section 3. Section 5 will give an interpretation of these results and elaborate on shortcomings of the proposed method. Section 7 will discuss possible solutions to these shortcomings.

2 RELATED WORK

This section will first explore the existing datasets for age estimation. It will then elaborate on existing approaches for age estimation and discuss their shortcomings for the use case described in section 1.1. Lastly, there will be an overview of existing pose estimation methods.

2.1 Age estimation datasets

There are two types of datasets within the field of age estimation. The first type of dataset uses constrained images. These datasets use mugshots to get a very consistent frontal image of a person's head. Examples of these datasets are the FG-NET [29] and the MORPH [43] datasets. An example of these mugshots can be seen in figure 5. This type of dataset is not suitable for the method stated in section 1.2, because it only contains portrait photos of faces and does not picture other parts of the body.

Figure 5: A sample from MORPH [42]

The second type of dataset consists of unconstrained images, the so called in-the-wild images. These datasets contain photos

are interesting for this research because they do not just contain images of the face. They also picture other parts of the body, like the torso, arms and legs. Not all photos contain full frontal body shots. This is also of interest, because occlusion and rotation also occur in other scenarios like the one described in section 1.1.

Figure 6: A sample from IMDB-WIKI

2.2 Existing age estimation methods

The field of age estimation has benefited hugely from the developments in facial analysis. More recently, age estimation also benefited from developments in machine learning, specifically deep learning. Zhang et al. [59] give an overview of these developments. The earliest age estimation methods used geometric features of the face, by calculating different ratios between facial features [5]. Niels da Vitoria Lobo [5] showed that these geometric facial features can be used to separate babies from adults. However, they could not be used to distinguish between adult and elderly people. Later approaches, like the Active Appearance Model [29], used texture features in addition to geometric facial features to achieve better results. The trend of using manually designed features continued after 2007. Some examples of approaches that also use manually crafted features are BIF [22], Gabor [15], LBP [17] and SFP [49]. These approaches have in common that they combine manually crafted features with regression or classification methods to predict age from facial images. The most used regression models for age estimation are Linear regression [12], CCA [21], PLS [20] and SVR [19]. However, these approaches were only effective on constrained benchmark datasets and could not achieve comparable results on in-the-wild images. In-the-wild images can have large variations in pose, illumination, expression, cosmetics and occlusion, which makes them harder than the benchmark datasets.


Figure 7: A general overview of the proposed method

Zhang et al. [59] also describe the recent research on Convolutional Neural Networks (CNNs) for age estimation. CNNs showed that they can learn a compact and discriminative feature representation, which proved beneficial for the task of age estimation. Yi et al. [58] were the first to use a CNN for age estimation. Theirs was the first approach to perform consistently on in-the-wild datasets. Wang et al. [53] later used a CNN to extract features from a facial image and applied different classification and regression methods to these features. Other approaches used CNNs to predict the gender of a person and trained different CNNs on both genders to predict age [9, 30]. Other approaches had success with fusing models for the different genders to predict age [2, 31]. Rothe et al. [44] studied the difference between biological (real) and apparent (as perceived by other people) age prediction.

The field of age estimation has evolved in three stages. It started with geometric facial distance measures [5]. From 2007 onwards it moved to other, manually crafted, features. Later methods used Neural Networks for feature extraction and were mostly successful with CNNs. These methods performed better over the years, but became less explainable to the public. Also, all previous approaches for age estimation that use image data focus on the face. Some CNNs are not restricted to facial image data. However, their respective authors do not write or speculate about the usage of other bodily measurements.

2.3 Pose estimation methods

A pose estimation model is required to generate the body measurements needed for the approach described in section 1.2. Cao et al. [3] state that there are two different research focuses within the field of pose estimation. The first approach that Cao et al. [3] describe is to estimate the pose of a single person. This approach performs inference over a combination of local observations on body parts and the spatial dependencies between them. CNNs are generally used to recognize these body parts [35, 51–53, 55]. These approaches became more sophisticated over time. However, all of them can only be used on a single person. This means that the location and scale of the person of interest need to be given before the pose can be extracted.

The other approach that Cao et al. [3] describe is to estimate the pose of multiple persons. Multi-person pose estimation can be subdivided in two groups. The first group, which covers most multi-person pose estimation models, uses a top-down strategy. These models [11, 16, 26, 57] first detect people and then estimate the pose of each person independently. This approach makes it possible to use the single person pose estimation models. However, Cao et al. [3] state that using single person pose estimation for multi-person pose estimation has two problems. The first is that the approach suffers from the errors made when detecting a person. The second is that the approach fails to capture the spatial dependencies across different people. Eichner and Ferrari [7] solved this second issue by including these spatial dependencies. The second group was formulated by Pishchulin et al. [41]. They implemented a bottom-up approach in which they first identified all body parts and then connected these to individual persons. A disadvantage of this approach was that processing a single image could take a few hours. Insafutdinov et al. [25] built on the work by Pishchulin et al. [41], reducing the processing time of a single image to just a few minutes. OpenPose [3] also built on these earlier methods, applying a bottom-up approach, but it is the first multi-person method that is able to process images in real time. To date it is the only real-time multi-person pose estimation method that is open-source.

3 APPROACH

The approach section is structured as follows, see figure 7 for a general overview. First the dataset is constructed. The dataset contains photos and metadata such as the date of birth and the picture date. Then OpenPose is used to generate poses, from which the lengths of body parts (e.g. neck length, shoulder width and torso length) are extracted. Once the dataset is finalized, it is split into a train-set and a test-set. Different fill methods and classification models are then fitted on the train-set and evaluated afterwards using the test-set.

3.1 Dataset

This paper will make use of the IMDB-WIKI dataset [44]. The dataset is an in-the-wild dataset, meaning it contains images captured under completely uncontrolled, real-world conditions (i.e., having different poses, containing noise, bearing various expressions, containing occlusions, etc.). This is highly advantageous, because the images show not only the face of a person, but also other body parts. The usage of an in-the-wild dataset for this approach is also interesting because most of the state-of-the-art methods are trained and evaluated on in-the-wild datasets.

The IMDB-WIKI dataset is composed of photos of actors and still images from movies. It also contains a metadata file which designates the face location of the actor and his or her date of birth


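Table 1: The distribution of the classes over the train- and test-set (only the row below could be recovered)

Class   Train-set   Test-set
>=18    15297       3904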

The original dataset contained 460,723 photos, of which 19,347 showed a person with an age below 18. A random selection was made from all photos with an age of 18 and above in order to balance the number of actors above and below the age of 18. After balancing, 38,694 photos remained. A test-set was then split off from this balanced dataset. The test-set contains a flat distribution of ages, intended to mimic the distribution of ages in the real world. This was done by binning all ages in groups of 6 years (e.g. 0-5, 6-11) and taking the same number of randomly selected photos from each bin. This number was determined by taking 20% of the smallest bin. The remaining photos were used in the train-set. This results in a train-set with a size of 32,837 photos and a test-set with 5,856 photos. The distribution of the classes over the train- and test-set can be seen in table 1.
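The test-set construction described above can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes integer, non-negative ages; it only illustrates the binning and equal-size sampling, not the exact code used in this research.

```python
import random
from collections import defaultdict


def split_flat_test_set(ages, bin_width=6, fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with equally many test photos per age bin."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for index, age in enumerate(ages):
        bins[age // bin_width].append(index)  # ages 0-5 -> bin 0, 6-11 -> bin 1, ...

    # Every bin contributes the same number of photos: 20% of the smallest bin.
    per_bin = int(fraction * min(len(members) for members in bins.values()))

    test_indices = []
    for members in bins.values():
        test_indices.extend(rng.sample(members, per_bin))

    in_test = set(test_indices)
    train_indices = [i for i in range(len(ages)) if i not in in_test]
    return train_indices, sorted(test_indices)
```

Because every bin contributes the same number of photos, the class balance of the test-set follows from the number of bins on each side of the age threshold, not from the class balance of the train-set.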

Figure 8: A sample from IMDB-WIKI

The IMDB-WIKI dataset has an error in its labels that is hard to quantify due to the size of the dataset. This error is due to the way the dataset was obtained. The dataset is scraped from the internet and uses the date of birth of a person (which is likely to be correct) and the date the picture was taken. The picture date can contain mistakes; for example, there were 459 photos containing a person with a negative age. The IMDB-WIKI dataset also contains stills from movies which are digitally altered or might not contain the actor in question. An example of these noisy pictures can be seen in figure 8.
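The age label and the negative-age check can be illustrated with a short sketch. It assumes that both the date of birth and the picture date are available as full dates; the function names and the record layout are hypothetical.

```python
from datetime import date


def age_at_photo(date_of_birth: date, photo_date: date) -> int:
    """Age in whole years on the day the picture was taken (negative if the dates conflict)."""
    had_birthday = (photo_date.month, photo_date.day) >= (date_of_birth.month, date_of_birth.day)
    return photo_date.year - date_of_birth.year - (0 if had_birthday else 1)


def drop_negative_ages(records):
    """Keep only (date_of_birth, photo_date) pairs that yield a non-negative age."""
    return [r for r in records if age_at_photo(*r) >= 0]


records = [
    (date(2000, 6, 15), date(2010, 6, 14)),  # one day before the tenth birthday
    (date(2010, 1, 1), date(2005, 1, 1)),    # picture dated before birth: label error
]
print([age_at_photo(*r) for r in records])
print(len(drop_negative_ages(records)))
```

Such a filter only removes the impossible labels. Plausible but wrong picture dates, and stills that do not show the labelled actor, remain in the data.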

3.2 Features

OpenPose is used to generate pose-keypoints, these will be used as features for the age classification. A photo might contain multiple

Figure 9: A visualization of the normalization and different types of occlusion

The keypoints generated by OpenPose were normalized such that the coordinates of the joints all fall between (0, 0) and (1, 1). A visualization of this can be seen in figure 9. After the normalization the lengths of all body parts were calculated, by taking the Euclidean distance between the joints that are connected by a body part. In most photos people were photographed from the side or were partially occluded. Two different methods were used to combat the resulting missing data points, one for each type of occlusion. The difference between horizontal and vertical occlusion is visualized in figure 9. The first method is formulated to lessen the impact of horizontal occlusion: if a body part is not visible, its length is substituted by the length of its mirrored body part, if available. For example, if the left upper arm is not visible, the length of the right upper arm is taken. For all body parts where both mirrored parts were visible, the two lengths were averaged in order to get the most reliable length.
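The normalization, the length calculation and the mirroring step can be sketched as follows. The keypoint names and the three body parts are hypothetical placeholders (OpenPose itself returns indexed keypoints), and the per-axis min-max scaling is an assumption about how the coordinates were brought between (0, 0) and (1, 1).

```python
import math

# Hypothetical keypoint names and body-part definitions, for illustration only.
BODY_PARTS = {
    "neck": ("nose", "neck"),
    "left_upper_arm": ("left_shoulder", "left_elbow"),
    "right_upper_arm": ("right_shoulder", "right_elbow"),
}
MIRRORED_PAIRS = {"upper_arm": ("left_upper_arm", "right_upper_arm")}


def normalize(keypoints):
    """Rescale the detected (x, y) keypoints so that they fall between (0, 0) and (1, 1)."""
    xs = [x for x, _ in keypoints.values()]
    ys = [y for _, y in keypoints.values()]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for degenerate poses
    span_y = (max(ys) - min_y) or 1.0
    return {name: ((x - min_x) / span_x, (y - min_y) / span_y)
            for name, (x, y) in keypoints.items()}


def part_lengths(keypoints):
    """Euclidean distance between the two joints of each body part, or None if a joint is missing."""
    lengths = {}
    for part, (start, end) in BODY_PARTS.items():
        if start in keypoints and end in keypoints:
            lengths[part] = math.dist(keypoints[start], keypoints[end])
        else:
            lengths[part] = None
    return lengths


def merge_mirrored(lengths):
    """Average mirrored parts when both are visible, otherwise take the visible one."""
    sided = {side for pair in MIRRORED_PAIRS.values() for side in pair}
    merged = {part: value for part, value in lengths.items() if part not in sided}
    for name, (left, right) in MIRRORED_PAIRS.items():
        visible = [v for v in (lengths.get(left), lengths.get(right)) if v is not None]
        merged[name] = sum(visible) / len(visible) if visible else None
    return merged


# The right elbow is not detected, so the left upper arm stands in for both sides.
detected = {"nose": (10, 0), "neck": (10, 20), "left_shoulder": (0, 20),
            "left_elbow": (0, 60), "right_shoulder": (20, 20)}
features = merge_mirrored(part_lengths(normalize(detected)))
print(features)
```

Parts that remain None after this step are the ones affected by vertical occlusion.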


Figure 10: The amount of body part occurrences

The second method was formulated to combat vertical occlusion (e.g. when the shoulders are visible, but the torso is not). Vertical occlusion is harder to compensate for than horizontal occlusion, because we have no information about, for example, the legs when they are not photographed. A significant portion of the images are portrait photographs, as can be seen in figure 10. In most cases the face is photographed (containing the neck, eyes and ears), but the lower a body part is, the less likely it is to be picked up by OpenPose. The method to combat vertical occlusion consists of using a fill method. We propose a smart fill method that takes all visible body parts into account when filling the missing body part. This fill method consists of training a k-nearest neighbors algorithm (KNN) [27] on the training dataset and using it to fill in all missing lengths of body parts. These lengths were missing either because the body part was not detected by OpenPose or because it was not visible in the image due to occlusion. Taking the steps described above results in a dataset with ten different features.
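A minimal sketch of such a KNN-based fill is given below. The value of k, the restriction to complete training rows as neighbours and the use of the mean of the neighbours are all assumptions, since these details are not specified above. scikit-learn offers a comparable `KNNImputer` class; this sketch does not claim to reproduce it.

```python
import math


def knn_fill(train_rows, row, k=5):
    """Fill the None entries of `row` from the k nearest complete training rows.

    The distance is computed on the features that are visible in `row` only.
    """
    visible = [i for i, value in enumerate(row) if value is not None]
    missing = [i for i, value in enumerate(row) if value is None]
    if not missing:
        return list(row)

    complete = [r for r in train_rows if all(value is not None for value in r)]
    if not complete:
        raise ValueError("no complete training rows to use as neighbours")

    target = [row[i] for i in visible]
    complete.sort(key=lambda r: math.dist([r[i] for i in visible], target))
    nearest = complete[:k]

    filled = list(row)
    for i in missing:
        filled[i] = sum(r[i] for r in nearest) / len(nearest)
    return filled


train = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.5], [0.9, 0.8, 0.7]]
print(knn_fill(train, [0.1, 0.2, None], k=2))
```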

3.3 Model training

Nine model types were evaluated: Logistic Regression [47], Nearest Neighbors [27], Linear SVM [60], RBF SVM [60], Decision Tree [13], Random Forest [48], Neural Net [56], AdaBoost [46] and Naive Bayes [54]. All of these model types are classification models. For the purpose of this research we used the sklearn [39] implementation of these model types. All models used their default parameters and no hyperparameter optimization was applied.

The models were evaluated on both their accuracy- and recall-scores. The accuracy-score gives a general insight into the performance of a model. The recall-score per class was used to see whether a model performs better on one of the two groups.
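The evaluation loop can be sketched with scikit-learn as shown below. The mapping of the nine model names onto concrete classes (for example `GaussianNB` for Naive Bayes and `SVC(kernel="linear")` for the Linear SVM) is an assumption, as is the label encoding (0 for below 18, 1 for 18 and above). The data is synthetic and only stands in for the ten body part features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """One instance per model type, with default parameters apart from the SVM kernel."""
    return {
        "Logistic Regression": LogisticRegression(),
        "Nearest Neighbors": KNeighborsClassifier(),
        "Linear SVM": SVC(kernel="linear"),
        "RBF SVM": SVC(kernel="rbf"),
        "Decision Tree": DecisionTreeClassifier(),
        "Random Forest": RandomForestClassifier(),
        "Neural Net": MLPClassifier(),
        "AdaBoost": AdaBoostClassifier(),
        "Naive Bayes": GaussianNB(),
    }


def evaluate_models(X_train, y_train, X_test, y_test):
    """Return {model name: (accuracy, recall below 18, recall 18 and above)}."""
    scores = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        recall_minor, recall_adult = recall_score(
            y_test, predicted, labels=[0, 1], average=None)
        scores[name] = (accuracy_score(y_test, predicted), recall_minor, recall_adult)
    return scores


# Synthetic stand-in data with ten features; not the data used in this research.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scores = evaluate_models(X_train, y_train, X_test, y_test)
for name, (accuracy, recall_minor, recall_adult) in scores.items():
    print(f"{name:20s} {accuracy:.3f} {recall_minor:.3f} {recall_adult:.3f}")
```

Reporting the recall per class, and not a single averaged value, makes it visible when a model predicts one class almost exclusively.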

3.4 Other experiments

For our research we propose three additional experiments. The first is to check whether the fill method proposed in section 3.2 was effective. This was checked in two ways. First, we used other fill methods, where all missing lengths were filled with zeros or with their respective mean, mode or average. Second, the effectiveness of the fill methods was validated by comparing results from models that used filled values with results from models that did not. The models that did not use filled values were trained and validated using only photos of people where no values were missing after accounting for horizontal occlusion. This eliminates the need for fill methods. The models that used the different fill methods were compared against the proposed KNN fill method, against each other and against the models that did not use fill methods.
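The simpler fill methods and the variant without any filling can be sketched as follows. The function names are hypothetical, and the sketch assumes that the fill values are computed per feature on the train-set and then applied to both sets.

```python
from statistics import mean, mode


def fit_fill_values(train_rows, strategy):
    """Compute one fill value per feature from the visible training values."""
    fill_values = []
    for i in range(len(train_rows[0])):
        column = [row[i] for row in train_rows if row[i] is not None]
        if strategy == "zero":
            fill_values.append(0.0)
        elif strategy == "mean":
            fill_values.append(mean(column))
        elif strategy == "mode":
            fill_values.append(mode(column))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return fill_values


def apply_fill(rows, fill_values):
    """Replace every None by the fill value of its feature."""
    return [[fill_values[i] if value is None else value for i, value in enumerate(row)]
            for row in rows]


def complete_rows_only(rows):
    """The variant without a fill method: keep only rows without missing values."""
    return [row for row in rows if all(value is not None for value in row)]


train = [[1.0, None], [3.0, 2.0], [3.0, 4.0]]
print(apply_fill(train, fit_fill_values(train, "mean")))
print(complete_rows_only(train))
```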

The second additional experiment is to set the age threshold at the age of 12. This research mainly focuses on classifying age as below 18 or as 18 and above. This choice was made because the age of 18 is a point of interest for many cases where age classification would be used. Nonetheless, the age of 12 is also a point of interest, as described in section 1.1, which makes the results with the age threshold set at 12 interesting as well. Hence, there will be a separate test concerning the classification of age with a threshold at 12 years old.

The third additional experiment is based on the earlier research [24, 32], as described in section 1.2, that indicated that the relative body surfaces could be a predictor for age. We force the use of the relative lengths between the body parts by dividing the length of every body part by the length of every other body part separately. This makes for 90 different combinations: ten body parts, each divided by the nine others.
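The ratio features can be generated as in the sketch below. The part names are placeholders, and returning None when the denominator is zero (which can happen with zero-filled values) is an assumption.

```python
from itertools import permutations


def ratio_features(lengths):
    """Divide the length of every body part by the length of every other body part."""
    ratios = {}
    for numerator, denominator in permutations(lengths, 2):
        if lengths[denominator]:
            ratios[f"{numerator}/{denominator}"] = lengths[numerator] / lengths[denominator]
        else:
            ratios[f"{numerator}/{denominator}"] = None  # undefined for a zero length
    return ratios


# Ten placeholder body parts with arbitrary, non-zero lengths.
lengths = {f"part_{i}": 0.1 * (i + 1) for i in range(10)}
ratios = ratio_features(lengths)
print(len(ratios))
```

Ordered pairs are used, so both a/b and b/a are included, which gives 10 x 9 = 90 features.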

4 VALIDATION

The validation section is divided in two parts. Section 4.1 shows the scores of the nine explored models on different measures and tries to find a model that can be used for age estimation given the approach described in section 1.2. Section 4.2 elaborates on the findings from the different experiments proposed in section 3.4.

Table 2: The respective accuracy and micro recall per class for each model type that used a KNN to fill missing values

Model                 Accuracy   Recall<18   Recall>=18
Logistic Regression   0.416      0.841       0.203
Nearest Neighbors     0.540      0.606       0.507
Linear SVM            0.351      0.968       0.043
RBF SVM               0.499      0.711       0.394
Decision Tree         0.552      0.633       0.511
Random Forest         0.548      0.676       0.484
Neural Net            0.524      0.645       0.463
AdaBoost              0.486      0.704       0.377
Naive Bayes           0.334      0.992       0.005
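The accuracy values in Table 2 can be related to the two recall values through the class sizes of the test-set. The sketch below assumes that the test-set contains 3904 photos of people of 18 and above (the value that survives from Table 1, read as the test-set column) out of the 5856 test photos reported in section 3.1. The below 18 count is derived from these two numbers and is not stated explicitly in the text.

```python
def accuracy_from_recalls(recall_minor, recall_adult, n_minor, n_adult):
    """Accuracy as the class-size weighted average of the two per-class recalls."""
    return (recall_minor * n_minor + recall_adult * n_adult) / (n_minor + n_adult)


N_TEST = 5856                # test-set size from section 3.1
N_ADULT = 3904               # assumed: '>=18' photos in the test-set
N_MINOR = N_TEST - N_ADULT   # derived, not stated in the text

decision_tree = accuracy_from_recalls(0.633, 0.511, N_MINOR, N_ADULT)
naive_bayes = accuracy_from_recalls(0.992, 0.005, N_MINOR, N_ADULT)
always_minor = accuracy_from_recalls(1.0, 0.0, N_MINOR, N_ADULT)
print(round(decision_tree, 3), round(naive_bayes, 3), round(always_minor, 3))
```

Under these assumptions the reported accuracies are reproduced from the recalls, and a model that always predicts "below 18" would score an accuracy of about one third.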

4.1 Model performances

Table 2 shows the scores of the nine different model types, as described in section 3.3, that used a KNN to fill in the missing values. Based on the recall scores in the table, we can distinguish two different behaviours. The first group of models has similar recall-scores for both classes. The other group of models has a significantly higher recall-score for the below 18 class. These models seem to have learned to only predict that the person in question is below the age threshold. This gives these models an


Figure 11: The feature importance of the Decision Tree model

4.2 Other experiments

None of the fill methods described in section 3.4 performed notably differently across the nine models. This can be concluded from figure 12, which shows that there is no large difference in accuracy between the different fill methods. What stands out is that the KNN scored a higher accuracy than all other fill methods when combined with a Linear SVM. It also stands out that most combinations of fill methods and models did not perform differently from cases where no fill method was used.

Figure 12: Comparison of the different fill methods

Experiments with shifting the age threshold from 18 to 12 years old also did not yield any notable differences in any of the model performance metrics. We expected that it would be easier for a model to distinguish between the two groups when the threshold is set at 12 years old. This is due to the Wallace rule of nines stating

5 DISCUSSION

This section will discuss two different topics. Section 5.1 will discuss the results acquired in section 4. There will be a focus on what metrics matter for the desired results. Section 5.2 will discuss the different shortcomings of the above described and tested approach.

5.1 Motivation

The results presented in section 4 indicate that the approach used does not produce the hypothesised results. This contradicts the literature that was presented in section 1.2. There we presented the Wallace rule of nines, which is used in the medical field to quickly deduce the relative skin surfaces of body parts when the age is known. This research presumed three things. First, that the Wallace rule of nines is reversible (i.e. when the relative surfaces are known, the age is deducible). Second, that the body part length is a good proxy for the body part surface area. Lastly, that the transformational problems that occur because pictures map 3D space onto a 2D plane would not materially distort the measured lengths. We suspect that the second and third assumptions might not be valid, since we see no reason why the Wallace rule of nines would not be reversible.
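The third assumption can be illustrated with a simple foreshortening example. The sketch assumes an orthographic projection, a simplification of a real camera, and uses made-up lengths. It only shows how a body part that points partly towards the camera appears shorter in the image, which changes its ratio to other body parts.

```python
import math


def projected_length(true_length, tilt_deg):
    """Image length of a segment tilted `tilt_deg` degrees out of the image plane.

    Assumes an orthographic projection; a real camera adds perspective effects.
    """
    return true_length * math.cos(math.radians(tilt_deg))


torso = projected_length(0.50, 0)        # torso parallel to the image plane
upper_arm = projected_length(0.30, 60)   # upper arm pointing towards the camera
print(upper_arm / torso, 0.30 / 0.50)
```

In this example the measured ratio between upper arm and torso is half of the true ratio, although both body parts are fully visible.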

5.2 Applicability

The approach in this research was mainly designed with explainability in mind. This does not guarantee privacy, for two reasons. The first is that the approach still uses the whole image, including all its identifying features, and that the public has to trust that the data is reduced enough to remove all identifying data points. The second is that pose data might still contain identifying data points; this has not been tested in this research. The described approach is not more privacy friendly than other age estimation models, but it wins in explainability. This extra explainability might help the public understand that the user does not intend to invade their privacy.

This approach is also not tested on its intended use case. There are currently no existing datasets of surveillance images that contain age labels. Such surveillance datasets would have a perspective similar to that of the 1.5 meter monitor. We hypothesise that current age estimation techniques, as described in section 2.2, cannot perform with the same level of accuracy from this perspective. This is due to the surveillance perspective having fewer data points per face when compared to the datasets they are tested on. However, a different perspective might also yield less precise measurements for the proposed method, which could hurt results.

6 CONCLUSION

The proposed method of using pose estimation to generate features for age estimation seemed promising. This was mainly because some indicators in the literature, as described in section 1.2, support the proposed method. The proposed method could also have two advantages over the existing age estimation methods described in section 2.2. The first advantage is that it is more explainable than existing age estimation methods. The second advantage is that the proposed method should be able to perform better in surveillance scenarios; however, this could not be tested.

We did not find results that would stand up in a real world scenario, despite the promising findings in the literature. Our best model reported an accuracy of 0.552 and was able to correctly label 63% of people below the age of 18 and 51% of people of 18 and above. We speculate that there might be two problems with the currently proposed method. The first is that body part length is not a good proxy for body part surface area. The second is that there might be a transformation issue when extracting lengths from a 2D image that was taken in the 3D world.

7 FUTURE WORK

Our work uses an existing dataset that contains both images of people and age labels. No existing dataset combines surveillance images with age labels. The field of age estimation might be helped by the creation of such a dataset. There are many applications of age estimation on surveillance data as described in section 1.

We stated that our approach is more privacy friendly; however, we did not back this up with the necessary data. Hence, it might be interesting to explore to what extent pose estimation data can be used to identify an individual. It might also be possible to extract features other than age. For example, one article [40] states that body proportions differ between the genders. This could make gender extractable from pose data.

Our work mirrored the simpler approach that was used by the earlier age estimation methods described in section 2.2. However, these earlier methods used handcrafted datasets for their age estimation, whereas we used an in-the-wild dataset. Later methods that use in-the-wild datasets used more complex models than our proposed method does. We conclude that the simple model types we used might not be complex enough and that more complex models might be able to learn something from the existing datasets. It might also be interesting to explore age regression using the proposed method.

ACKNOWLEDGMENTS

I would like to thank my first supervisor, Maarten Sukel, and my second supervisor, Frank Nack, for their patience, guidance and support. I would also like to extend my thanks to the municipality of Amsterdam, who have given me extensive feedback and support.

REFERENCES

[1] Amsterdam-AI-Team. 2020. 1.5-meter-monitor. Retrieved March 1, 2021 from https://github.com/Amsterdam-AI-Team/1.5-meter-monitor

[2] Grigory Antipov, Moez Baccouche, Sid-Ahmed Berrani, and Jean-Luc Dugelay. 2016. Apparent age estimation from face images combining general and children-specialized deep learning models. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 96–104.

[3] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2019. OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 1 (2019), 172–186.

[4] Bor-Chun Chen, Chu-Song Chen, and Winston H Hsu. 2015. Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset. IEEE Transactions on Multimedia 17, 6 (2015), 804–815.

[5] Niels da Vitoria Lobo. 1994. Age classification from facial images. In CVPR. [6] Virginia Dignum. 2019. Responsible artificial intelligence: How to develop and use

AI in a responsible way. Springer Nature.

[7] Marcin Eichner and Vittorio Ferrari. 2010. We are family: Joint pose estimation of multiple persons. In European conference on computer vision. Springer, 228–242.

[8] Eran Eidinger, Roee Enbar, and Tal Hassner. 2014. Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and Security 9, 12 (2014), 2170–2179.

[9] Ari Ekmekji. 2016. Convolutional neural networks for age and gender classification. Stanford University (2016).

[10] Sergio Escalera, Jordi Gonzalez, Xavier Baró, Pablo Pardo, Junior Fabian, Marc Oliu, Hugo Jair Escalante, Ivan Huerta, and Isabelle Guyon. 2015. Chalearn looking at people 2015 new competitions: Age estimation and cultural event recognition. In 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.

[11] Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, and Cewu Lu. 2017. Rmpe: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision. 2334–2343.

[12] Yun Fu and Thomas S Huang. 2008. Human age estimation with regression on discriminative aging manifold. IEEE Transactions on Multimedia 10, 4 (2008), 578–584.

[13] Johannes Fürnkranz. 2010. Decision Tree. Springer US, Boston, MA, 263–267. https://doi.org/10.1007/978-0-387-30164-8_204

[14] Andrew C Gallagher and Tsuhan Chen. 2009. Understanding images of groups of people. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 256–263.

[15] Feng Gao and Haizhou Ai. 2009. Face age classification on consumer images with gabor feature and fuzzy lda method. In International Conference on Biometrics. Springer, 132–141.

[16] Georgia Gkioxari, Bharath Hariharan, Ross Girshick, and Jitendra Malik. 2014. Using k-poselets for detecting people and localizing their keypoints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3582–3589.

[17] Asuman Gunay and Vasif V Nabiyev. 2008. Automatic age classification with LBP. In 2008 23rd International Symposium on Computer and Information Sciences. IEEE, 1–4.

[18] Guodong Guo. 2012. Human age estimation and sex classification. In Video analytics for business intelligence. Springer, 101–131.

[19] Guodong Guo, Yun Fu, Charles R Dyer, and Thomas S Huang. 2008. Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing 17, 7 (2008), 1178–1188.

[20] Guodong Guo and Guowang Mu. 2011. Simultaneous dimensionality reduction and human age estimation via kernel partial least squares regression. In CVPR 2011. IEEE, 657–664.

[21] Guodong Guo and Guowang Mu. 2013. Joint estimation of age, gender and ethnicity: CCA vs. PLS. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE, 1–6.

[22] Guodong Guo, Guowang Mu, Yun Fu, and Thomas S Huang. 2009. Human age estimation using bio-inspired features. In 2009 IEEE conference on computer vision and pattern recognition. IEEE, 112–119.

[23] Hu Han, Charles Otto, and Anil K Jain. 2013. Age estimation from face images: Human vs. machine performance. In 2013 international conference on biometrics (ICB). IEEE, 1–8.

[24] John F Hansbrough and Wendy Hansbrough. 1999. Pediatric burns. Pediatrics in Review 20 (1999), 117–124.

[25] Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, and Bernt Schiele. 2016. Deepercut: A deeper, stronger, and faster multi-person pose estimation model. In European Conference on Computer Vision. Springer, 34–50.

[26] Umar Iqbal and Juergen Gall. 2016. Multi-person pose estimation with local joint-to-person associations. In European Conference on Computer Vision. Springer, 627–642.

[27] Eamonn Keogh. 2010. Nearest Neighbor. Springer US, Boston, MA, 714–715. https://doi.org/10.1007/978-0-387-30164-8_579

[28] Neeraj Kumar, Alexander C Berg, Peter N Belhumeur, and Shree K Nayar. 2009. Attribute and simile classifiers for face verification. In 2009 IEEE 12th international conference on computer vision. IEEE, 365–372.

[29] Andreas Lanitis, Chrisina Draganova, and Chris Christodoulou. 2004. Comparing different classifiers for automatic age estimation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34, 1 (2004), 621–628.

[30] Gil Levi and Tal Hassner. 2015. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 34–42.

[31] Xin Liu, Shaoxin Li, Meina Kan, Jie Zhang, Shuzhe Wu, Wenxian Liu, Hu Han, Shiguang Shan, and Xilin Chen. 2015. Agenet: Deeply learned regressor and classifier for robust apparent age estimation. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 16–24.

for human pose estimation. In European conference on computer vision. Springer, 483–499.

[36] Municipality of Amsterdam. 2020. Amsterdam Intelligence. Retrieved March 1, 2021 from https://assets.amsterdam.nl/publish/pages/922120/amsterdam_intelligence.pdf

[37] Municipality of Amsterdam. 2020. One and a half meter monitor. Retrieved March 1, 2021 from https://algoritmeregister.amsterdam.nl/en/one-and-a-half-meter-monitor/

[38] Gabriel Panis, Andreas Lanitis, Nicholas Tsapatsoulis, and Timothy F Cootes. 2016. Overview of research on facial ageing using the FG-NET ageing database. IET Biometrics 5, 2 (2016), 37–46.

[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.

[40] Egle Perissinotto, Claudia Pisent, Giuseppe Sergi, Francesco Grigoletto, and Giuliano Enzi. 2002. Anthropometric measurements in the elderly: age and gender differences. British Journal of Nutrition 87, 2 (2002), 177–186. https://doi.org/10.1079/BJN2001487

[41] Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter V Gehler, and Bernt Schiele. 2016. Deepcut: Joint subset partition and labeling for multi person pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4929–4937.

[42] Allen W. Rawls and Karl Ricanek. 2009. MORPH: Development and Optimization of a Longitudinal Age Progression Database. In Biometric ID Management and Multimodal Communication, Julian Fierrez, Javier Ortega-Garcia, Anna Esposito, Andrzej Drygajlo, and Marcos Faundez-Zanuy (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 17–24.

[43] Karl Ricanek and Tamirat Tesafaye. 2006. Morph: A longitudinal image database of normal adult age-progression. In 7th International Conference on Automatic Face and Gesture Recognition (FGR06). IEEE, 341–345.

[44] Rasmus Rothe, Radu Timofte, and Luc Van Gool. 2018. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision 126, 2-4 (2018), 144–157.

[45] Rasmus Rothe, Radu Timofte, and Luc Van Gool. 2018. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision 126, 2 (2018), 144–157.

[46] Claude Sammut and Geoffrey I. Webb (Eds.). 2010. Adaboost. Springer US, Boston, MA, 19–19. https://doi.org/10.1007/978-0-387-30164-8_8

[47] Claude Sammut and Geoffrey I. Webb (Eds.). 2010. Logistic Regression. Springer US, Boston, MA, 631–631. https://doi.org/10.1007/978-0-387-30164-8_493

[48] Claude Sammut and Geoffrey I. Webb (Eds.). 2010. Random Forests. Springer US, Boston, MA, 828–828. https://doi.org/10.1007/978-0-387-30164-8_695

[49] Shuicheng Yan, Ming Liu, and T. S. Huang. 2008. Extracting age information from local spatially flexible patches. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. 737–740. https://doi.org/10.1109/ICASSP.2008.4517715

[50] Marleen Stikker. 2017. Het Tada Manifest. Retrieved March 1, 2021 from https://tada.city/

[51] Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler. 2015. Efficient object localization using convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 648–656.

[52] Alexander Toshev and Christian Szegedy. 2014. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1653–1660.

[53] Xiaolong Wang, Rui Guo, and Chandra Kambhamettu. 2015. Deeply-learned feature for age estimation. In 2015 IEEE Winter Conference on Applications of Computer Vision. IEEE, 534–541.

[54] Geoffrey I. Webb. 2010. Naïve Bayes. Springer US, Boston, MA, 713–714. https://doi.org/10.1007/978-0-387-30164-8_576

[55] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh. 2016. Convolutional pose machines. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 4724–4732.

[56] Terry Windeatt. 2008. Ensemble MLP Classifier Design. Springer Berlin Heidelberg, Berlin, Heidelberg, 133–147. https://doi.org/10.1007/978-3-540-79474-5_6