Mapping and Predicting Inequality in Amsterdam based on Deep Learning and Panoramic Street Imagery

To what extent are residential inequalities measurable from panoramic street imagery?

Rosie Zheng (10996834)
Bachelor thesis, 18 EC
Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisors: Dr. S. Ghebreab, T. Alpherts MSc
Civic AI Lab, Village.ai
Science Park 608B, 1098 XH Amsterdam

29 January 2021


Acknowledgement

This thesis would not have been possible without the people who (digitally) supported me during this time. I am very grateful to them and I believe that I was able to develop myself rapidly during this unusual period. First of all, I would like to thank Dr. Sennay Ghebreab for giving me the opportunity to create and carry out this project, for giving clarity to the end goal, and for reassuring and trusting me under his supervision. This supervision was also made possible by Tim Alpherts, who provided me with a strong foundation throughout my thesis. I would like to thank him for his in-depth help, the extra Zoom calls, his patience, and kindness. I would like to thank Sennay and Tim for giving me the opportunity to present my work at the Civic AI Lab and receive great feedback. Besides, I would not have been able to reach my goals in this short amount of time without the help of Dr. Esra Suel, the author of the paper that was replicated in my thesis. She voluntarily provided me with tools to get a better grip on and verify the code used, and to retrieve the best results possible. Furthermore, I would like to thank Sander van Splunter for the excellent coordination of the whole Bachelor project, providing individual feedback and making sure everything went as smoothly as possible in a completely online environment. Lastly, I would like to thank my thesis partner Franklin Willemen for the honest communication, and my wonderful friends who organised joint thesis sessions to prevent us from feeling completely isolated during this period. Despite the unusual times, I did not lose the feeling of connection with the people around me, and I think that, in the end, this was particularly beneficial to the outcome of this project.


Abstract

Street images can be used to understand social phenomena through visual inspection techniques and machine-learning methods. Municipalities can use such models to gain insight into inequality in the city and to establish new action perspectives to combat it. Suel et al. (2019) demonstrated that this is possible: they applied deep learning to panoramic street images of London and used a model to predict different outcomes directly from images, without extracting intermediate user-defined features. Here we have 1) replicated the study of Suel et al. (2019), and 2) studied to what extent the model is transferable to data of Amsterdam. We studied model transferability in two steps. First, we applied the pre-trained weights from Suel et al. (2019). Second, the network was trained solely on Amsterdam data. To evaluate the performance, the outcome values were compared to data from official statistics. The results showed high MAE and low accuracy measures when applying the London weights to Amsterdam data. MAEs and accuracies were similar between images used and images not used in training when applying Amsterdam weights to Amsterdam data; however, accuracy was low overall. From this, it was concluded that the London weights are not transferable to Amsterdam, and that the trained Amsterdam networks are suitable for making predictions on Amsterdam data based on MAE alone. Future research could investigate environmental differences between Amsterdam and London and correlations between outcome variables, and could increase accuracy by exploring more comprehensive datasets, dividing the data differently, and experimenting with other (self-designed) networks focused on urban features.

1. Introduction
2. Method
   2.1 Data Acquisition and Preprocessing
       2.1.1 Image Data
       2.1.2 Outcome Data
   2.2 Neural Network
3. Results
   3.1 Predictions with London Weights
       3.1.1 Decile Predictions
       3.1.2 Spatial Patterns
   3.2 Training on Amsterdam Data
   3.3 Predictions with Amsterdam Weights
       3.3.1 Images used in Training
           3.3.1.1 Decile Predictions
           3.3.1.2 Spatial Patterns
       3.3.2 Images not used in Training
           3.3.2.1 Decile Predictions
           3.3.2.2 Spatial Patterns
   3.4 Summary of Best Results
4. Discussion
5. Conclusion
References


1. Introduction

Currently, more than half of the world’s human population lives in cities, and it is predicted that by 2050 an additional 2.5 billion people will have moved to urban areas (United Nations, 2018). In Europe, 552 million people live in cities, together accounting for 75% of the total European population (United Nations, 2018). At the end of the 20th century, the concept of a liveable city centred on the idea of a residential area offering economic prosperity, public safety, public health, higher education, and nearby resources; this has gradually become the consensus view of urban growth and development (Kaal, 2011). Urban populations tend to have a higher average economic status and education level, and better average health, than rural populations. At the same time, there are major inequalities within cities in terms of income, education, neighbourhood environment, health, and safety (Young, 2013; Smith et al., 2015). Differences are especially large in big cities such as London and Amsterdam, where people with opposite incomes and health live side by side in extremely different living environments (Behrens & Nicoud, 2014; Baum-Snow & Pavan, 2013). Reducing inequality is currently one of the main goals of the global sustainable development agenda (United Nations General Assembly, 2015).

Inequalities in residential environments can affect social equity and mental health. Therefore, recent research has focused on investigating these environments using the features defined above (economic prosperity, public safety, public health, higher education, nearby resources) (Guan et al., 2018; Suel et al., 2019; Zhang et al., 2018b). When considering these features, inequalities in urban areas can have complex spatial distributions, with some features having more spatial overlap than others. It is important to measure urban inequalities at both high spatial and temporal resolution to devise appropriate policies, make investments aimed at minimising inequality, and evaluate them. However, a great majority of countries do not have fully linked datasets, resulting in inaccurate measurements of economic, health, and social inequalities (Suel et al., 2019). These datasets usually consist of varied sources, with different frequencies and spatial resolutions. Furthermore, they are often collected using costly processes.

As time proceeds, large-scale image data has become increasingly available. Previous research showed that images are a potential measurement for urban features and are advantageous in how fast, frequently, and precisely these features can be measured (Dubey et al., 2016; Steele et al., 2017; Naik et al., 2017; Xie et al., 2016; Weichenthal, Hatzopoulou, & Brauer, 2019). The reasons why images could potentially be used to measure urban inequalities are 1) the direct visibility of features (e.g. building materials, damage, and green space for quality of housing and living environment), and 2) the indirect visibility of features (e.g. housing, types of cars, and types of shops for poverty) (Weeks et al., 2007; Sampson & Raudenbush, 2004). Even differences in outcomes such as health and crime may be detectable in images because they are related to visual social and environmental factors, although their absolute values may be more difficult to predict (Rzotkiewicz et al., 2018; Weichenthal et al., 2019).

According to previous research, street images can be used to map and understand these features through visual inspection techniques and machine-learning methods (Naik et al., 2014; Naik et al., 2017; Dubey et al., 2016; Gebru et al., 2017; Arietta et al., 2014). It could be the case, for example, that there are visually recognisable elements in a street that indicate that people are happier or that there is more or less crime. These studies have focused on the measurement of visual neighbourhood attributes such as social class, safety, population density, crime rates, and housing prices from street images. However, each of them used images originating from different settings and cities, applied different analytical methods, and predicted different outcome variables at different spatial scales. Therefore, it is not possible to accurately compare these studies with each other. This makes it difficult to evaluate whether, and to what extent, images can be used to detect urban inequalities in a comprehensive manner. As a result, municipalities could gain wrong insights into inequality in the city and establish wrong action perspectives to combat inequality.

However, Suel, Polak, Bennett, and Ezzati (2019) made use of images from the same setting and city, applied the same analytical methods, and predicted the same outcome variables at the same spatial scales for four cities in England. They made use of deep learning to measure spatial distributions of income, education, unemployment, housing, living environment, health, and crime in London, Birmingham, Manchester, and Leeds. Their network was trained on panoramic street images from London and automatically used features relevant to the measurement task without having to explicitly specify relevant predefined features (e.g. roof types, trees, vehicles). To test the performance, they first trained a neural network on a subset of panoramic street images from London, using truth data at high spatial resolution from official statistics. Subsequently, they compared how well the trained network separated the best-off from the worst-off deciles for different outcome variables in images not used in training. They found that the application of deep learning to street imagery predicted inequalities in some outcome variables better (i.e. income, living environment) than in others (i.e. crime and self-reported health, but not objectively measured health). After that, they researched the transferability of the London training set to Birmingham, Manchester, and Leeds. It appeared that the trained network from London could be applied to other cities in England. This indicates that visual features related to well-being measurements are shared between cities in the same country.

Furthermore, Doersch et al. (2015) showed, by applying discriminative clustering, that not only cities in the same country but cities all over Europe share similar visual features. For example, arches are common across all European countries (Doersch et al., 2015). European cities also sometimes share similarities in floor heights, material for balconies, styles of street lamps, and position of ledges on facades. Looking at London specifically, it can be observed that the floor heights are uneven, with the first floor much taller and more stately than the other floors. This characteristic also applies to houses in Amsterdam. Since London and Amsterdam are both European cities, it could be the case that they share more common characteristics and are similar in terms of their visual features. Therefore, it may be possible to transfer the network from Suel et al. (2019) to Amsterdam. To make this possible, Amsterdam should have data similar to London’s available.

Amsterdam counts several neighbourhoods, including streets with different environmental characteristics. These neighbourhoods are described based on (traditional) statistics, such as those of the Central Bureau for Statistics (CBS) and the City of Amsterdam. Some statistics are publicly available and are used to map and combat inequality. However, Amsterdam belongs to the cities which do not have fully linked datasets, resulting in inaccurate feature measurements. This makes it impossible to gain accurate insights into urban inequalities in Amsterdam. As a consequence, policies and investments aimed at minimising inequality in the city are not executed based on complete truth data. Fortunately, Amsterdam has made a great number of panorama images publicly available. Currently, these panorama images are used to detect assets throughout the city, such as traffic signs and street lights. In combination with the publicly available statistics, the overall available data of Amsterdam is similar to the data of London. Therefore, it could technically be possible to transfer the network of Suel et al. (2019) to Amsterdam. If excellent results are achieved, the issue mentioned above (inaccurate feature measurements leading to misdirected policies and investments) could be tackled. In this way, the municipality of Amsterdam would be able to create a city with more equality.

In this thesis, we will 1) replicate the study of Suel et al. (2019) and 2) study to what extent the model is transferable to data of Amsterdam. We study model transferability in two steps. First, we will apply the pre-trained weights from Suel et al. (2019). Second, the model will be solely trained on data from Amsterdam. To evaluate the performance, the outcome values will be compared to data from official statistics. For the network trained on Amsterdam data, a comparison will be made between the classification of different outcome variables in images used during training and images not used in training. By doing this, we will try to answer the question: To what extent are environmental inequalities measurable in Amsterdam by applying deep learning to panoramic street images?


2. Method

2.1 Data Acquisition and Preprocessing

2.1.1 Image Data

Panoramic images of Amsterdam have been obtained using the Panoramabeelden Amsterdam API (Application Programming Interface) from the municipality of Amsterdam. This API is open source and accessible through https://api.data.amsterdam.nl/panorama/panoramas/. The latitude and longitude were needed to acquire the correct data. Because the outcome data was available at the fine scale of complete zip codes consisting of four digits and two letters (PC6), the coordinates were retrieved per PC6.

Zip codes were retrieved from Google Maps. Other potential sources included the open-source Nederlandse Postcodetabel file (Kraijesteijn, n.d.), whose coordinates did not correspond to those in Google Maps. A reason for this could be that the open-source file dated from before 2010. Since Google Maps is a more up-to-date and trustworthy source than the open-source zip code file, we chose to retrieve the latitude and longitude for each zip code using the Google Maps API. Using the other coordinates could have resulted in a wrong matching between zip codes and panorama images.
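As an illustration, a minimal sketch of this geocoding step is given below, assuming the googlemaps Python client; the API key and the zip-code list are placeholders, not the ones used in this thesis.

```python
import googlemaps

# Placeholder API key and PC6 zip codes (illustrative only).
gmaps = googlemaps.Client(key="YOUR_API_KEY")
zip_codes = ["1098XH", "1012PB", "1068GD"]

coords = {}
for pc6 in zip_codes:
    # Geocode each PC6 zip code, constrained to Amsterdam.
    results = gmaps.geocode(f"{pc6}, Amsterdam, Netherlands")
    if results:
        location = results[0]["geometry"]["location"]
        coords[pc6] = (location["lat"], location["lng"])
```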

The newly retrieved latitudes and longitudes were added as columns matching the Amsterdam zip codes. For each zip code, the API returns, if available, the unique identifier (panoid) of the most recently taken nearby panorama image. 2019 was selected as the timestamp in the interest of gathering the greatest amount of available image data: the municipality of Amsterdam has been collecting panoramic street imagery since 2016 with yearly improvement, so 2019 would be the year with the most complete collection. In order to retrieve the closest panorama, we looped through radii of 1-25 metres. When at least one panorama was found for a zip code, the information belonging to this panorama (panorama URL, corresponding panoid, latitude, longitude, zip code) was included in the descriptive panorama dataset.
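A sketch of this retrieval loop is shown below; the query parameter names (near, radius, timestamp_after, timestamp_before) are assumptions based on the description above, not verified against the API documentation.

```python
import requests

API_URL = "https://api.data.amsterdam.nl/panorama/panoramas/"

def nearest_panorama(lat, lon, max_radius=25):
    """Search outward in 1-metre steps for the nearest 2019 panorama."""
    for radius in range(1, max_radius + 1):
        params = {
            "near": f"{lon},{lat}",           # assumed lon,lat order
            "radius": radius,                  # metres
            "timestamp_after": "2019-01-01",   # restrict to the 2019 collection
            "timestamp_before": "2020-01-01",
        }
        response = requests.get(API_URL, params=params).json()
        panoramas = response.get("_embedded", {}).get("panoramas", [])
        if panoramas:
            return panoramas[0]  # nearest available panorama record
    return None
```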

The API returned images for 10259 unique panoids, corresponding to 10439 unique zip codes, since some panoids were assigned to more than one zip code. No images could be returned for 5557 panoids, so these were excluded from the dataset. The package vrProjector (publicly available at https://github.com/bhautikj/vrProjector) was used to convert the curved lines of the equirectangular panoramas back to straight lines. This package divides the panorama image into a top, bottom, left, right, front, and back face. Since the sky and the Street View vehicle are approximately the same in every panorama and are not representative street features, they are considered noise. To solve this problem, the top and bottom parts were cut off from the panorama, although this was accompanied by a loss of information. To still cover a 360° view, five square image cut-outs were extracted from each panorama by specifying the camera direction (0°, 72°, 144°, 216°, 288°) relative to the Street View vehicle (see Figure 1). This resulted in a total of 51295 images.
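The sketch below illustrates the geometry of the five cut-outs with simple horizontal crops of the equirectangular panorama; it is a crude approximation, since vrProjector renders proper perspective views rather than flat crops.

```python
from PIL import Image

HEADINGS = [0, 72, 144, 216, 288]  # camera directions in degrees

def five_cutouts(panorama_path, out_size=224):
    pano = Image.open(panorama_path)
    w, h = pano.size
    # Drop the top and bottom quarters (sky and Street View vehicle).
    band = pano.crop((0, h // 4, w, 3 * h // 4))
    bw, bh = band.size
    cuts = []
    for heading in HEADINGS:
        x0 = int(heading / 360 * bw)
        # Square crop at this heading; crops past the right edge are
        # zero-padded by PIL, where a full implementation would wrap around.
        crop = band.crop((x0, 0, x0 + bh, bh))
        cuts.append(crop.resize((out_size, out_size)))
    return cuts
```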

To be able to compare performances of the different outcome variables, we only preserved the images from zip codes with available data for all outcome variables. This led to a final total of 4469 panoramas and 22345 images. The other 5790/10259 panoramas were used to validate the networks trained on Amsterdam data; more details can be found below under the section “Outcome Data”. To replicate the study of Suel et al. (2019) in a precise manner and fit our data to their network, we only preserved the first four out of five image cut-outs (see Figure 1 and Figure 2). We did not directly cut four images out of the panoramas, since this would have resulted in non-square images, and square images are required as input for the network used in Suel et al. (2019). More information about this follows under the section “Neural Network”.

Figure 1. Original panorama image with the original five image cut-outs and the finally used four cut-outs.


2.1.2 Outcome Data

The Central Bureau for Statistics (CBS) dataset of 2016 (Central Bureau for Statistics, 2016) has been used to retrieve data for training and testing the network for Amsterdam. This source is openly available and gives access to data of the Netherlands at PC6 scale. The 2016 CBS dataset was used rather than the most recent CBS dataset because of its free accessibility and its greater completeness. We chose to use this dataset in combination with panorama images from 2019 in the interest of gathering the greatest amount of available data overall. Since the municipality of Amsterdam has been collecting panoramic street imagery since 2016 with yearly improvement, 2019 would be the year with the most complete collection.

The dataset was filtered on zip codes of Amsterdam, and the CBS data was matched with the available image data (panoid, URL, latitude, longitude, zip code) based on zip code. 10250 zip codes matched with the 10259 zip codes for which images were available. To test and compare performances when evaluating whether, and for which outcome variables, street imagery can be used to measure urban inequalities, it is required to use the same set of images for all outcome variables. It would, for example, not be fair to compare an outcome variable which used only 4000 images during training with an outcome variable which used 10000. Since every image should be connected to a statistical outcome value for evaluation purposes, the same number of statistical data points is also required for each outcome variable. When rows with missing values were filtered out across all variables, the leftover dataset had a total of 0 rows, which is impossible to work with. Therefore, it was chosen to focus on fewer outcome variables that resembled outcome variables in the paper of Suel et al. (2019). The CBS dataset contained no data for education, health, and crime, so it was not possible to include these outcome variables in this research. However, data was available for Income, Unemployment, and Barriers to Services, and we decided to focus on these outcome variables. When the missing values were filtered out for these outcome variables, a total of 4469/10259 matching zip codes remained. The rows of zip codes that did not match all outcome variables were used as test sets for the individual outcome variables; these test sets were used later to test the trained Amsterdam models on images not used in training. This resulted in a validation set for Income with 4631 data points, one for Unemployment with 74 data points, and one for Barriers to Services with 5790 data points. For Unemployment, Barriers to Services 2, and Income 2 we calculated deciles per zip code, with decile 1 corresponding to the worst-off roughly 10% and decile 10 to the best-off roughly 10% of zip codes. For Income and Barriers to Services, we applied a different method. The retrieved outcome variables, their detailed definitions, their computation methods, and their differences with the outcome variables of Suel et al. (2019) are discussed in more detail below.

The first retrieved outcome variable is Income. Suel et al. (2019) used a dataset in which a mean was available that was used to calculate the decile scores. In the CBS dataset (2016), the column [M_INKHH, category] was used, which represented the median of the standardised income per household. This median is compared with the distribution of this income for all households in the Netherlands and, based on this, classified into the groups presented in Table 1. Combinations of these groups are also possible, for example low/bottom middle and top middle/high. The groups, including their combinations, formed a total of ten different values. Since this data is categorical, it was not possible to calculate deciles directly. Therefore, the ten categorical values were used to assign the values 1-10, with each value corresponding to a decile (see Table 1). The final counts for each decile can be found in Table 2. However, this resulted in an uneven distribution. That is why we also calculated deciles over these numerical values, resulting in an even distribution with each decile corresponding to 10% of the data (Income 2).
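A sketch of this mapping is given below, assuming a pandas dataframe with the CBS columns; the tie-breaking by rank is one arbitrary way to force an even split, and the column names DECILE_INC and DECILE_INC_2 follow Table 4.

```python
import pandas as pd

# Category-to-value mapping of Table 1.
INCOME_CATEGORY_TO_VALUE = {
    "low": 1, "low/bottom middle": 2, "bottom middle": 3, "low/middle": 4,
    "bottom middle/middle": 5, "middle": 6, "middle/top middle": 7,
    "top middle": 8, "top middle/high": 9, "high": 10,
}

def income_deciles(df):
    # Income: the ordinal value assigned directly to each category.
    df["DECILE_INC"] = df["M_INKHH"].map(INCOME_CATEGORY_TO_VALUE)
    # Income 2: deciles recomputed over the ordinal values so that each
    # decile holds roughly 10% of the zip codes (ties broken arbitrarily).
    ranked = df["DECILE_INC"].rank(method="first")
    df["DECILE_INC_2"] = pd.qcut(ranked, 10, labels=range(1, 11)).astype(int)
    return df
```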

Table 1

Distribution of decile values for the outcome variable Income

Category              Median Household Income   Assigned Value
low                   0-20                      1
low/bottom middle     0-40                      2
bottom middle         20-40                     3
low/middle            0-60                      4
bottom middle/middle  20-60                     5
middle                40-60                     6
middle/top middle     40-80                     7
top middle            60-80                     8
top middle/high       60-100                    9
high                  80-100                    10

The second retrieved outcome variable is Unemployment. Suel et al. (2019) indicated this by the percentage of households in which the reference person was recorded as unemployed based on the activity of the previous week. The reference person was determined based on economic activity in order of priority: full-time job, part-time job, unemployed, retired, and other. Economically inactive people (people who are not actively looking for work, retired people, students, people looking after family or home, or people with a long-term sickness or disability) were not categorised as unemployed. In the CBS dataset (2016), the column [UITKMINAOW, number] represented residents with unemployment benefits, social assistance, and/or disability benefits. Although this is a slightly different measurement of unemployment than that of Suel et al. (2019), both represent unemployed people. To compute deciles, a percentage was first calculated by dividing the number of unemployed people by the total number of people (column [INWONER, number]) living in that zip code. A lower decile corresponds to more unemployment. The final decile counts are listed in Table 2.
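A minimal sketch of this computation, under the same assumptions as the Income sketch above:

```python
import pandas as pd

def unemployment_deciles(df):
    # Share of residents with benefits per zip code.
    pct = df["UITKMINAOW"] / df["INWONER"]
    # Decile 1 = most unemployment (worst-off), decile 10 = least;
    # negating the percentage puts the highest shares in the lowest decile.
    ranked = (-pct).rank(method="first")
    df["DECILE_UNEMPL"] = pd.qcut(ranked, 10, labels=range(1, 11)).astype(int)
    return df
```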

The third retrieved outcome variable is Barriers to Services. Suel et al. (2019) used a more extensive outcome variable called “barriers to housing and services”, represented by a deprivation index based on the physical and financial accessibility of housing and local services. They made use of the data source Office for National Statistics (ONS) for London. The data used by the ONS to calculate this measure includes homelessness, affordability, overcrowding, and distances to schools, supermarkets, health services, and post offices. In the CBS dataset, only distances to schools, supermarkets, and health services are available for this calculation (columns [AFS_SUPERM, number], [AFS_ONDBAS, number], [AFS_ONDVRT, number], [AFS_ONDVMB, number], [AFS_ONDHV, number], [AFS_ZIEK_I, number], and [AFS_HAPRAK, number]). The WOZ value (real-estate valuation) could be used to represent affordability, but lower affordability, which correlates with a higher WOZ value, does not translate into being worse off: mostly, people living in a house with a high WOZ value are wealthier and therefore better off. This is why we focus only on Barriers to Services and use the distances to these services as the measurement. A lower decile implies a greater distance from the services. For each of the distances, the decile is calculated, and the mean of these deciles per zip code is taken. The final counts can be found in Table 2. However, taking the mean of the decile values for each distance outcome results in a skewed distribution, with each value not corresponding to 10% of the data. Therefore, we also calculated the decile over the sum of the distances instead of first calculating the deciles per distance and averaging them (Barriers to Services 2).
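The two variants can be sketched as follows; rounding the mean of the per-distance deciles to the nearest integer is an assumption, as the text does not specify how the averaged values were binned.

```python
import pandas as pd

DISTANCE_COLS = ["AFS_SUPERM", "AFS_ONDBAS", "AFS_ONDVRT",
                 "AFS_ONDVMB", "AFS_ONDHV", "AFS_ZIEK_I", "AFS_HAPRAK"]

def distance_decile(series):
    # Decile 1 = greatest distance (worst-off), decile 10 = smallest.
    ranked = series.rank(method="first", ascending=False)
    return pd.qcut(ranked, 10, labels=range(1, 11)).astype(int)

def barriers_deciles(df):
    # Barriers to Services: mean of the per-distance deciles (skewed).
    per_col = pd.concat([distance_decile(df[c]) for c in DISTANCE_COLS], axis=1)
    df["DECILE_BARR_SERV"] = per_col.mean(axis=1).round().astype(int)
    # Barriers to Services 2: decile over the summed distances (even split).
    df["DECILE_BARR_SERV_2"] = distance_decile(df[DISTANCE_COLS].sum(axis=1))
    return df
```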

Three example distributions of street images and outcome data used in this research can be found in Figure 2.

Table 2

Decile value counts for the outcome variables

Decile   Income   Income 2   Unemployment   Barriers to Services   Barriers to Services 2
1        140      447        447            73                     447
2        1880     447        447            267                    447
3        11       447        447            419                    447
4        584      447        447            566                    447
5        963      446        446            690                    446
6        344      447        447            929                    447
7        298      447        447            886                    447
8        51       447        447            500                    447
9        167      447        447            138                    447
10       31       447        447            1                      447

Note. Income has a skewed distribution since it was a categorical variable in the CBS dataset; each category could be linked to a value 1-10 (see Table 1). Barriers to Services has a skewed distribution because first the deciles per distance type were calculated, and then the mean of these deciles.


Figure 2. Overview of how street images and outcome data are preprocessed and used in the analysis. Examples are given for zip codes 1087MA, 1068GD, and 1012PB. Five images were obtained for each zip code, of which four were used. For each outcome variable, decile 1 (red) corresponds to the worst-off ten percent of zip codes, and decile 10 (dark blue) to the best-off ten percent of zip codes in Amsterdam (except for Income and Barriers to Services).

2.2 Neural Network

According to previous research, transfer learning is beneficial when Convolutional Neural Networks (CNNs) pre-trained on large datasets are used as fixed feature extractors instead of training from scratch (Yosinski et al., 2014). Therefore, a pre-trained CNN, the VGG16 network, is used to extract features, and only the weights of the fully connected layers are trained. VGG16 is a convolutional neural network model proposed by Simonyan and Zisserman (2015) and converts RGB images to 4096-dimensional feature vectors. Their model achieves a 92.7% test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. They used very small (3 x 3) convolutional filters in all layers. Besides its high test accuracy, the network is also applicable to other image recognition tasks, for which it achieves excellent performance even when used as part of a relatively simple pipeline. The input for the VGG16 network is a fixed-size 224x224 RGB image; this size has been applied when cutting the panorama images into five square images. The network consists of a stack of convolutional layers followed by three fully connected layers: the first two have 4096 channels each, and the third performs a 1000-way classification for the ImageNet Large Scale Visual Recognition Challenge and contains 1000 channels. The final layer is the softmax layer. All the hidden layers are supplied with a Rectified Linear Unit (ReLU; Krizhevsky et al., 2012). The cut-out panorama images were the input of this network and the 4096-dimensional codes were the output (see Figure 3). The four codes of the four cut-outs per panorama were collected and joined together in one HDF5 file containing 4469 arrays of size 4x4096.
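A sketch of this feature-extraction step is given below, using torchvision's VGG16 with the classifier truncated after the second fully connected layer; the file names and the HDF5 layout are illustrative, not necessarily those of the original pipeline.

```python
import h5py
import torch
from PIL import Image
from torchvision import models, transforms

# VGG16 pre-trained on ImageNet, truncated so each image maps to a
# 4096-dimensional code (output of the second fully connected layer).
vgg = models.vgg16(pretrained=True)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:5])
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def encode_panorama(cutout_paths):
    """Return a 4x4096 array of codes for the four cut-outs of one panorama."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in cutout_paths])
    with torch.no_grad():
        return vgg(batch).numpy()

# Illustrative HDF5 layout: one 4x4096 entry per zip code.
with h5py.File("amsterdam_codes.hdf5", "w") as f:
    codes = encode_panorama(["cut0.jpg", "cut1.jpg", "cut2.jpg", "cut3.jpg"])
    f.create_dataset("1098XH", data=codes)
```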

Suel et al. (2019) used a network that jointly uses the four images belonging to a zip code in four channels. These channels can be found in the architecture shown in Figure 3. The information from the different channels is collected and fed into the last layer, in which a single continuous value p between 0 and 1 is computed by applying the sigmoid function. They also included batch normalisation in all layers of the network except at the output, where the probability p is computed. To take into account the ordinal relationship among deciles, the single output p-value is interpreted as the probability with which Bernoulli trials are performed (Da Costa & Cardoso, 2005). In this case, heads (p) and tails (1 - p) correspond to the events of belonging and not belonging to an output class. To train the neural network, the cross-entropy cost function is optimised (see Table 3), in which w represents the network weights, y_nm is a label vector for the n-th sample with a value of 1 for the true label class and 0 for all others, and p_nm is the probability of the m-th decile for the n-th sample. An Adam optimiser was employed (Kingma & Ba, 2014) with a learning rate of 0.0001.
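A minimal sketch of this ordinal head is given below; the layer sizes are illustrative (the actual architecture is in Figure 3), labels are assumed to be 0-indexed, and the binomial expansion of the single output p follows the Bernoulli-trial interpretation described above.

```python
import torch
import torch.nn as nn
from math import comb

N_DECILES = 10
# Binomial coefficients C(9, k), k = 0..9, for the Bernoulli-trial
# interpretation of the single output p (Da Costa & Cardoso, 2005).
BINOM = torch.tensor([comb(N_DECILES - 1, k) for k in range(N_DECILES)],
                     dtype=torch.float32)

class OrdinalHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(4 * 4096, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 1), nn.Sigmoid(),   # single p in (0, 1)
        )

    def forward(self, codes):                   # codes: (batch, 4, 4096)
        p = self.fc(codes.flatten(1))           # (batch, 1)
        k = torch.arange(N_DECILES, dtype=p.dtype)
        # P(decile k+1) = C(9, k) * p^k * (1 - p)^(9 - k).
        return BINOM * p.pow(k) * (1 - p).pow(N_DECILES - 1 - k)

def cross_entropy(probs, labels):
    # Cross-entropy of Table 3: -sum_nm y_nm log p_nm (labels 0-indexed).
    return -torch.log(probs.gather(1, labels.unsqueeze(1)) + 1e-9).mean()

model = OrdinalHead()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```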

To test the London network, the deep-learning-based assignment to deciles was applied in full to the Amsterdam data, using the code from the paper of Suel et al. (2019) (publicly available at https://github.com/esrasuel/measuring-inequalities-sview) and the pre-trained London weights received from Esra Suel, the corresponding author of the paper. The inputs given to the open-source code can be found in Table 4. These inputs and this code could, in principle, have led to random or implausible results through erroneous application of the given materials. To check that the code, inputs, and weights were applied correctly, we also tested them with the Manchester data received from Esra Suel, identical to the Manchester data used in the paper of Suel et al. (2019). The code was implemented in PyTorch. Subsequently, the network of Suel et al. (2019) was trained solely on the outcome variables in the Amsterdam data. Each trained network was tested on all outcome variables as well; the defined input values for this can be found in Table 4. The pre-trained weights of VGG16 are used, and only the weights of the fully connected layers are trained. The workings of this network are explained in the paragraphs above and visualised in Figure 3.


Figure 3. The architecture of the ordinal classification network from Suel et al. (2019).

Table 3

Used mathematical formulas

Cross-Entropy: $E(w) = -\sum_{n} \sum_{m} y_{nm} \log p_{nm}$, where $w$ denotes the network weights, $y_{nm}$ is a label vector for the n-th sample (1 for the true class, 0 otherwise), and $p_{nm}$ is the probability of the m-th decile for the n-th sample.

Mean Absolute Error: $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i|$, where $n$ is the number of errors and $|y_i - x_i|$ is the absolute error for the i-th sample.

Accuracy: $\mathrm{accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}$.


Table 4

The defined input variables in the ordinal_classification_sview.py file from Suel et al. (2019)

Variable            Test                                                Train
train_format        0 (test)                                            1 (train)
img_hdf5_file       string path to HDF5 image file                      string path to HDF5 image file
lab_pickle_file     string path to label pickle file                    string path to label pickle file
label_name          string name of outcome variable column to test on   string name of outcome variable column to train on
clabel_name         None                                                None
trained_model_name  string path to already trained model                string output name of to-be-trained model
validation_flag     5 (test only split)                                 4 (train only split)
train_part          0.95                                                0.95
city_name           "amsterdam"                                         "amsterdam"
batch_size          20                                                  20
num_epochs          50                                                  50

Note. The input variable label_name only takes one outcome variable at a time (DECILE_INC, DECILE_INC_2, DECILE_UNEMPL, DECILE_BARR_SERV, DECILE_BARR_SERV_2), which means that the network is applied to each outcome variable separately and must be run separately for each.

The decile predictions of the different models were compared with each other. Performance was measured per model-outcome combination by both the Mean Absolute Error (MAE) and the accuracy. Calculating the MAE involves summing the magnitudes (absolute values) of the errors to obtain the total error and then dividing the total error by n (see Table 3), whereas calculating the accuracy involves simply dividing the number of samples classified correctly by the total number of samples (see Table 3). Both are widely used in research to measure the difference between predicted and actual values (Wang & Lu, 2018; Gunawardana & Shani, 2009).
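Both measures reduce to a few lines; a sketch over arrays of predicted and true deciles:

```python
import numpy as np

def mae(predicted, truth):
    # Mean Absolute Error between predicted and true deciles (Table 3).
    predicted, truth = np.asarray(predicted), np.asarray(truth)
    return np.abs(predicted - truth).mean()

def accuracy(predicted, truth):
    # Fraction of zip codes whose decile is predicted exactly (Table 3).
    predicted, truth = np.asarray(predicted), np.asarray(truth)
    return (predicted == truth).mean()
```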

Suel et al. (2019) mainly focused on the MAE as an indicator of performance. Their results demonstrated that misclassification among neighbouring deciles is highly probable, since it is harder to differentiate between images from, for example, deciles three and four than between images from deciles one and ten. Therefore, even though they reported small MAEs (average = 1.42), their accuracy of allocation averaged 0.27. They showed that their network performs well in differentiating between worst-off and best-off areas, but not in areas with adjoining deciles. However, it is also important to be able to differentiate between neighbouring deciles, since this allows a more varied scope of interventions. If it were only possible to tell apart the worst-off from the best-off, only a limited amount of insight could be gathered, resulting in a limited number of possible interventions. Furthermore, some interventions would not perfectly fit a zip code. For example, when the same intervention is applied to a decile one zip code and a decile three zip code, there is a chance that the intervention has a more significant effect on the one than on the other. A concrete example is the outcome variable Barriers to Services: a decile one zip code experiences a greater distance to services (e.g. 1.5 kilometres) than a decile three zip code (e.g. 1.1 kilometres). When a supermarket is placed one kilometre from both zip codes, this will probably have a more significant effect on the decile one zip code, since the supermarket of the decile three zip code was already more proximate. When an appropriate intervention is created for each of the two zip codes, however, the effect could be equally significant for both. Besides that, a decile one zip code probably needs more interventions than a decile three zip code. Furthermore, the best-off and worst-off areas of a city are often already well known, making it all the more necessary to use neural networks for the classification of the areas between them. Therefore, this research has focused on accuracy in addition to MAE.


3. Results

3.1 Predictions with London Weights

In order to investigate whether the pre-trained London weights are transferable to Amsterdam, we applied these weights to data of Amsterdam.

3.1.1 Decile Predictions

The average MAE across all outcome variables was 4.76 and the average accuracy 0.06. The best prediction performance regarding MAE was achieved on Barriers to Services (MAE = 4.4381). However, the lowest accuracy also belonged to this outcome variable (accuracy = 0.0013). Income 2 and Barriers to Services 2 achieved the highest accuracy (accuracy = 0.1002). Income was the least predictable outcome variable with an MAE value of 6.0371. A complete overview is listed in Table 5.

Table 5

MAE and accuracy measurements when applying the pre-trained London weights to the outcome variables of Amsterdam

Outcome Variable        MAE      Accuracy
Income                  6.0371   0.0188
Income 2                4.4446   0.1002
Unemployment            4.4676   0.0998
Barriers to Services    4.4381   0.0013
Barriers to Services 2  4.4392   0.1002


Since there are ten deciles in total, MAEs around 4.5 and accuracies around 0.1 (see Table 5) represent random and implausible results. This might be an indication of an erroneous application of the code, inputs, and weights. To check whether these were applied correctly, we also tested them with the Manchester dataset received from Esra Suel, identical to the Manchester dataset used in the paper of Suel et al. (2019). Testing the London weights on the Manchester dataset resulted in non-random MAE and accuracy values for all outcome variables (see Table 6), indicating that the weights, code, and inputs were applied correctly.

Table 6

MAE and accuracy scores for outcome variables used in Suel et al. (2019) when testing the London weights on Manchester data (LSOA level)

Outcome Variable                   MAE      Accuracy
Self-reported Health               2.2859   0.1289
Occupancy Rating                   2.3792   0.1261
Unemployment                       2.2232   0.1338
Income Deprivation                 2.1715   0.1323
Education Deprivation              2.1587   0.1344
Employment Deprivation             2.2012   0.1318
Barriers to Housing and Services   2.6504   0.1090
Health Deprivation and Disability  2.2569   0.1256
Crime Deprivation                  2.4553   0.1192

Note. Suel et al. (2019) also reported the MAE for education. However, the received file does not contain a column with deciles for education, so we excluded this variable from this table.


3.1.2 Spatial Patterns

Observing the spatial patterns based on truth CBS data, Income and Income 2 have a varied distribution throughout the whole city with only the far West clearly belonging to the lowest decile (see Figure 4 and Appendix, Figure 1). Furthermore, the North appears to belong to the highest decile for Income 2 but the lowest decile for Unemployment (see Appendix, Figure 2). The South-West has high decile values for Income, Income 2, and Unemployment (see Figure 4 and Appendix, Figure 1, and Figure 2), whilst having low decile values for Barriers to Services and Barriers to Services 2 (see Figure 5 and Appendix, Figure 3). For Barriers to Services and Barriers to Services 2, the distribution shows clearly that a greater distance from the city centre correlates with a lower decile (see Figure 5 and Appendix, Figure 3).

The results of testing the pre-trained London weights on these outcome variables indicated that decile ten occurred excessively in the predictions for all outcome variables. The spatial plots representing the predicted deciles (see Figure 4 and Figure 5) showed that these predictions were highly inaccurate.


Figure 4. Spatial distribution for Income based on truth data (left) versus tested with pre-trained London weights (right). 3842 of the 4469 samples plotted. This figure represents a prediction with relatively high MAE and low accuracy.

Figure 5. Spatial distribution for Barriers to Services 2 based on truth data (left) versus tested with pre-trained London weights (right). 3842 of the 4469 samples plotted. This figure represents a prediction with relatively low MAE and high accuracy.

Note. These choropleths (https://residentmario.github.io/geoplot/plot_references/plot_reference.html) are made per outcome variable by merging the predicted decile files and the truth data pickle files with the complete Amsterdam dataframe on PC6. The Amsterdam dataframe consisted of the first 16822 rows of the Postcodevlakken_PC6 shapefile, which was read with geopandas (https://geopandas.org) and included a column ‘geometry’ with polygons for plotting. 3842 of the 4469 PC6 matched, and thus only 3842 zip code locations are plotted in the choropleths.
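The merge-and-plot step can be sketched as follows, using geopandas' built-in plotting rather than geoplot; the file names and the decile column are placeholders, and the red-to-blue colour map mirrors the decile colouring described in Figure 2.

```python
import geopandas as gpd
import pandas as pd

# Placeholder file names; the merge-on-PC6 logic follows the note above.
shapes = gpd.read_file("Postcodevlakken_PC6.shp").iloc[:16822]
predictions = pd.read_pickle("predicted_deciles_income.pkl")  # PC6 -> decile

merged = shapes.merge(predictions, on="PC6", how="inner")
ax = merged.plot(column="DECILE_INC", cmap="RdBu", vmin=1, vmax=10,
                 legend=True)
ax.set_axis_off()
```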


3.2 Training on Amsterdam Data

MAE values decreased significantly after training the network on Amsterdam data (see Table 7). The lowest MAE was achieved for Barriers to Services; the training had the least effect on Unemployment. As shown in Figure 6, the MAE decreased greatly in the first few epochs and remained stable from epoch 30 until epoch 50 for all outcome variables. There were a few fluctuations between epoch 0 and epoch 30, especially for Income.


Table 7

MAE scores before and after training for the outcome variables when trained solely on Amsterdam data

Outcome Variable      MAE Before Training   MAE After Training
Income                4.9771                1.8543
Income 2              4.3250                2.6068
Unemployment          4.2458                2.6754
Barriers to Services  4.3292                1.2602

3.3 Predictions with Amsterdam Weights

To test whether the trained Amsterdam networks performed well, we used and compared images used in training and images not used in training.

3.3.1 Images used in Training

To test whether the trained Amsterdam networks performed well on data used during training, we tested them on images used in training.

3.3.1.1 Decile Predictions

Table 8 shows that each network trained on an outcome variable achieved its best results when tested on the matching outcome variable. The best results were obtained for Income (MAE = 0.6749, accuracy = 0.4520) and the worst for Income 2 (MAE = 2.0948, accuracy = 0.1320). Barriers to Services and Barriers to Services 2 also showed relatively low MAEs and high accuracies, with Barriers to Services 2 (MAE = 1.2023, accuracy = 0.2600) having better results than Barriers to Services (MAE = 1.3519, accuracy = 0.2376). Notably, testing the trained Barriers to Services model on Barriers to Services 2 led to the same MAE and accuracy values as the other way around (MAE = 1.6435, accuracy = 0.2009). Additionally, testing the trained Unemployment model on Income (MAE = 1.9561, accuracy = 0.1414) and on Barriers to Services (MAE = 1.8962, accuracy = 0.1667) led to relatively good results.

Table 8

MAE and accuracy scores for the outcome variables when testing on images used in training

Model (Trained Variable)  Measurement  Income   Income 2  Unemployment  Barriers to Services  Barriers to Services 2
Income                    MAE          0.6749   4.3907    2.5943        2.8906                3.2394
                          Accuracy     0.4520   0.0639    0.1349        0.0899                0.1020
Income 2                  MAE          3.1286   2.0948    3.0700        2.6921                2.6844
                          Accuracy     0.0859   0.1320    0.0857        0.0962                0.0957
Unemployment              MAE          1.9561   2.8313    2.0224        1.8962                2.4976
                          Accuracy     0.1414   0.0866    0.1387        0.1667                0.1011
Barriers to Services      MAE          2.7999   3.1143    2.8237        1.3519                1.6435
                          Accuracy     0.1020   0.0933    0.1090        0.2376                0.2009
Barriers to Services 2    MAE          2.6144   2.8278    2.8986        1.6435                1.2023
                          Accuracy     0.1311   0.1072    0.1112        0.2009                0.2600

Note. Testing is done with networks based on Amsterdam variables and data. The values in blue (the diagonal entries) have the lowest MAE and highest accuracy and are used for the geopandas choropleths. The black values represent the trained models when tested on outcome variables whose decile data was not used in training (e.g. Income is trained on Income images and deciles but not on the other outcome variables). All models made use of the same image data when testing.


3.3.1.2 Spatial Patterns

Figure 7 and Appendix, Figure 6 show that the predicted spatial patterns for Income and Barriers to Services 2 are almost identical to the true spatial patterns. This supports the low MAE and high accuracy values in Table 8 belonging to these outcome variables. We can observe that mainly intermediate deciles are predicted for Income 2 while the true deciles are more varied for this outcome variable (see Figure 8). This same observation is shown in Appendix, Figure 4 regarding the outcome variable Unemployment. Furthermore, relatively accurate predictions are observed for Barriers to Services, although the predicted deciles are overall slightly lower than the true deciles (see Appendix, Figure 5). Besides that, the decile ten predictions for Barriers to Services 2 appear to be heavily underrepresented, compared with the truth values (see Appendix, Figure 6).

Figure 7. Spatial distribution for Income based on truth data (left) versus tested on images used in training with trained Amsterdam Income weights (right). 3842 of the 4469 samples plotted. This figure represents a prediction with relatively low MAE and high accuracy.


Figure 8. Spatial distribution for Income 2 based on truth data (left) versus tested on images used in training with trained Amsterdam Income 2 weights (right). 3842 of the 4469 samples plotted. This figure represents a prediction with relatively high MAE and low accuracy.

3.3.2 Images not used in Training

In order to test if the trained Amsterdam networks performed well on data not used during training, we used images not used in training as validation data.

3.3.2.1 Decile Predictions

In contrast to Table 8, Table 9 does not show all the best results on the diagonal. This means that not all models trained on a specific outcome variable also perform best on that outcome variable; more about this can be found in the chapter “Discussion”. We observed that only Barriers to Services (MAE = 1.5381, accuracy = 0.2264) and Barriers to Services 2 (MAE = 2.1777, accuracy = 0.1382) perform best when the models are trained on these two outcome variables. For Income, the best results were obtained when testing with the model trained on Unemployment (MAE = 2.0695, accuracy = 0.1566). For Income 2 (MAE = 2.5649, accuracy = 0.1209) and Unemployment (MAE = 2.0405, accuracy = 0.2162), this was the case when testing with the model trained on Income. Among these results, the best performance was achieved for Barriers to Services (MAE = 1.5381, accuracy = 0.2264) and the lowest for Income 2 (MAE = 2.5649, accuracy = 0.1209). Noticeably, two of the worst performances overall apply to Income tested with the model trained on Income and to Unemployment tested with the model trained on Unemployment.

Table 9

MAE and accuracy scores for the outcome variables when testing on images not used in training

Model (Trained Variable)  Measurement  Income   Income 2  Unemployment  Barriers to Services  Barriers to Services 2
Income                    MAE          3.1306   2.5649    2.0405        2.6787                3.2824
                          Accuracy     0.0874   0.1209    0.2162        0.1019                0.0861
Income 2                  MAE          2.4858   2.6944    3.4459        1.9626                2.7005
                          Accuracy     0.1181   0.1008    0.0675        0.1643                0.1059
Unemployment              MAE          2.0695   2.9968    3.8243        1.6994                2.4765
                          Accuracy     0.1566   0.0872    0.0405        0.2103                0.1038
Barriers to Services      MAE          2.3031   3.2250    3.4189        1.5381                2.2544
                          Accuracy     0.1391   0.0909    0.0676        0.2264                0.1462
Barriers to Services 2    MAE          2.5832   3.1241    3.2432        2.2544                2.1777
                          Accuracy     0.1302   0.09134   0.0405        0.1463                0.1382

Note. Testing is done with networks based on Amsterdam variables and data. The values in blue are the ones with the lowest MAE and highest accuracy (except Barriers to Services 2) and are used for the geopandas choropleths. None of the test data (both images and decile outcome values) was used during training.


3.3.2.2 Spatial Patterns

According to Table 9, the best results were achieved when testing the Barriers to Services trained model on Barriers to Services. Figure 10 shows that zip codes in the city centre in particular are predicted well. For the zip codes further from the city centre, the predicted decile values are low to intermediate, while the truth values reveal that these zip codes sometimes also belong to decile ten. For Income, it can be observed that predicted deciles are generally lower than the true deciles (see Figure 9), although the overall pattern seems quite similar. It was not possible to evaluate Income 2 and Barriers to Services 2 on spatial patterns, since the truth deciles all have the colour of decile ten (see Appendix, Figure 7 and Figure 9). When comparing the predicted deciles for Barriers to Services and Barriers to Services 2, we see that the deciles for Barriers to Services 2 are generally higher (see Figure 10 and Appendix, Figure 9). Only a small amount of test data, for which neither image nor official statistics data was used in training, was available for Unemployment. Furthermore, not all this test data matched the Postcodevlakken_PC6 shapefile: only 65 samples were plotted for Unemployment (see Appendix, Figure 8). This makes it difficult to observe the overall similarities and differences.


Figure 9. Spatial distribution for Income based on truth data (left) versus tested on images not used in training with trained Unemployment weights (right). 4401 of the 4631 samples plotted. This figure represents a prediction with relatively high MAE and low accuracy.

Figure 10. Spatial distribution for Barriers to Services based on truth data (left) versus tested on images not used in training with trained Amsterdam Barriers to Services weights (right). 5466 of the 5790 samples plotted. This figure represents a prediction with relatively low MAE and high accuracy.

Note. 4401 of the 4631 PC6 matched with the Postcodevlakken_PC6 shapefile for Income and Income 2, 65 of the 74 PC6 for Unemployment, and 5466 of the 5790 PC6 for Barriers to Services and Barriers to Services 2. More information about the matching can be found in the note below Figure 5.


3.4 Summary of Best Results

From the previous subchapters, it followed that, when testing the London network on Manchester data, the best results were achieved for the outcome variable Education Deprivation. When testing this network on Amsterdam data, this applied to Barriers to Services 2. No single network trained on Amsterdam data performed well on all outcome variables. The network-outcome combinations that achieved the best results were the network trained on Income, tested on Income, for images used in training, and the network trained on Barriers to Services, tested on Barriers to Services, for images not used in training. A summary of all the best results from the previous subchapters can be found in Table 10.

Table 10

Best overall results

Used Weights                      Test Data                                                      MAE      Accuracy
London                            Manchester (Education Deprivation)                             2.1587   0.1344
London                            Amsterdam (Barriers to Services 2)                             4.4392   0.1002
Amsterdam (Income)                Images used in training, Amsterdam (Income)                    0.6749   0.4520
Amsterdam (Barriers to Services)  Images not used in training, Amsterdam (Barriers to Services)  1.5381   0.2264


Examples of accurate and inaccurate predictions for the best performances from Table 10 can be found in Figure 11, Figure 12, and Figure 13. Each figure displays two zip codes for which the prediction was accurate and two zip codes for which it was inaccurate. Showing both makes it possible to observe which street features the network is capable of classifying and which it is not (yet). This will be further discussed in the Discussion. Remarkably, the London weights were only able to accurately predict deciles 1 and 10 for Amsterdam data; the predictions for deciles 2-9 were always inaccurate with these weights. With the Amsterdam weights, it was noticeable that no decile was predicted as 1 when the truth value was 10, or the other way around, whereas this frequently happened with the London weights. This implies that the London weights could only be used to classify between worst-off and best-off deciles when applied to Amsterdam data, and even then there was a chance of completely opposite predictions. The Amsterdam weights, on the other hand, never predicted completely opposite deciles and were able to accurately predict decile values 2-9. Therefore, the Amsterdam weights could also be used to classify all deciles.

Figure 11 shows that the pre-trained London weights accurately predicted decile one and decile ten in streets containing houses with opposing building styles, a clear number of cars (zero or many), and average green space. Highly inaccurate predictions were made for streets with houses of a more similar building style, an unclear number of cars (a few or several), and a difference in green space. Given the almost barren trees in some images, this last difference could be attributed to the panoramas being captured in different seasons.


Figure 11. Predicted and truth decile values for zip codes 1067AL (panorama left), 1066RA (panorama bottom right), 1073KK (panorama middle right), and 1063NS (panorama upper right) when testing London weights on Amsterdam images for the outcome variable Barriers to Services 2.

Figure 12 shows that the Amsterdam weights accurately predicted streets containing houses with opposite building styles and clear building materials, a difference in green space, and few cars. Highly inaccurate predictions were made for images with opposite green spaces, opposite lighting, and opposite numbers of cars (zero or many). The inaccurately predicted panorama on the middle right shows barren trees, which indicates that this street was captured during the winter, while the upper right panorama shows green trees, indicating that it must have been captured during the summer or spring. The opposite green space could therefore be attributed to the difference in season.


Figure 12. Predicted and truth decile values for zip codes 1064WR (panorama left), 1068NB (panorama bottom right), 1017HV (panorama middle right), and 1033GM (panorama upper right) when testing Amsterdam weights trained on Income on Amsterdam images used in training for the outcome variable Income.

Figure 13 shows that the Amsterdam weights accurately predicted streets with opposite building styles, clear building materials, little green space, and a large number of visible cars. Highly inaccurate predictions were made for a street with newly built houses and for a street which shares the same visual features with another street but has a completely opposite truth decile value.


Figure 13. Predicted and truth decile values for zip codes 1103AN (panorama left), 1069EM (panorama bottom right), 1054KZ (panorama middle right), and 1071VP (panorama upper right) when testing Amsterdam weights trained on Barriers to Services on Amsterdam images not used in training for the outcome variable Barriers to Services.


4. Discussion

In this thesis, we investigated to what extent deep learning can be applied to street images of Amsterdam to map and predict urban inequalities. We researched the transferability of the network from Suel et al. (2019) trained on London data, trained this same network on Amsterdam data, and tested the networks trained on Amsterdam data on images used in training and images not used in training.

When comparing the results for the outcome variables, focusing on both MAE and accuracy, we found that applying the London network to street imagery of Amsterdam predicted inequalities more accurately in some outcomes (Income 2, Barriers to Services 2) than in others (Income, Unemployment, Barriers to Services). However, all the decile predictions by the London network were remarkably higher than the truth values, resulting in poor overall transferability. A reason for this could be that worse-off zip codes in Amsterdam share the same visual features as best-off areas in London. Another reason could be an incorrect application of the London network, since the predictions seemed random given the MAE and accuracy values (around 4.5 and 0.1, with ten decile values in total). However, we tested this with the received Manchester data used in the paper of Suel et al. (2019), and this resulted in non-random MAE and accuracy values, indicating that the network was applied correctly.

Furthermore, it could be the case that the data preprocessing did not follow the method of Suel et al. (2019). Regarding Income and Barriers to Services, the distribution was not evenly spread, resulting in invalid deciles. To regulate this, however, we created the outcome variables Income 2 and Barriers to Services 2, each having an evenly spread distribution with valid deciles. Table 5 indeed shows that the results are better for the outcome variables with an evenly spread distribution (Income 2, Unemployment, Barriers to Services 2) than for the ones without (Income, Barriers to Services). Nevertheless, none of these results indicate that transferability was successful. Besides the preprocessing of the decile outcome values, it could also be the case that the preprocessing of the images was not performed correctly. Suel et al. (2019) made use of four image cut-outs per panorama covering a 360° view. We first cut the panoramas into five 224x224 pieces covering a 360° view but only preserved the first four cuts of each panorama. This was done in order to have the same dimensions for the panoramas encoded in the HDF5 file as Suel et al. (2019) had. However, as a result, the image codes in the HDF5 files did not cover a 360° view, but only a 288° view. We were not able to obtain four cuts of size 224x224 per panorama covering a 360° view, since we first cut off the sky and the Street View vehicle to reduce noise; to retain the 224x224 image size used in Suel et al. (2019), we were compelled to retrieve five cuts. Future research could test the London network with Amsterdam images covering a 360° view.

Besides the shortcomings of the London network and the data preprocessing, the results can also be connected to the data itself. We were not able to retrieve Amsterdam data at PC6 level for most of the outcome variables used in Suel et al. (2019) from the CBS dataset (Central Bureau for Statistics, 2016). We could therefore only use the three outcome variables from the Amsterdam dataset that best matched outcome variables in Suel et al. (2019) (Income, Unemployment, Barriers to Services). However, the data for these outcome variables revealed some defects: 1) Income was categorical and was therefore manually divided into values 1-10, 2) Unemployment included people with unemployment benefits, social assistance, and/or disability benefits, whereas Suel et al. (2019) did not categorise people with a long-term sickness or disability as unemployed, and 3) Barriers to Services was a self-constructed outcome variable that resembled the outcome variable Barriers to Housing and Services from Suel et al. (2019) best but missed the housing part. Unemployment was the only outcome variable with similar data available (the number of persons counted as unemployed divided by the total number of persons). It was possible to divide Barriers to Services into decile values according to the numerical distances. However, this was a self-created outcome variable, while Suel et al. (2019) used the already existing outcome variable Barriers to Housing and Services, for which ready-made data was available. All these issues could have resulted in an inaccurate resemblance of the outcome variables used by Suel et al. (2019).
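The two ways in which deciles were formed can be illustrated with pandas; the file and column names are hypothetical and assume one numeric outcome value per PC6 zip code:

```python
import pandas as pd

# Hypothetical input: one row per PC6 zip code with numeric outcome columns.
df = pd.read_csv("cbs_pc6_2016.csv")

# Rank-based deciles: evenly filled bins, as for the evenly spread variants
# (Income 2, Barriers to Services 2). Ties may require duplicates="drop".
df["income_decile"] = pd.qcut(df["income"], q=10, labels=list(range(1, 11)))

# Value-based deciles: equal-width bins over the numeric range, i.e. the
# division "according to the numerical distances".
df["barriers_decile"] = pd.cut(df["barriers_to_services"], bins=10,
                               labels=list(range(1, 11)))
```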

We tried to access more recent datasets from the CBS, but these were not freely accessible and contained the same outcome variables as the dataset used in this thesis. The municipality of Amsterdam freely offers a dataset covering more than 500 variables (publicly available at https://data.amsterdam.nl/datasets/G5JpqNbhweXZSw/basisbestand-gebieden-amsterdam-bbga/). These variables correspond well to the outcome variables in Suel et al. (2019). However, this dataset is on PC4 level, while we researched data on PC6 level. Future research could focus on PC4 level, since more data is publicly available for that level than for PC6 level. Street images would still have to be retrieved on PC6 level but could be linked to the PC4 data. This will, however, result in less precise predictions, since all images within the same PC4 area will have the same outcome value assigned to them. Besides, there is an average of 7607 residents per PC4 (Central Bureau for Statistics, 2018), which is larger than the scale at which most planning activities take place.
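Linking PC6-level images to PC4-level statistics is straightforward, since a Dutch PC6 code consists of the PC4 digits followed by two letters. A minimal sketch with hypothetical file and column names:

```python
import pandas as pd

images = pd.DataFrame({"pc6": ["1103AN", "1069EM", "1054KZ"]})  # hypothetical sample
pc4_stats = pd.read_csv("bbga_pc4.csv")                         # hypothetical file name

# The join key is simply the first four characters of the PC6 code.
images["pc4"] = images["pc6"].str[:4]
merged = images.merge(pc4_stats, on="pc4", how="left")
```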

Regarding the image data, we used the freely accessible Panoramabeelden Amsterdam API from the municipality of Amsterdam, while Suel et al. (2019) used images retrieved from the Google Maps API. Comparing images from these two sources, it can be observed that the images from the Google Maps API do not contain the Street View vehicle, while the images obtained with the Panoramabeelden Amsterdam API do (see Figure 19). We tried to cut the vehicle off using vrProjector, which, however, was not consistent enough.

Besides, vrProjector converted the curved lines of the equirectangular panoramas back to straight lines. However, Suel et al. (2019) did not apply such a conversion and used the Google Street View images directly (see Figure 19). These and other factors that cannot be observed directly (e.g. camera settings) could have impacted the results.

Figure 19. Upper: panorama of a street in London cut in four, obtained with the Google Maps API. Lower: panorama of a street in Amsterdam cut in five (first four cuts used and shown above), obtained with the Panoramabeelden Amsterdam API.
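For illustration, a minimal numpy sketch of this type of projection (not vrProjector's actual interface): it samples a rectilinear view, in which straight scene lines appear straight, from an equirectangular panorama. The file name, field of view, and yaw angles are assumptions; five views 72° apart with a 72° field of view together cover 360°:

```python
import numpy as np
from PIL import Image

def equirect_to_perspective(pano, fov_deg=72.0, yaw_deg=0.0, size=224):
    """Sample a rectilinear (straight-line) view out of an equirectangular
    panorama; nearest-neighbour sampling, pitch fixed at 0 for brevity."""
    src = np.asarray(pano)
    h, w = src.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2)

    # Normalised device coordinates of the output pixel grid.
    t = (np.arange(size) + 0.5) / size * 2 - 1
    xs, ys = np.meshgrid(t * half, t * half)

    # Viewing directions -> spherical coordinates (longitude, latitude).
    lon = np.arctan2(xs, 1.0) + np.radians(yaw_deg)
    lat = -np.arctan2(ys, np.hypot(xs, 1.0))   # positive latitude = up

    # Spherical coordinates -> source pixel indices (x wraps around).
    px = ((lon + np.pi) % (2 * np.pi)) / (2 * np.pi) * (w - 1)
    py = (np.pi / 2 - lat) / np.pi * (h - 1)   # zenith at the top row
    return Image.fromarray(src[py.astype(int), px.astype(int)])

pano = Image.open("panorama_1103AN.jpg")       # hypothetical file name
views = [equirect_to_perspective(pano, yaw_deg=y) for y in (0, 72, 144, 216, 288)]
```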


Training the network of Suel et al. (2019) on Amsterdam data worked better for some outcome variables (Barriers to Services, Barriers to Services 2) than for others (Income, Income 2, Unemployment). Overall, all outcome variables seemed to benefit from the training. It was remarkable that Unemployment benefitted the least, while it was the only outcome variable preprocessed in the same way as in Suel et al. (2019). However, as mentioned before, Unemployment also covered people with disability benefits, while these people were not categorised as unemployed in Suel et al. (2019). Furthermore, it was noteworthy that the best performance was achieved by training the network on Barriers to Services 2, even though this was a self-created variable.

When comparing the results of testing the networks trained on the Amsterdam dataset on images used in training, we found that each outcome variable achieved its best predictions when using the network trained on that same outcome variable. However, the network trained on Unemployment seemed better at predicting Income and Barriers to Services than at predicting Unemployment itself. Unemployment and Income are both economic factors that most likely influence each other: for example, zip codes with a high unemployment rate are likely to have a low average income as well, since both connect to poverty. Houses, cars, and types of shops are indirect visual indicators of poverty (Weeks et al., 2007; Sampson & Raudenbush, 2004). Barriers to Services could also be related to types of shops, and since types of shops are indirect visual indicators of poverty, Barriers to Services may be connected to poverty as well. This would explain why the network trained on Unemployment also performs well on Income and Barriers to Services. Future research could focus on the correlation between these three variables and investigate the overall link with poverty, for example along the lines of the sketch below.
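Such a correlation analysis could start from the decile values per zip code; a minimal sketch with hypothetical file and column names:

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical input: the three decile outcomes per PC6 zip code.
df = pd.read_csv("amsterdam_deciles.csv")

# Rank correlations between each pair of outcome variables.
for a, b in [("income", "unemployment"),
             ("income", "barriers_to_services"),
             ("unemployment", "barriers_to_services")]:
    rho, p = spearmanr(df[a], df[b])
    print(f"{a} vs {b}: Spearman rho = {rho:.2f} (p = {p:.3f})")
```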


Comparing the results of testing the networks trained on the Amsterdam dataset on images not used in training, we found that some outcome variables (Income, Barriers to Services, Unemployment) achieved better performances than others (Income 2, Barriers to Services 2). The best performance was achieved when using the trained Barriers to Services network to make predictions for Barriers to Services. Together with Barriers to Services 2, these were the only two outcome variables for which the best predictions were achieved by the network trained on that same outcome variable. For the prediction of Income, the network trained on Unemployment worked best, and for both Income 2 and Unemployment, the network trained on Income worked best. A reason for this could be that Income, Income 2, and Unemployment are all economic variables that presumably influence each other. Here, too, a zip code with high unemployment is likely to have a low average income, since both connect to poverty. As mentioned, houses and cars are indirect visual indicators of poverty and therefore also of these variables (Weeks et al., 2007; Sampson & Raudenbush, 2004). These results give another reason for future research to take the correlation between these variables into account.

It was not possible to point out a single best network, since the networks achieved different performances on different outcome variables. It could be suggested that the Income network is versatile, since it seemed able to make the best predictions for both Income 2 and Unemployment. However, when applying the Income network to the outcome variable Income itself, results were worse than for almost all other outcome variables. Since the distribution of Income was not divided evenly, the network could have been trained with too much focus on the more frequently occurring values. We corrected this by introducing Income 2, but the network trained on Income 2 did not achieve the best prediction for any of the outcome variables. Therefore, it was not necessarily the distributions that affected the results.

When comparing the results between images used in training and images not used in training, we observe that all outcome variables obtain better predictions when tested with images used in training. However, all the best predictions on images used in training also rely on the deciles used in training, so it would be illogical if these were not the best predictions. Remarkably, these best predictions are not particularly different from the best predictions on images not used in training. Since only 4469 datapoints per outcome variable were available for training, overfitting could have taken place. However, given that the results are not unusually different, the chance of overfitting is reduced; only for Income has this probably been the case.

Looking at the examples of predictions in Figure 11, Figure 12, and Figure 13, we can observe that there is not one specific area in Amsterdam that consistently receives good predictions and one area that receives bad predictions. The accurate predictions based on the London weights occurred for streets with a clear number of cars (none or many) and a clear building age (young or old). Streets with medium-old buildings and several or few cars were predicted inaccurately. Since housing and cars are indirect visual features of poverty (Weeks et al., 2007; Sampson & Raudenbush, 2004), the absence of clarity in these features could have resulted in inaccurate predictions. Focusing on the predictions based on the Amsterdam weights with images used during training, we observed that the accurate predictions occurred for streets with clearly visible cars and housing materials. Inaccurate predictions occurred for streets with an unclear view of the housing material caused by camera lighting or interfering green space. Remarkably, an inaccurately high prediction was made for a street with a lot of green space, while an inaccurately low prediction was made for a street with barren trees. It could be that, had the trees not been barren, they would have been classified more accurately as green space. Since green space is linked to the quality of housing and living (Steele et al., 2017; Naik et al., 2017; Weichenthal, Hatzopoulou, & Brauer, 2019), trees with leaves could have resulted in a higher (and more accurate) prediction. It is therefore important that all images are captured in the same season, which could prevent inaccurate predictions based on green space; future research could take this into account.

Lastly, focusing on the predictions based on the Amsterdam weights with images not used during training, we can see that the accurate predictions were made for streets with houses possessing a characteristic architecture, with clearly visible and distinguishable building materials and windows. Inaccurate predictions were made for a street with newly-built houses. The vast majority of the dataset did not contain panoramas of newly-built houses, so the network was not properly trained on them; these new buildings with their new architecture may have caused a mismatch between truth and prediction. Noticeable is the other street that was predicted inaccurately: it possessed the same visual features as the street that was predicted correctly (see Figure 13, upper right and middle right panoramas). Both obtained high predictions, while the truth value was high for only one of them. Sharing visual features while having different truth values makes it challenging to classify such images correctly.

Taking into account both MAE and accuracy values for all results, it seemed that a lower MAE was almost always associated with a higher accuracy. The best results were achieved when having a low MAE and a high accuracy. However, it also appeared difficult to produce a high accuracy overall.
