
Slum Severity Index in Mexico City Definition and First Modelling Steps


Academic year: 2021


Slum Severity Index in

Mexico City


To God be all glory


Acknowledgments

I want to give thanks to God because these two years of the master's were a time of great learning in many aspects of my life, and I always had everything that I needed. I also want to thank my wife, Abryl Ramirez Salazar, for being by my side for the last year and supporting me, even reading and discussing the subject of this thesis with me, which proved to be very helpful. I also want to thank CONACYT and Mexico for granting me the scholarship for the tuition as well as most expenses of daily life. Thanks also to R. Debraj for his help and guidance through the thesis and to Mike Lees for introducing me to the field of study.

Abstract

Aiming to formulate a computational model to describe slums in Mexico City, this thesis presents a study of previous work in the city dealing with informal settlements and poverty. The study is carried out through the lens of the United Nations definition of a slum, in order to integrate previous efforts and research that, even though not specifically about slums, provide important information for understanding the dynamics and context of slums in the city. The outcome of this study is an index, the Slum Severity Index (SSI), that incorporates ideas from the previous work but aligns with the UN definition; its similarities to and differences from previous indexes are discussed. Three methods for integrating into one quantity the different measurements of the demographic variables related to the slum definition are compared, and we found that a clustering method can overlook structure within the settlements that could be useful for incorporating resilience in future models. The SSI is used to map the distribution of slums using 2010 data at the level of urban blocks, and the result is qualitatively compared to a result from 2000 data. A historical analysis is also made at a higher level of aggregation, involving data from the period 1990 to 2010. Finally, a guideline for a model is proposed.

Contents

Acknowledgments

Abstract

Figure list

Table list

1 Introduction
1.1 Research Salience
1.2 Research Objectives
1.3 The slum definition
1.3.1 UN-Habitat definition
1.4 Methods and tools
1.4.1 Guideline to build a structured and relevant dataframe
1.4.2 Factor Analysis, K-Means and Decision Trees
1.4.3 Geographical Information System (GIS)
1.5 Thesis Structure

2 Literature Review
2.1 A short history of measuring slum severity
2.2 The case of Mexico

3 Analysis of data and the Slum Severity Index
3.1 Data sources
3.2 Cleaning the Data
3.2.1 Knowing the data
3.2.2 Dropping Data using Metadata
3.2.3 Graphical analysis to check inconsistencies
3.2.4 Filling Missing values
3.3 Selecting the attributes for the Slum Severity Index and the Data Mining
3.4 SSI definitions
3.4.1 Absolute value or percentage, preparing ssi-Data for Analysis
3.4.2 SSI Calculations

4 Correlations, Validations and application of the SSIs
4.1 Geographical distribution of slums and comparison between the methods
4.1.1 Validity of the index

5 Applications of our results
5.0.1 Application of the SSI: Assessment of politics
5.1 An idea on how the dataframe obtained can be further analyzed

6 Model Guideline, discussion and future work
6.1 The Model Guideline
6.1.1 Objective
6.1.2 Description
6.2 Discussion
6.3 Conclusion
6.4 Future Work

Bibliography

List of Figures

2.1 Growth of Mexico City since 1990. Source: Yavidaxiu, Public Domain, https://commons.wikimedia.org/w/index.php?curid=871060

2.2 The former system of lakes in the Central Valley of Mexico; their border is delineated in black over the current extent of the city. This map also shows the remnants of the lakes in blue as well as the main water lines in the city. Source: Yavidaxiu, own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1192145

3.1 The amount of missing values present in the data is visible here; still, after the cleaning procedure we were left with 90% of the data.

3.2 Samples too far off the identity line show inconsistencies in the data. There are none below the identity line, but there are a few above it, probably due to overestimation of the parameter.

3.3 The long-tail distribution of dwellers (top) is consistent with the same kind of distribution for houses (bottom).

3.4 Even though there seem to be a lot of outliers, an analysis of other attributes showed that there is no reason to do a separate analysis. Nevertheless, we had to consider the long-tail nature of the distributions when comparing the samples.

3.5 Blocks with no color have no houses without water.

3.6 Blocks with no color have no houses without water.

3.7 Water in houses in Mexico City in 2000. The image quality does not allow a good comparison, but general similarities can still be seen, such as the concentration of water scarcity at the periphery of the city. This map was taken from Connolly's 2003 publication [7].

3.8 These maps show where inadequate structure could be found in the city. The fact that most of the city is colored does not mean that all the houses are in bad condition, but that such houses are found throughout the city. Consider that less than 6% of the houses have these conditions.

3.9 Samples below the identity line are inconsistencies in the data. The contradiction comes from the definition of density per room as the total number of dwellers divided by the total number of rooms in the house, where rooms >= 1.

3.10 Most of the plotted area corresponds to 3 persons per room, which is not a very high value. This may be a consequence of migration in Mexico shifting towards the states, and also of the decrease of fecundity in the city.

4.1 Factor Analysis

4.2 K-Means

4.3 The geographical location of slums according to each definition. The first is the Factor Analysis, the middle one is K-Means and the last one the linear combination.

4.4 It is clear that both indexes are linearly related; this is further evidence that summing all the parameters is essentially the same as doing factor analysis.

4.5 This is the map obtained in Connolly's work [8] on her “housing quality index”; it corresponds to data from 2000. It is presented here for comparison with our 2010 results. It is possible to see a reduction of the “Bad-Very Bad” areas.

4.6 These plots show that as the SSI increases people tend to have a lower academic level on average.

4.7 This is Iztapalapa, a part of the city commonly known to be poor; it is part of what people call the Neza-Izta-Chalco slum. At a lower level of aggregation (top) it is possible to distinguish some areas with deprivation that are not visible at the lowest scale at which CONEVAL shows results (bottom).

4.8 The image on the right shows the GIS photo that corresponds to the red square over the green area in the map on the left. This map comes from the results of the FA SSI. The photo comes from Google Maps.

5.1 The slum index shows that slum severity in Mexico City has been decreasing; the different changes in the periods 90-00 and 00-10 correspond with the change of ruling political party.

5.2

List of Tables

2.1 Exponential population growth from 1910 - 2000 [7]. The population growth peaked in the 70's, at the end of the “Mexican miracle”.

Chapter 1

Introduction

1.1

Research Salience

According to the United Nations, 54 per cent of the world's population lives in urban areas, and by 2050 the share of urban dwellers will increase to 66 per cent of the world's population [22]. Migration to the cities has been increasing, since the demand for labor has risen and there is an apparently better quality of life due to investment from successful sectors [21]. Mexico City is no exception: along with Mumbai and São Paulo, each with around 21 million inhabitants, it shares fourth place among the world's mega-cities by population [22]. The Mexico City Metropolitan Area (MCMA) is also known as Greater Mexico City; from the 1940's to 2010 the urban population changed from 43% to 77.8% [5].

However, migration to urban areas often occurs very quickly, and this can be a challenge for the city. As a result, cities grow in an unplanned manner, with undeveloped infrastructure and a lack of adequate policies, which in turn may lead to several problems such as congestion, pollution, poverty, health problems, urban violence and environmental degradation [21]. The outcome is urban areas that are more unequal than rural areas, and hundreds of millions of the world's urban poor living in sub-standard conditions [22]. In Mexico City, 35.4% of the population lived in poverty in 2010; that amounts to almost 7 million persons in poverty and 875,823 persons in extreme poverty [5].

The United Nations has addressed this kind of problem in several conferences and agendas, including the Millennium Development Goals (MDG), the Rio+20 United Nations Conference on Sustainable Development, “The future we want”, and the Sustainable Development Goals (SDG), where urban problems like slums are matters of great urgency [22]. Target 11 of goal 7 of the MDG particularly refers to “improving the life of 100 million slums dwellers in the world”, and the SDG refer to “ensure access for all to adequate, safe and affordable housing and basic services and upgrade slums by 2030”.

Therefore data, models and social research about the dynamics of cities, and particularly about slums, are desirable and necessary to cover such an agenda, not only for the future but for the present day, in order to promote policies that can endure and be more sensitive to people living in those conditions [22].

Using the five characteristics of the UN-HABITAT definition of a slum, the United Nations stated that “there are already more than 1 billion people living in slums, and many more are expected in the projected urban expansion to come over the next few decades” [19]. Later, they found that more people are living in slums, but that the percentage of people living in slums is decreasing.

In July 2015 a press release [4] from CONEVAL, a governmental institution in charge of measuring and evaluating poverty in Mexico, stated that the poverty rate at a national level increased from 45.5% to 46.2% between 2012 and 2014. Urban poverty rose from 40.6% to 41.7%, which amounts to 35.4 million people in urban poverty in the whole country. Even though a slum is not defined in CONEVAL reports, they include a Social Gap Index that addresses the five variables of the UN definition of a slum among other characteristics such as academic level. According to this index, the social gap has been increasing since 2005 [6].

According to Priscilla Connolly in a UN-HABITAT publication of 2003, there is no specific data about slums in Mexico City, although around two-thirds of the population is considered to live in “colonias populares”, which are characterized as a type of slum [7]. Connolly however clarifies that “by no means should all (people in colonias populares) be considered to be ‘slum dwellers’”. The same publication recognizes that there is an increasing amount and quality of data that can serve as an indicator of the conditions of the dwellers. In later work [8, 10] Connolly explored further the subject of irregular settlements in Mexico, locating them in the periphery of the city and showing that deprivation in the newer informal settlements is on average worse than in older settlements.

My first research objective when choosing this field of research was to model the slums in Mexico City; however, the lack of specific slum data through the years would make it impossible to evaluate any model I could come up with. On the other hand, the possibility of continuing the work on slums in Mexico, to see the changes in their distribution in the city, appeared equally interesting to me, and hopefully it can prove useful in building a model.

poverty, and though this is relevant for our subject of slums, it is also a much broader concept with many more implications. Poverty, slums and informal settlements seem to be correlated, but they are not the same. The measurement Connolly used for ‘quality of housing’, which corresponds to the slum condition of a place, does not match the definitions that appear in other work on slums. It is also important to notice that CONEVAL has an indicator of poverty but not of slums, and that its definition is different as well. Basically, we needed to reconcile the requirements of the UN definition with this previous work done in Mexico City. We will therefore compare three different ways of classifying slums (and poverty), in order to show that they are not uncorrelated.

The importance of developing good measurements is often underestimated, but in order to come up with relevant models that are really useful for assisting policy in the city, it is first essential to have good measurements that can assess those models. To come up with a good measurement it is important to explore and know the available data, as well as to understand the context the model will try to describe. The effort in this thesis was focused on these two preliminary steps toward building the model.

It is clear that urban reforms are now aiming to lower poverty in the different localities of Mexico City, and therefore it is very important to have reliable indicators as well as models for these, to assess the ongoing changes and predict future ones. These tools will hopefully help policy makers and guide them in the creation of policies that can have better results and be more sensitive to the condition of the people living in those areas.

1.2

Research Objectives

The main purpose of this thesis is to lay informed groundwork for proposing a relevant computational model of the slums in Mexico City. To do so, we define an objective measurement of slum severity in the city by studying the previous work on informal settlements and poverty in Mexico under the scope of the UN definition. As a parallel objective, we analyze and process the 2010 census data at the level of urban blocks; this enabled us to select the attributes to use in the slum severity index (SSI) and to provide a clean dataframe, ready for further computational analysis. Afterwards, three methods of calculating the SSI are compared. The next objective is to show the geographical distribution of the slums in Mexico City in 2010 and to compare it with the previous work. The last objective is to show an application of this measurement by performing a brief historical analysis of the slum severity in the city from 1990 to 2010.

1.3

The slum definition

There are many ways of defining a slum and every country has its own definition; it is therefore important to establish a general definition that provides common ground for the work done in all parts of the world. For us the definition of the United Nations fulfills this requirement, since it is an organization with a worldwide focus and acceptance.

1.3.1

UN-Habitat definition

The UN-HABITAT defines “a slum as an area that has one or more of the following five characteristics” [16]:

1. Poor Structural Housing

2. Overcrowding

3. Inadequate access to safe water

4. Inadequate access to sanitation and other infrastructure

5. Insecure residential status.

This definition will allow us to come up with a relevant indicator for the slum severity in Mexico City with the desired compatibility at a global scale. The previous work in Mexico City focuses on poverty or informal settlements, which is why this thesis aims to link that work with this definition.

1.4

Methods and tools

In this section we briefly explain how and why we used different statistical and computational methods in this work, and give a short description of some of them.

1.4.1

Guideline to build a structured and relevant dataframe

Han and Michelin [15] proposed a 7-step process for knowledge discovery in databases. This process is used in data mining, but it is also useful here since we will use unsupervised learning algorithms, like Factor Analysis and K-Means clustering, to estimate the values of the SSI. We will also show the results of applying a decision tree in order to find correlations between variables that could have gone unnoticed.

1.4.2

Factor Analysis, K-Means and Decision Trees

The reason for having an index or parameter for slum severity in the different areas of the city is to be able to identify, with an objective measurement and in a simple way, where the slums are located. We can see this number as an indicator or a factor that points us to the areas that need more attention.

One way of determining that index or factor is by means of a statistical method; in this case we chose Exploratory Factor Analysis (EFA). This method consists in finding underlying (unobserved) factors from a set of observed variables. In this work this means that we assumed that the lack of water, the quality of materials (specifically the floor), sewage and space (overcrowding) in the houses of a certain locality are interrelated and can be explained by the value of one factor, i.e. the SSI. This assumption and choice of method is our way of incorporating the concepts of the United Nations definition of slums.

There are other methods that are also used to reduce the number of variables, like Principal Component Analysis (PCA). Even though both sometimes give the same numerical outcomes, in theoretical terms Factor Analysis is the more appropriate choice here, since PCA tries to explain the maximal amount of total variance, while EFA looks for the underlying factor that explains the common variance.
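As a hedged illustration of the distinction above, the one-factor model can be written as follows. The notation is ours and not taken from the thesis: x_j stands for an observed deprivation variable (for example the share of houses without water), f for the single latent factor interpreted as the SSI, λ_j for its loading and ε_j for the variable-specific noise, assuming f has unit variance and is uncorrelated with ε_j.

```latex
x_j = \mu_j + \lambda_j f + \varepsilon_j, \qquad j = 1, \dots, p,
\qquad
\operatorname{Var}(x_j) = \underbrace{\lambda_j^{2}}_{\text{common variance}}
                        + \underbrace{\psi_j}_{\text{unique variance}},
\qquad \psi_j = \operatorname{Var}(\varepsilon_j)
```

Under these assumptions EFA estimates the loadings from the common part only, whereas PCA looks for the direction that maximizes the total variance, common and unique together.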

K-Means is a clustering algorithm that works by establishing an objective function to measure the quality of a partition of a dataset into disjoint subsets, where quality refers to the similarity between the objects in each set and the difference between clusters [15]. The algorithm consists in reassigning each element to the cluster whose mean value is closest, and then recalculating the mean of the clusters; this process repeats until certain conditions are met, for example until the means no longer change.
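The assign-and-update loop just described can be sketched in a few lines of plain Python. This is only a minimal sketch, not the implementation used in the thesis: the function name `kmeans`, the initial means and the six example blocks (share of houses without water, share of houses with a dirt floor) are invented for illustration.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of floats; `centroids` are the initial means."""
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: recompute each mean (keep the old one if a cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # stopping condition: the means no longer move
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical urban blocks: (share without water, share with dirt floor).
blocks = [(0.02, 0.01), (0.05, 0.03), (0.04, 0.02),
          (0.60, 0.50), (0.70, 0.55), (0.65, 0.45)]
centroids, clusters = kmeans(blocks, [(0.0, 0.0), (1.0, 1.0)])
```

With these made-up values the three low-deprivation blocks and the three high-deprivation blocks end up in separate clusters. In general the result of K-Means depends on the initial means, which is why they are passed in explicitly here.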

A decision tree is a tool that consists of a graph where the internal nodes correspond to a test on an attribute, the branches of the tree correspond to the outcomes of that test, and the external nodes (the leaves) represent the class labels. Decision trees can be generated by a supervised algorithm that, using an objective measurement like the Gini coefficient, searches for tests on the attributes that split the samples into subsets with a less heterogeneous distribution over the response classes. The motivation for using a decision tree is to see if we can gain some information about the rules by which the attributes could be combined in a future model.
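To make the split search concrete, the sketch below computes the Gini impurity of a set of class labels and looks for the threshold on a single attribute that yields the least heterogeneous subsets. It is an illustrative sketch only: the helper names `gini` and `best_split`, the attribute (persons per room) and the labels are assumptions, not data or code from the thesis.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best test `value <= threshold`."""
    best = (None, gini(labels))  # baseline: no split at all
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical blocks: persons per room and a made-up severity class.
persons_per_room = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
severity = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(persons_per_room, severity)
```

A full tree would apply the same search recursively to each resulting subset and over every attribute; only one level is shown here.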

1.4.3

Geographical Information System (GIS)

GIS software has become very popular in urban modelling because of its possibilities for handling and modelling spatial data. In this thesis we focus on its capacity to display geographical data with the use of shapefiles, that is, files that contain the polygons, lines or coordinates that describe different elements of the urban landscape. This was used to show the location of the deprivation in the different slum characteristics, as well as to map the slum severity indexes that we calculated.

Since the work in this thesis points towards the creation of a model, we also took into consideration the set of criteria proposed by Debraj [11] to evaluate slum models. It includes three assessments: the first concerns the level of detail regarding spatial, temporal and conceptual constraints, the second evaluates the emergence and growth factors included in the model, and the last deals with the sensitivity analysis, validation and calibration performed. We mention at this point that a good candidate for a slum model could be a Cellular Automaton model coupled with GIS software, as a point of departure that leaves several alternatives for improvement. Cellular Automaton models have been used extensively to describe complex systems; they consist of components of space called cells that can be programmed with logical operations, hence automatons. This kind of coupling is not uncommon in urban modelling and has had very good results [18].
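The idea of cells updated by logical rules can be illustrated with a toy automaton. The rule used here (an empty cell becomes settled when at least two of its eight neighbours are settled) is purely hypothetical and is not the model proposed in this thesis; the function name `step` and the small grid are likewise invented.

```python
def step(grid, threshold=2):
    """One synchronous update: an empty cell (0) becomes settled (1) when at least
    `threshold` of its 8 neighbours are settled; settled cells stay settled."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so every cell reads the old state
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue
            settled = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if settled >= threshold:
                new[r][c] = 1
    return new


# A tiny hypothetical patch of land with two settled cells in the middle row.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
after = step(grid)
```

In a coupling with GIS, each cell would correspond to a spatial unit such as an urban block, and the rule would depend on attributes of that unit rather than only on neighbour counts.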

1.5

Thesis Structure

This master's thesis consists of six chapters. The first is this Introduction. Chapter 2 is a literature review focused on the history of slum measurement worldwide, as well as on the previous studies of Mexico City regarding informal settlements and poverty, which themselves include different indicators.

In Chapter 3 we define the slum severity index, taking into account what was found in the literature study as well as the data available. Before doing so we follow the process by Han and Michelin, as already mentioned, to analyze, clean and transform the data from the 2010 census in order to get a solid database from which it is relatively simple to calculate the different slum severity indexes (SSIs).

In Chapter 4 we compare these indexes and use the SSI to locate the slums in the city, contrasting the result with Connolly's map for 2000. We also include a correlation with the CONEVAL index.

In Chapter 5 we show a simple application of the SSI by making a short historical analysis that corresponds to political changes in the government. We also include in this chapter a way in which the dataframe could be used for a computational analysis using a data mining decision tree. Chapter 6 includes a guideline for a model to simulate the slums, as well as a discussion of the results, accomplishments and possibilities for future work.

Chapter 2

Literature Review

2.1

A short history of measuring slum severity

In 1999 Nelson Mandela launched a plan called “Cities without slums” [3], developed by the Cities Alliance, a partnership between the United Nations Center for Human Settlements (UNCHS, UN-Habitat) and the World Bank together with many cities around the world. This initiative was adopted in the United Nations Millennium Declaration [20] in the following way: “By 2020, to have achieved a significant improvement in the lives of at least 100 million slum dwellers as proposed in the ‘Cities Without Slums’ initiative”. This declaration gave form to the Millennium Development Goals, which included a set of indicators to assess the level of accomplishment of the goals. Two indicators dealt with this target: the percentage of households with access to sanitation and the percentage of households with secure tenure [12]. Initially the effort went into estimating the security of tenure, that is, the security that dwellers have of not being evicted from their houses without a formal legal process. This is particularly important since it is assumed that people without security of tenure will tend to neglect the conditions in which they are living, since an eviction could occur at any moment. However, it was not easy to measure directly, so the UN decided to measure it indirectly through the structure and amenities of houses.

This was the starting point for Harvey and Guenter [12] to make a broad estimation of the number of slum dwellers in 2002. They made it at a city level and performed Principal Component Analysis on 5 variables (water access, permanent structure, sewer connection, law compliance and electricity) to obtain a single component that could explain 63% of the variance of the data. Their results showed that using the amenities and structure of households is a good approximation for estimating the security of tenure; they also found that PCA gives results very close to a simple linear combination of all amenities with equal weights. Finally, they note the importance of creating an estimation at a household level. It is important to notice that the definition of a slum used in this thesis, given in the research framework of the Introduction, also dates to 2002, and is probably related to the findings of the Harvey and Guenter estimation.
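The closeness between the first principal component and an equal-weights combination can be explored with a small sketch. The data below are invented (five fictitious cities, three amenity shares) and the helper `first_component` is an assumption for illustration; it finds the leading eigenvector of the covariance matrix by power iteration and is not the procedure of the cited study.

```python
def first_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration.
    Returns the unit-length weights and the share of total variance explained."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start from equal weights
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    return v, explained


# Hypothetical shares of households with water, sewer and permanent structure.
cities = [(0.9, 0.8, 0.85), (0.7, 0.6, 0.75), (0.5, 0.55, 0.5),
          (0.3, 0.2, 0.35), (0.2, 0.25, 0.1)]
weights, explained = first_component(cities)
```

When the variables are all positively correlated, as in this made-up example, every weight of the first component has the same sign, which is the situation in which a plain sum of the variables ranks the samples similarly to PCA.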

One estimation at a household level was done by Patel, Koizumi and Crooks in “Measuring slum severity in Mumbai and Kolkata: A household-based approach” [1]. In this paper the authors question the usual way in which the Census of India estimates the number of slum dwellers in a city, which is based on classifying each unit of study, called a Census Enumeration Block (CEB), as either slum or non-slum. These units are delineated by the Census of India, but Patel argues that they contain a heterogeneous population, and reminds us “that not all poor live in slums and those who live in slums are not necessarily the poorest” [17]. The authors therefore propose a household enumeration based on data from the National Family and Health Survey (NFHS) for the cities of Mumbai and Kolkata; this data contains household-level information from a small sample of each city (0.08% and 0.25%). The way in which the sampling was conducted allowed Patel to determine whether a household was part of a slum or a non-slum CEB according to the Census of India, so that the results could be contrasted. They found that the percentages of households suffering from the different characteristics of a slum were indeed similar for households in slum and non-slum CEBs. For example, lack of sanitation affected 41.7% of slum households and 25.5% of non-slum households in Mumbai, whereas in Kolkata the percentages were even reversed: 22.8% for slums and 29.1% for non-slums. As a result, 35.5% of the slums based on the UN definition in Mumbai, and 39.9% in Kolkata, are considered non-slums according to the Census of India. Another interesting contribution of this paper is the definition of the SSI: this index is the sum of the binary codes that correspond to the presence or absence of each of the five slum characteristics. This measurement makes it possible to monitor the impact of policies better, since policies usually aim to improve one characteristic while slum dwellers usually present more than one.
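The household-level index described above, a sum of five binary indicators, can be sketched as follows. The field names and the example households are hypothetical; only the idea of adding the presence or absence of each of the five UN-Habitat characteristics comes from the text.

```python
# The five UN-Habitat slum characteristics, as hypothetical field names.
CHARACTERISTICS = (
    "poor_structure",
    "overcrowding",
    "no_safe_water",
    "no_sanitation",
    "insecure_tenure",
)


def slum_severity(household):
    """Count how many of the five deprivations a household shows (0 to 5)."""
    return sum(1 for name in CHARACTERISTICS if household.get(name, False))


# Two made-up households: one with two deprivations, one with none.
household_a = {"overcrowding": True, "no_safe_water": True, "insecure_tenure": False}
household_b = {}
scores = [slum_severity(household_a), slum_severity(household_b)]
```

Because the index counts deprivations instead of labelling a household slum or non-slum, a policy that removes a single deprivation lowers the score by one even when other deprivations remain.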

It is relevant at this point to summarize in three points the importance of these paragraphs in relation to the objectives of the thesis.

1. First of all, we get our definition of a slum, and which attributes may be relevant for measuring each characteristic of the slum.

2. [...] at a household level.

3. Creating a continuous slum index, rather than using a dichotomous characterization of slum deprivation, makes it possible to better assess the impact of policies.

2.2

The case of Mexico

The name of the city is a prelude to the complex situation of this megacity of over 21 million inhabitants. Mexico City is the current name of the former Federal District as of the beginning of 2016. Mexico City and environs, the Metropolitan Area of Mexico City (MAMC), the Metropolitan Area of the Valley of Mexico and Greater Mexico City are different denominations for the urban area comprising 57 to 59 political entities found inside the former Federal District, the State of Mexico and Hidalgo. From now on, for simplicity, we will refer to it just as Mexico City.

I have lived in Mexico City for 28 years, and the size of the city never ceases to impress me whenever I arrive by plane: one can literally see hills upholstered with houses, and the city extends beyond sight for several minutes. The same happens when one goes by car to Pachuca, Hidalgo, a city 107 km northeast of the center of Mexico City, crossing a section of the State of Mexico and entering Hidalgo without ever reaching the city limit. At the beginning of the twentieth century the area of Mexico City was 23 km2; by the end of the century it had increased to 2123 km2, more than 90 times the original [10]. These impressions drove me to think that Mexico City was a developing city that should encompass a great deal of marginalized areas, informal settlements that could be seen as slums. Moreover, it is easy to come across the statement on the web that Mexico City contains the biggest slum in the world. Nevertheless, according to Priscilla Connolly, Mexico City's informal settlements should not be systematically categorized as slums. For example, the percentage of amenities enjoyed in one of the constituents of the “biggest slum”, Ciudad Nezahualcoyotl in the State of Mexico, was already close to the city's average by 2000, while the areas with more than 75% of dwellings lacking piped water presented a marked difference. As stated by Connolly, the informal settlements are heterogeneous in composition regarding economic income as well as professional practices, and this has proved an advantage in upgrading the living conditions of the areas. Governmental programmes in the decade of 2000 also had a big impact, particularly in Valle de Chalco-Solidaridad. In the Google Earth application it is possible to track some major changes over this decade, especially in an area north of Nezahualcoyotl (in the direction of expansion of the city) called Chimalhuacan.

Connolly participated in the case studies for the Global Report on Human Settlements 2003 [7], writing an extensive report on Mexico City. She makes several historical and geographical remarks, trying to understand the current situation of the informal settlements in Mexico City, which could be very useful to take into account to improve the model proposed in this thesis. The following is a synthesis of those remarks which I believe could be included in a simulation:

1. Mexico's nationwide, young and economically promising industrialization process at the beginning of the twentieth century was halted by the Mexican Revolution. This is not surprising, since the economic growth was built on top of a deeply unequal society, not only disregarding the needs of the rural poor communities but even usurping the fruits of their labor. The second wave of industrialization would have to wait until the 40's, when due to the Second World War an import substitution industry began to grow. Paired with a construction prohibition within the Federal District, this resulted in the expansion of the northern part of the city into the surrounding State of Mexico. The growth however was not nationwide, and as a result we have the exponential growth seen in Table 2.1.

                          1910     1940     1970     2000
Population (thousands)     471    1,645    8,623   17,946
Relative to national      3.1%     8.4%    17.9%    18.4%
Mean Annual Growth        3.2%     4.2%     5.5%     1.4%

Table 2.1: Exponential population growth from 1910 - 2000 [7]. The population growth peaked in the 70's, at the end of the “Mexican miracle”.

2. Badly planned spending on the city's infrastructure in the 70's led to increasing debt, and this, paired with the rapidly emerging economies worldwide, drove Mexico City into a crisis. The massive unemployment led people to informal occupations like “on-street vending”. Again the city was in recession but with a high rate of migration, which also explains the horizontal growth seen in Figure 2.1. Connolly puts it this way: “in accordance to the macroeconomic and social processes governing urban development (...) the city expands horizontally in times of recession, when land is cheap, and consolidates in times of economic growth when credit for building is available.”

3. As one of the names of the city indicates, Mexico City lies in a valley. It once held a system of lakes that served as a basin for the water coming down from the mountains, which is not scarce, with an average of 700.89 mm per year. Nevertheless, the city completely transformed this landscape (see Fig. 2.2). The infrastructure needed for this transformation developed along the lines of socioeconomic growth, that is, accentuating the economic gap. The highlands, mostly in the West and South, held a privileged position and housed the richest, while the lowlands in the North and East of the city suffered not only from a lack of water and sewage connections, but also from floods, dust storms and earthquake damage related to the highly compressible subsoil where the old lakes used to be. The high volumes of water that evaporated from the basin through the ages left only salt water in the remnants of the lakes, which therefore do not alleviate the unequal access to water. None of these problems stopped the constant demand for cheap land in the second half of the twentieth century. Connolly emphasizes the importance of water as the "major environmental problem" and the main "factor for slums" in the city.

4. Areas high up on steep slopes in the periphery show high percentages of houses without water.

5. The fear of eviction does not seem to threaten the security of tenure of informal settlers, since contradictory documents from the State obscure the housing market and, instead of fighting the settlements, the government eventually joined efforts to formalize and urbanize many of the areas where the slums were. State of tenure did not show a correlation with slum condition either.

6. Slums can be found in diverse areas of Mexico City, and not all irregular settlements are slums. Connolly lists and characterises informal settlements based on the definitions of CONAPO, a governmental institution in charge of demographic planning. These types neither adhere to the definition by UN-HABITAT nor correspond geographically with the presence of the characteristics of a slum.

7. Precarious roofing and one-room houses (where the single room includes kitchen, bedrooms and bathroom) seem to be the most significant indicators of poor structures in marginal dwellings.

8. Amenities such as cars, cellphones and computers could be used as side indicators of income. Their percentages in former slum areas of the city are further evidence of the heterogeneity of income that informal settlements tend to achieve over the years. Television, on the other hand, is not a good indicator, since most houses, even those without water, have a TV. Nevertheless, that clearly shows a predominant effect on the actions of people, taking into consideration their priorities.

9. Attention is drawn to the fact that percentages and absolute quantities of the different needs can give different results for the most marginal areas.


Figure 2.1: Growth of Mexico City since 1990. Source Yavidaxiu, Public Domain, https://commons.wikimedia.org/w/index.php?curid=871060


Figure 2.2: The former system of lakes in the Central Valley of Mexico; their border is delineated in black over the current extension of the city. This map also shows the remnants of the lakes in blue, as well as the main water lines in the city. Source: Yavidaxiu - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1192145

In the full version of Connolly's report [9] she includes a series of maps that show the distribution of the different kinds of informal settlements, as well as other maps showing the lack of sanitation, access to water, state of tenure (rented or owned), density and other socioeconomic measurements contained in the National Population Census 2000. These maps were produced in a project called OCIM-SIG, which stands for Urban Observatory of Mexico City, Geographical Information Systems for Metropolitan Planning and Research. The information contained in these maps appeared really promising, but when trying to access them on the web it appears that the service is no longer available. Unfortunately, the resolution in the aforementioned paper does not allow all the details to be seen; still, the maps will be compared in Chapter 3 with the analysis of 2010 data. However, in another document by Connolly [10] a clearer map of the distribution of settlements can be seen. There, the CONAPO definition of each type of settlement depends on the time of establishment, development and legal status, as well as on whether it originated as part of a governmental plan, as an invasion or in some other way. This kind of definition immediately raises a question for me: whether it could fall into the same mistakes as the Census of India definitions discussed above. Nevertheless, in yet another paper by Connolly [8] she also classifies the settlements into 4 categories derived with the unsupervised learning method of K-means over data including all of the characteristics of a slum in the definition by UN-HABITAT (even security of tenure, indirectly, through the presence of different amenities). All of the data used in her papers mentioned in this thesis come from the 2005 census or older.

The government of Mexico has also made an effort to measure poverty. CONEVAL stands in Spanish for 'National Council for the Evaluation of Social Development Policy'. It is a governmental institution created in 2005 in charge of measuring poverty based on the Mexican law that establishes as a primary objective to guarantee the social rights of individuals. In the 'Methodology for the Multidimensional Measuring of Poverty in Mexico' the authors review the definition of poverty in the law of the Mexican State and study how to identify and measure poverty through the following indicators:

1. Income per capita

2. Average household educational delay

3. Access to health services

4. Access to social security

5. Quality and space in living places

6. Access to basic services in the house

7. Access to food

8. Social cohesion

There are three other rules that CONEVAL must fulfil to abide by the law, those are:


1. A social right is either covered or not, so it must be measured with binary variables

2. All social rights must be treated as equally important, that is equal weights in a linear combination

3. If an individual's right is not covered, then that individual should be regarded as poor.

Based on these rules, CONEVAL measures poverty with two different indicators: on one side income, and on the other an index that incorporates several sectors of the data. For this thesis the main interest in CONEVAL's work is the definition of that index, which we will refer to as the CONEVAL index. It is defined in the following way:

A Boolean matrix $C$ contains in its entries $c_{i,k}$ whether the right $k = 2, \dots, 7$ is covered ($c_{i,k} = 0$) or not ($c_{i,k} = 1$) for the person $i = 1, \dots, N$. This matrix is then collapsed by summing over $k$ into the vector $IP$, whose entries $ip_i = \sum_{k=2}^{7} c_{i,k}$ contain a number from 0 to 6 that represents the level of need in which each individual stands. Although a more complicated procedure is explained in the literature, what they in fact do is sum these entries and divide by 6 (the total number of indicators) and by $N$ (the total number of people in the locality), that is, $\frac{1}{6N}\sum_{i=1}^{N} ip_i$. This is the value that CONEVAL regards as the indicator of social scarcity for that locality.
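As an illustration, the calculation just described can be sketched in a few lines of Python. This is a minimal sketch and not CONEVAL's code: the function name `coneval_index` and the toy locality are assumptions made here, and each row holds the six flags of one person (1 when the right is not covered).

```python
def coneval_index(deprivation_rows):
    """Return the locality indicator from a list of 0/1 rows.

    Each row holds six flags, one per right k = 2..7:
    1 means the right is NOT covered, 0 means it is covered.
    """
    n_people = len(deprivation_rows)
    if n_people == 0:
        raise ValueError("locality has no people")
    n_rights = 6
    # ip_i: number of uncovered rights for person i (0..6)
    ip = [sum(row) for row in deprivation_rows]
    # divide by the number of indicators and by the number of people
    return sum(ip) / (n_rights * n_people)


# Hypothetical locality with three people
locality = [
    [0, 0, 0, 0, 0, 0],  # no deprivation
    [1, 1, 0, 0, 0, 0],  # two uncovered rights
    [1, 1, 1, 1, 1, 1],  # all six uncovered
]
index = coneval_index(locality)
```

With equal weights, the value is simply the share of all person-right pairs that are uncovered, so it lies between 0 and 1.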

An indicator of slum dwellers can also be found on the webpage of the MDG indicators [23], but it is only available at city scale.

It is now time to summarize the relevance of this second section of the Literature Review for the purposes of this thesis as well as for future work regarding the model:

1. Industry forming in the northern part of the city, increasing the population and expanding the city.

2. Economic recession pushing the expansion to be horizontal instead of vertical.

3. Transformation of the landscape privileging the lands of the West and South and affecting the North East lowlands where water used to be, making those lands also more vulnerable to earthquakes and floods. The remaining sources of water in the North East are salty due to high evaporation in the area. This transformation should be considered when modelling the water supply, whose lack is the slum characteristic most present in the city and therefore one of the main reasons for the presence of slums in Mexico City.


4. Another consideration related to water supply and sewers is the elevation and steepness of the area, because of the technical problems of linking it with the main networks.

5. Security of tenure doesn't seem to affect the presence of slums, since the local laws were commonly overlooked, an informal land market guided land acquisition and gave peace of mind, and the government even aided the formalization of the settlements in the following years.

6. Other important factors are overcrowding and poor structure, which seem to be linked with one-room houses.

7. Previous indicators show the presence of slums in the periphery of the city and also a correlation with young families and high migration.

8. Migration since the 1990s crisis has shifted from the city to the U.S.

9. Overall, it is important to consider the heterogeneity in poor and rich parts of the city.

To end this chapter, it is important to point out that even though previous work dealt with informal settlements and poverty, which are different concepts from slums, it is still possible to make a connection with the definition by the UN. Connolly's work gives us a historical and contextual base to understand the slums in Mexico and to distinguish them from informal settlements. Her work also includes a measurement that fits the UN definition; unfortunately, this is not the focus of her work and it is only presented for the 2000 census data. In this thesis the definition of the SSI will be based on the information provided by Connolly's analysis, and one of the methods we use to calculate the SSI will resemble her "housing quality index". A comparison is also shown between our results for 2010 and hers for 2000, which shows qualitatively the changes over the decade. On the other hand, we have CONEVAL's results on poverty. This institution has several measurements of poverty, one of which is an indicator of "Social Lag". It is similar to the UN definition, but it includes other factors, since poverty is a broader concept. The work of CONEVAL will serve to corroborate our results as well as to delimit the reach of the slum index. We compare CONEVAL's historical analysis of their index with our own and find some differences that have to do with this limitation. In summary, the previous work proves to be essential to give context and a frame to the work done in this thesis, which focuses on a globally accepted definition of slums.


Chapter 3

Analysis of data and the Slum Severity Index

3.1 Data sources

The data used in this thesis come from the National Institute of Statistics, Geography and Informatics (INEGI), the official government institution in Mexico in charge of conducting the census of the country's population. INEGI was established in 1983, when it incorporated many departments dealing with geography, informatics and analysis of information, some dating back as far as the 1800s. Even though demographic data exist since the beginning of the twentieth century, it is only since INEGI's creation that more encompassing and general censuses have been carried out every 5 years (although it has to be said that some boundaries of the political division have changed between censuses). We studied the datasets from 2010. The datasets from 1990 to 2005 were only used to exemplify an application of the slum index.

The databases can be downloaded from [14]. The most recent ones contain data at four levels of aggregation (Municipal, Locality, Group of Blocks, and Block) and cover an extensive set of attributes regarding people (age, education, gender distribution, work, health, migration, indigenous origin, disability, religion), houses (construction material, number of rooms, services (electricity, water, sewage), amenities (toilet, radio, TV, refrigerator, ...) and type (uninhabited or inhabited, communal or particular)), and family structure. From now on we will be referring to the 2010 census data, unless we specify otherwise.

The dataset is contained in two files, one corresponding to the State of Mexico and the other to the former Federal District. As we said in the Literature Review, Mexico City extends over these two states; nevertheless, it doesn't occupy all of their territory. So we had to select from the whole dataset those samples that belong to one of the localities in the city. First it was necessary to build the locality ID, which results from the attributes Entity, Municipality and Locality, and then to use a list of all the localities of the city to reduce the dataset to only these samples. This new dataset was then separated into the different levels of aggregation, that is, the Municipal Level (L1), the Locality Level (L2), the Group of Blocks Level (L3), and the Blocks Level (L4), which is the finest division. The L4 dataset is the one we are going to analyze.
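The selection step can be sketched as follows. This is only an illustration under stated assumptions: the column names `ENTIDAD`, `MUN`, `LOC` and `POBTOT`, the zero-padded 2+3+4 digit key and the example rows are hypothetical choices made here, not taken from the thesis code.

```python
def locality_id(entity, municipality, locality):
    """Concatenate the three codes into one zero-padded key (2+3+4 digits)."""
    return f"{int(entity):02d}{int(municipality):03d}{int(locality):04d}"


def select_city_rows(rows, city_localities):
    """Keep only the rows whose locality key appears in the city list."""
    wanted = set(city_localities)
    return [
        r for r in rows
        if locality_id(r["ENTIDAD"], r["MUN"], r["LOC"]) in wanted
    ]


# Hypothetical rows from the two state files
rows = [
    {"ENTIDAD": 9, "MUN": 2, "LOC": 1, "POBTOT": 120},
    {"ENTIDAD": 15, "MUN": 33, "LOC": 1, "POBTOT": 80},
    {"ENTIDAD": 15, "MUN": 99, "LOC": 45, "POBTOT": 15},  # outside the city
]
city_localities = ["090020001", "150330001"]
city_rows = select_city_rows(rows, city_localities)
```

Using a set for the list of city localities keeps the lookup cheap even for the full dataset of blocks.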

In order to give structure to the analysis of the data, we will follow the Han and Kamber process for "knowledge mining from data". In this chapter we will do the first steps, that is, Data Cleaning, Data Selection and Data Transformation. Along with these steps we will define the Slum Severity Index (SSI). The Data Mining and the assessment of that knowledge will be in the next chapter, as well as the suggestions for a model based on what we learned from the literature and the Data Mining process.

3.2 Cleaning the Data

Following a procedure to clean the data was necessary, since simply eliminating all samples with missing values left only 5 percent of the data, which in fact is full of zeros. The next sections therefore show how we cleaned the data.

3.2.1 Knowing the data

The data consist of 157,017 samples and 198 attributes. There are a lot of missing values; nevertheless, with the information provided by the metadata, as well as some pre-processing techniques, we will drop the samples and attributes that have no significant information or that contain inconsistencies, and fill in the missing values. We drop as few samples as possible because slums could be found in samples that miss information in many of the attributes.


Figure 3.1: The amount of missing values present in the data is visible here. Still, after the cleaning procedure we were left with 90% of the data.

3.2.2 Dropping Data using Metadata

The metadata used to clean the database come from its description [13]. There we see that the first 8 attributes are only for identification, and of these we can drop the 3 attributes that refer to the names of the State, Municipality and Locality. Of the rest of the attributes, the description says that there are 7 referring to state of occupancy (inhabited or uninhabited, collective or particular) that will be missing, so we just eliminate them. One of them is relevant, the number of particular inhabited houses ($H$), but we will recover it from other attributes by dividing the occupants of the houses ($O$) by the average occupants per house ($\bar{O}$), i.e., $H = O / \bar{O}$. This leaves us with 190 attributes, with which we will work from now on.

Samples with the value 'N/D' in the attributes do not have any data except for estimates of population and houses, so we will also drop them. The same happens with all other samples that contain only the identification and the estimated values for population and houses. After this we are left with 150,826 samples, that is, 96% of the original number.

The next piece of information concerns the values of the attributes: if a value is less than 3, a '*' appears in the database. This information will be used in different ways. First we drop all samples with fewer than 3 dwellers (marked with '*'), since that means there will be no information in the rest of the attributes. The samples dropped here were 8,410.
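The two metadata rules above ('N/D' rows and masked dweller counts) can be sketched as a small filter. This is an assumption-laden sketch: the keys `dwellers` and `no_water` and the use of `None` for masked cells are illustrative choices made here, not the names used in the census files.

```python
def parse_cell(value):
    """Convert a raw census cell to a number, or None when it is masked."""
    if value in ("*", "N/D", ""):
        return None
    return float(value)


def clean_blocks(raw_rows, dwellers_key="dwellers"):
    """Drop rows without data and rows whose dweller count is masked."""
    kept = []
    for row in raw_rows:
        if "N/D" in row.values():
            continue  # only population and house estimates are available
        if row[dwellers_key] == "*":
            continue  # fewer than 3 dwellers: other attributes carry no info
        kept.append({key: parse_cell(value) for key, value in row.items()})
    return kept


# Hypothetical raw rows, as strings read from the file
raw = [
    {"dwellers": "35", "no_water": "*"},
    {"dwellers": "*", "no_water": "*"},
    {"dwellers": "N/D", "no_water": "N/D"},
]
cleaned = clean_blocks(raw)
```

Masked cells in the rows that are kept stay as `None`, so they can be filled in a later step instead of being silently read as zero.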


3.2.3 Graphical analysis to check inconsistencies

We have two measurements of the total number of people: one is the total population and the other the number of occupants or dwellers. The total population contains estimated information, while the total number of dwellers was calculated from other values. In Fig. 3.2 it can be seen that there are a few samples where the two differ significantly, probably due to an overestimate of the population. On the other hand, we can also see that the value calculated for dwellers never exceeds the total population, which gives some confidence in using this attribute. Therefore we decided to use the latter, the number of dwellers, for the work in this thesis.

Figure 3.2: Samples too far from the identity line show inconsistencies in the data. There are none below the identity line; nevertheless, there are a few above it, probably due to overestimation of the parameter.

Now we show the distribution of population in our samples in Fig. 3.3, where we can see that we have a long-tail distribution. A boxplot (Fig. 3.4) identified several samples as outliers; nevertheless, a further analysis of the distribution of some important attributes shows that the outliers should not be treated differently from the rest of the samples. They are simply highly populated blocks, but as heterogeneous as the other samples in their deprivation of services. So we will perform the same analysis on all the samples. We can see in Fig. 3.3 that the number of houses has the same kind of distribution, as expected. We did check for more inconsistencies regarding population and houses, and found 3 samples that have an unusual amount of people (> 5000) but almost no houses (2 or 3). They also don't contain much data in the other attributes, so we drop them. These were the last samples that we dropped, so the final number of samples we are using is 142,397, that is, 90% of the original sample size.


Figure 3.3: The long tail distribution of Dwellers (top), is consistent with the same kind of distribution for houses (bottom).


Figure 3.4: Even though there seem to be a lot of outliers, an analysis of other attributes showed that there is no reason to do a separate analysis. Nevertheless, we had to consider the long-tail nature of the distributions when comparing the samples.

So we find that even at the level of Blocks it is important to take relative measurements in order to compare the samples. Nevertheless, it is still important to take into consideration places where there is a high number of houses in severe conditions. Here is where we start making use of the GIS software and plot different maps showing the blocks where there are houses with slum characteristics.


Figure 3.5: Blocks with no color mean that they have no houses without water.

The first map, in Fig. 3.5, shows water deprivation in the city. Most of the city area appears in white, since there is no deprivation of water in that area; however, the remaining 16% has at least one house without water. We chose the jet colormap because we wanted to see all places where there is a lack of water; if a gradient colormap had been used instead, the areas with small values would not be visible. Here we still have the advantage of locating areas with a high density of houses without water. Areas like the south and east of the city show exactly that condition (Fig. 3.6). On the other hand, houses with no water can be seen dispersed all around the city, but more agglomerated in its periphery. These results agree with those of Connolly for 2000 (Fig. 3.7), although a further comparison seems a bit out of place, since I could not get the data to create 2000 maps nor a better quality image than the ones in her 2003 paper.

Figure 3.6: Blocks with no color mean that they have no houses without water.

Figure 3.7: Water in houses in Mexico City in 2000. The image quality doesn't allow a good comparison, but general similarities can still be seen, such as the concentration of water scarcity at the periphery of the city. This map was taken from Connolly's 2003 publication [7].


The next map shows the location of houses with only 1 room. This is an important attribute since, according to the literature, this kind of structure could point to slums. What we see in Fig. 3.8 is that this is a more prevalent condition: as a matter of fact, more than half of the city blocks have 1-room houses, although these houses account for only 7 percent of the houses in the city. So we used this attribute, but in the form of a percentage of the total number of houses in the block; we also apply this conversion to the other quantities later, for the definition of the SSI. For now we are only concerned with showing maps with absolute quantities and seeing what information they provide. The next attribute is the earth floor. This quantity was considered unimportant in the literature; nevertheless, we can see that it has a presence similar to that of the 1-room houses: it amounts to 2% of the total houses and appears in almost 30% of the blocks. The map can also be seen in Fig. 3.8.


Figure 3.8: These maps show where inadequate structure can be found in the city. The fact that most of the city is colored does not mean that all the houses are in bad condition, but that such houses are spread throughout the city. Consider that less than 6% of the houses have these conditions.

One important measurement we needed for the SSI is the density of population. Again, there are two measurements in our data, one for persons per house and the other for persons per room. Fig. 3.9 shows there are some inconsistencies in our data, because some of the samples have a higher average of people per room than per house, which is a contradiction since a house has 1 or more rooms according to the census description. In this case we cannot simply use the density per house, since a house can have a lot of rooms. In fact, in the same figure we can see that as the density per room increases there are fewer samples above the identity line, which means that many of the samples with high values of density per house in reality have several rooms, and therefore we would not like to consider them overcrowded. So we decided to use the density per room, but we still had to do something about the inconsistencies. A possibility is that the values for these samples were actually switched, but since there is no way to prove that, we decided to drop them, considering that there are just a few (16 samples).
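The consistency rule used here (persons per room can never exceed persons per house when every house has at least one room) can be written as a short check. The keys `per_room` and `per_house` and the sample values are hypothetical names chosen for this sketch.

```python
def split_by_density_consistency(blocks):
    """Separate blocks that satisfy per_room <= per_house from those that don't."""
    consistent = [b for b in blocks if b["per_room"] <= b["per_house"]]
    inconsistent = [b for b in blocks if b["per_room"] > b["per_house"]]
    return consistent, inconsistent


# Hypothetical block averages
blocks = [
    {"id": "a", "per_house": 4.0, "per_room": 1.2},
    {"id": "b", "per_house": 3.5, "per_room": 3.5},  # one-room houses only
    {"id": "c", "per_house": 1.1, "per_room": 3.8},  # contradiction
]
kept, dropped = split_by_density_consistency(blocks)
```

Returning the dropped blocks as well makes it easy to count them and to inspect whether the two values look switched.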

Figure 3.9: Samples below the identity line are inconsistencies of the data, the contradiction comes from the definition of density per room being total amount of dwellers divided by total amount of rooms in the house, where rooms are >= 1

The attribute of average persons per room will serve to show places with overcrowding. The question now is how many people per room is considered overcrowded. We show the highest 10%, which turns out to start at around 1.5 persons per room. The result is seen in Fig. 3.10, where we can again see the south east highlighted. The deprivation of space doesn't seem so severe, though: there is far less area painted, and most of the painted area corresponds to 3 persons per room, which is not a very bad condition.
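Selecting the highest 10% amounts to finding a cut-off value in the sorted densities. A minimal sketch of that idea follows; the function name and the toy densities are assumptions for illustration, and the exact percentile convention used for the maps is not stated in the text.

```python
def top_share_threshold(values, share=0.10):
    """Return the smallest value that belongs to the top `share` of the data."""
    ordered = sorted(values)
    n_top = max(1, round(len(ordered) * share))  # at least one sample
    return ordered[-n_top]


# Hypothetical average persons per room for ten blocks
densities = [0.5, 0.8, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 3.0]
threshold = top_share_threshold(densities)
overcrowded = [d for d in densities if d >= threshold]
```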


Figure 3.10: Most of the plotted area corresponds to 3 persons per room, which is not a very high value. This may be a consequence of migration in Mexico shifting towards the states, and also of the decrease of fecundity in the city.

3.2.4 Filling Missing values

Many attributes are related to each other; that is, the total population, female population and male population for a certain attribute are reported, for example, the total, female and male population with completed primary education. We used this to fill in the complementary missing values. The same happens with three-valued attributes, for example, houses with 1, 2 or more rooms. The value of 1-room houses is particularly important, since we learned from the literature that we should use it for assessing the quality of structure and overcrowding in a house.
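The complementary fill can be illustrated with the total, female and male case. This is a sketch under the assumption that missing values are stored as `None`; the function name is hypothetical.

```python
def fill_triplet(total, female, male):
    """Fill one missing member of (total, female, male) from the other two."""
    if total is None and female is not None and male is not None:
        total = female + male
    elif female is None and total is not None and male is not None:
        female = total - male
    elif male is None and total is not None and female is not None:
        male = total - female
    # with two or more members missing nothing can be recovered
    return total, female, male


filled = fill_triplet(100, None, 40)
unchanged = fill_triplet(None, None, 5)
```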

Assumption regarding missing values: the missing values result from many factors involving all the persons who take part in the census process, their communication, their understanding, etc. Therefore we consider it appropriate to treat the missing values as missing at random (MAR). Following this assumption we decided to fill in with average values, but not directly, since the distribution of houses shows a wide range in the number of houses per block (H). Rather, we divided each attribute by H or by the total population (P), depending on the type of attribute. Then we calculated the average over the blocks that are in the same group, that is, in the same unit of the next level of aggregation (L3), and finally we multiplied this average value by H or P, depending on the case. This is the new value that was assigned in place of the missing value.

There are other attributes that are already mean values, so for those we simply assigned the mean of the means of the L3 area. After this there were only 698 samples with missing values. That is because there were no other blocks with information on certain attributes in the same L3 area. Here we again used the fact that some missing values correspond to the value being less than 3: since 0 does appear in the data, we are only left with 1 and 2. We chose to fill in the rest of the missing values with 1.
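The imputation of count attributes can be sketched as follows: compute the ratio to the number of houses for every block with data, average it within the L3 group, and scale it back by the houses of the block being filled. The keys `l3`, `houses` and `no_water` are hypothetical, and the fallback value of 1 follows the choice described above.

```python
from collections import defaultdict


def impute_by_group_ratio(blocks, attr, denom="houses", group="l3"):
    """Fill missing `attr` with (mean ratio in the L3 group) * denominator."""
    ratios = defaultdict(list)
    for b in blocks:
        if b[attr] is not None and b[denom] > 0:
            ratios[b[group]].append(b[attr] / b[denom])
    filled = []
    for b in blocks:
        b = dict(b)  # do not modify the input rows
        if b[attr] is None:
            group_ratios = ratios.get(b[group])
            if group_ratios:
                mean_ratio = sum(group_ratios) / len(group_ratios)
                b[attr] = mean_ratio * b[denom]
            else:
                b[attr] = 1  # no neighbour with data: masked value is 1 or 2
        filled.append(b)
    return filled


# Hypothetical blocks in two L3 groups
blocks = [
    {"l3": "A", "houses": 10, "no_water": 2},
    {"l3": "A", "houses": 20, "no_water": 8},
    {"l3": "A", "houses": 30, "no_water": None},
    {"l3": "B", "houses": 5, "no_water": None},
]
imputed = impute_by_group_ratio(blocks, "no_water")
```

For attributes counted per person rather than per house, the same function could be called with the population column as `denom`.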

This ends the cleaning procedure and leaves us with a complete dataset ready for processing.

3.3 Selecting the attributes for the Slum Severity Index and the Data Mining

As described in the Literature Review, we were interested in defining the index in accordance with the definition by the UN. So, for the definition of the slum index, following what we learned from the literature [7], we will use the following attributes from the data:

1. Houses without piped water will be used to measure "inadequate access to safe water".

2. Houses without sewage + houses without toilet will correspond to "inadequate sanitation".

3. Houses with only 1 room + houses where the floor is the bare ground for "poor structural housing".

4. Average number of people per room for "overcrowding".

With these four attributes we build a dataframe (ssi-Data), which will be used to define the SSI.

For the Data Mining we used these attributes as responses. Bear in mind that the knowledge we wanted to mine concerns transition rules for a model of slums. The idea is to find which of the remaining attributes would be good candidates to take into account when building the model. So those other attributes will form our test set.

3.4 SSI definitions

3.4.1 Absolute value or percentage, preparing ssi-Data for Analysis

The total number of houses per block varies widely over the dataset because of the long-tail distribution shown before, so one must expect that if one house out of 20 doesn't have access to water, compared to 1 house out of 10, the slum severity index should differ. That is why we need to take a relative value to be able to define an index that captures this behaviour. For that we just divide the data for water ($f_{wa}$), structure ($f_{st}$) and sanitation ($f_{sa}$) in each sample by the corresponding number of houses in the sample ($N_b$), which corresponds to a block. For density we already have a relative measurement, but we want it to be within the range 0-1; also, we want to look for overcrowding, and this would start above 2 persons per room. So we subtracted 2 from the density and then set all negative values to 0. Then we applied the operation $f_{de}^{raw}(d) = 1 - \frac{1}{1+d}$, where $d$ is the resulting density of each sample. We thought of this operation because we wanted a measurement of the share of the resource per person, and we wanted the measurement to increase with the deprivation of the slum characteristic. Then, to fit it to the range 0-1, we divide by the maximum of $f_{de}^{raw}$.

The following equations summarize the data that will be used to calculate the SSI.


$$f_{wa} = \frac{\text{Houses without piped water in the block}}{N_b}$$

$$f_{st} = \frac{\text{1-room houses in the block} + \text{Dirt-floor houses in the block}}{2 N_b}$$

$$f_{sa} = \frac{\text{Houses without drainage} + \text{Houses without toilet}}{2 N_b}$$

$$f_{de}^{raw} = 1 - \frac{1}{1 + \max(0,\ \text{Average persons per room in the block} - 2)}$$

$$f_{de} = \frac{f_{de}^{raw}}{\max(f_{de}^{raw})}$$

Each of these fractions has a range between 0 and 1.
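The four fractions can be computed per block as in the sketch below. The dictionary keys and the two example blocks are hypothetical; the clipping at 2 persons per room and the division by the maximum follow the description in this section.

```python
def block_fractions(block):
    """Return f_wa, f_st, f_sa and the raw density term for one block."""
    n_b = block["houses"]
    # overcrowding is counted only above 2 persons per room
    d = max(0.0, block["persons_per_room"] - 2.0)
    return {
        "f_wa": block["no_water"] / n_b,
        "f_st": (block["one_room"] + block["dirt_floor"]) / (2 * n_b),
        "f_sa": (block["no_drainage"] + block["no_toilet"]) / (2 * n_b),
        "f_de_raw": 1.0 - 1.0 / (1.0 + d),
    }


def normalise_density(rows):
    """Divide f_de_raw by its maximum over all blocks to obtain f_de."""
    top = max(r["f_de_raw"] for r in rows)
    for r in rows:
        r["f_de"] = r["f_de_raw"] / top if top > 0 else 0.0
    return rows


# Two hypothetical blocks
blocks = [
    {"houses": 20, "no_water": 1, "one_room": 4, "dirt_floor": 2,
     "no_drainage": 0, "no_toilet": 2, "persons_per_room": 3.0},
    {"houses": 10, "no_water": 1, "one_room": 0, "dirt_floor": 0,
     "no_drainage": 0, "no_toilet": 0, "persons_per_room": 1.5},
]
ssi_data = normalise_density([block_fractions(b) for b in blocks])
```

In this toy case one house without water gives 0.05 in the block of 20 houses and 0.10 in the block of 10, which is the difference the relative measure is meant to capture.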

3.4.2 SSI Calculations

First we start with exploratory Factor Analysis (FA). As we said in the literature review, we thought this would be the best match theoretically speaking, since it matches the idea of the slum status as a factor behind the deprivation of the different resources. The Slum Severity Index is the dot product of the communalities (of which there are four) and the values of the sample in ssi-Data (in our case there is only 1 factor, so each communality is simply the square of the factor loading of the corresponding variable). This index is a continuous variable in the range 0-1.
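Once the loadings of the single factor are known, the index of a block is a weighted sum of its four fractions. The sketch below only shows that last step; the loadings are made-up numbers for illustration, not the values fitted in this thesis, and the fitting of the factor model itself is not shown.

```python
def factor_ssi(fractions, loadings):
    """Dot product of the communalities (squared loadings) and the fractions."""
    if len(fractions) != len(loadings):
        raise ValueError("one loading per variable is required")
    communalities = [loading ** 2 for loading in loadings]
    return sum(c * f for c, f in zip(communalities, fractions))


# Order of variables: water, sanitation, structure, density
example_loadings = [0.5, 0.5, 0.5, 0.5]   # hypothetical one-factor loadings
example_block = [0.05, 0.05, 0.15, 1.0]   # f_wa, f_sa, f_st, f_de
ssi_value = factor_ssi(example_block, example_loadings)
```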

The second definition follows Connolly's method with the 2000 data, that is, using K-means to classify the blocks into 4 categories. We used the sklearn library in Python to apply the clustering algorithm.
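To make the clustering step concrete, the following is a dependency-free sketch of Lloyd's algorithm, the iteration behind K-means. It is a stand-in for illustration only, not the sklearn implementation used in this thesis: the initial centroids are simply the first k points, and the eight example blocks are invented.

```python
def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid for every point
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # update step: mean of the members of each cluster
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster fixed
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical blocks described by (f_wa, f_sa, f_st, f_de)
points = [
    [0.00, 0.00, 0.00, 0.00], [0.30, 0.30, 0.30, 0.30],
    [0.60, 0.60, 0.60, 0.60], [0.90, 0.90, 0.90, 0.90],
    [0.02, 0.00, 0.01, 0.00], [0.31, 0.30, 0.29, 0.30],
    [0.60, 0.62, 0.60, 0.60], [0.90, 0.88, 0.90, 0.90],
]
labels, centroids = kmeans(points, k=4)
```

Each block receives one of four labels, so blocks in the same category become indistinguishable from each other.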

Lastly, in accordance with the work by Patel, we present the slum index as a linear combination (LC) with equal weights of the share of households per block deprived of each resource. This also corresponds to the CONEVAL index, although CONEVAL takes more attributes into consideration. This index is also a continuous variable, but ranges from 0 to 4. The mathematical expression of the LC is as follows:

$$LC = f_{wa} + f_{sa} + f_{st} + f_{de}$$
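A minimal sketch of this index, together with the division by the maximum that is used when the indexes are compared on a common 0-1 scale. The function names and the two example blocks are assumptions for illustration.

```python
def linear_combination(f_wa, f_sa, f_st, f_de):
    """Equal-weight sum of the four deprivation fractions (range 0 to 4)."""
    return f_wa + f_sa + f_st + f_de


def rescale_to_unit(values):
    """Divide every value by the maximum so that the largest becomes 1."""
    top = max(values)
    return [v / top for v in values] if top > 0 else list(values)


# Two hypothetical blocks
lc_values = [
    linear_combination(0.05, 0.05, 0.15, 1.0),
    linear_combination(0.10, 0.00, 0.00, 0.0),
]
lc_scaled = rescale_to_unit(lc_values)
```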

In the next chapter we present the results of these definitions of the SSI and show a short application.


Chapter 4

Correlations, Validations and application of the SSIs

4.1 Geographical distribution of slums and comparison between the methods

Let's begin with the locations of slums. Figs. 4.1 to 4.3 show the three maps of the different SSI definitions. It is important to notice that, since the range of the values obtained with Factor Analysis is different from that of the Linear Combination, we divided the values by the corresponding maximum to rescale them to 0-1 and be able to compare them more easily. For the maps of FA and LC we only plotted the 10% with the highest values for clarity, which is why most of the map appears in white. We can see that all of the maps, including K-means, mark the same areas as slums. Nevertheless, there is an important difference between the K-means plot and the rest: within the slum area, which is mainly marked in red, it fails to show the variety of deprivation shown by the other two. We therefore lose information by clustering all those places together.


Figure 4.3: The geographical location of slums according to each definition. The First is the Factor Analysis, the middle one is kMeans and the last one the linear combination.

In all these maps we can see a phenomenon called peripherization, where the slums appear to be located on the edge of the city; this is also visible in the previous work by Connolly [8]. This phenomenon has been studied in slums around the world, and a model was made specifically to reproduce it [2]. The most important difference between the maps of the indexes is that FA and LC show a finer structure than K-means. There is no evident difference between factor analysis and the linear combination; in fact, Fig. 4.4 shows that they are practically linearly related, with a Pearson correlation coefficient of 0.86. Is there an advantage in selecting one over the other? I would simply pick factor analysis for the theoretical significance of the measurement, but I cannot say these results show any benefit over the linear combination.
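For reference, the Pearson correlation coefficient between two indexes can be computed as in the following sketch. The function name and the sample values are hypothetical and are used only to illustrate the calculation; they do not reproduce the value reported above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the products of the deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (unnormalized std. deviations).
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical rescaled FA and LC values for six blocks (assumed, not thesis data).
fa_values = [0.10, 0.25, 0.40, 0.55, 0.80, 1.00]
lc_values = [0.05, 0.30, 0.35, 0.60, 0.70, 1.00]

r_fa_lc = pearson_r(fa_values, lc_values)
```

A value of the coefficient close to 1 means that one index is close to a positive linear function of the other, which is the sense in which the two indexes are said to carry essentially the same information.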

Figure 4.4: Both indexes are approximately linearly related, which is a further indication that summing all the parameters gives essentially the same result as factor analysis.

It is also important to compare the results with those of Connolly on 2000 data. To observe the difference we show her resulting map here.


Figure 4.5: This is the map obtained in Connolly’s work [8] for her “housing quality index”; it corresponds to data from 2000. It is presented here for comparison with our 2010 results. It is possible to see a reduction of the “Bad-Very Bad” areas.

A further discussion of the implications of these results is given in the Discussion. From this point forward in this thesis we will use the FA SSI unless otherwise indicated.

It is interesting to see how the index correlates with other data fields. In the case of the average academic level achieved by the dwellers there is a clear pattern, shown in Figure 4.6. The correlation coefficient is -0.4: the higher the value of the SSI, the less likely it is for a sample to have a high average level of academic education.

Figure 4.6: This plot shows that as the SSI increases, people tend to have a lower academic level on average.

4.1.1

Validity of the index

To validate our Factor Analysis index we compare it to the CONEVAL index. That index is simply a summation of all the parameters involved in its definition, like LC or Patel’s index for Indian slums. The CONEVAL index has 7 parameters while ours has only 4; it therefore has a higher variance, and it is more difficult to draw conclusions from it about which intervention could be made. Consider the relation we already found with the academic level, which would be one of the extra parameters in the CONEVAL index. However, the inclusion of these other parameters is key to a higher sensitivity. Their index is presented in 3 levels: Low, Medium and High social lag. Our index is on a continuous scale, so it is possible to see the structure within the slums. One last difference is that CONEVAL reports results one level of aggregation up, while our result is reported at the smallest level. The comparison also shows that, even between these two levels, areas can be wrongly classified because of the difference in level of detail. These results can be seen in Fig. 4.7.
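The effect of the level of aggregation can be illustrated with a small sketch. It assumes, purely for illustration, that the value of a coarser area is the mean of the values of its blocks; this is not necessarily how CONEVAL aggregates its index. The block identifiers, area names and SSI values are hypothetical.

```python
from collections import defaultdict


def aggregate_mean(block_ssi, block_to_area):
    """Average block-level SSI values over the coarser area each block belongs to."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for block, ssi in block_ssi.items():
        area = block_to_area[block]
        totals[area] += ssi
        counts[area] += 1
    return {area: totals[area] / counts[area] for area in totals}


# Hypothetical block-level SSI values (assumed, not thesis data).
block_ssi = {"b1": 0.1, "b2": 0.1, "b3": 0.1, "b4": 0.9, "b5": 0.2, "b6": 0.2}
# Hypothetical assignment of blocks to the next level of aggregation.
block_to_area = {"b1": "A", "b2": "A", "b3": "A", "b4": "A", "b5": "B", "b6": "B"}

area_ssi = aggregate_mean(block_ssi, block_to_area)

# The most deprived block of area A is far above the area average, so a map
# drawn at the coarser level would not show it.
worst_block_in_a = max(
    ssi for block, ssi in block_ssi.items() if block_to_area[block] == "A"
)
```

In this example block b4 has a high SSI, but the average of area A stays low because the other three blocks are not deprived, which is the kind of detail that is lost when results are reported one level up.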


Figure 4.7: This is Iztapalapa, a part of the city commonly known to be poor; it is part of what people call the Neza-Izta-Chalco slum. At a lower level of aggregation (top) it is possible to distinguish some areas with deprivation that are not visible at the lowest scale at which CONEVAL reports results (bottom).

To further corroborate our results we compared some areas that showed a high SSI value with actual images from Street View in Google Maps. In Fig. 4.8 we show an area in Iztapalapa where we found a slum. It is interesting to see that, as Connolly indicated in her work, many of the slums are near hills, where the land has a pronounced slope.


Figure 4.8: The image on the right shows the GIS photo that corresponds to the red square over the green area in the map on the left. The map comes from the FA SSI results. The photo comes from Google Maps.


Chapter 5

Applications of our results

5.0.1

Application of the SSI: Assessment of politics

We could not get historical data at the smallest level of aggregation, l4, with which we have been working; nevertheless, we got it for l2. This data comprises the census data from 1990 to 2010, every 5 years. We present this result in Figure 5.1. It is interesting to notice that the last decade ruled by the political party in charge since the Mexican Revolution was the 1990s, a decade marked by an economic crisis and abandoned social programmes, which may account for the smaller change between 1995 and 2000. The next decade was ruled by a new political party that followed different programmes to reduce poverty, more specifically granting loans for people to buy new houses or improve the existing ones. Another important thing to notice is that the CONEVAL report [6] shows an increase in the period 2000-2005, which seems to contradict our results. This point will be discussed further in the next chapter.


Figure 5.1: The slum index for Mexico City has been decreasing. The different changes in the periods 1990-2000 and 2000-2010 coincide with the change of ruling political party.

5.1

An idea on how the dataframe obtained can be further analyzed

The dataframe that we built has been analyzed through graphical inspection and correlation analysis; nevertheless, I wanted to see if I could take the analysis further by using data mining algorithms. The dataframe has already been processed, and the idea is to define the training set with the attributes that were not used in the SSI definition, while using the attributes that were used in the definition to define the class labels for a Decision Tree Classifier. From the tree diagram we will be able to see which attributes are used to split the samples into the different classes, so that we can use these attributes to improve our model.

In Fig. 5.2 we show part of one of the tree diagrams, to show that it is easy to find the most important attribute: we look at the first property that splits the samples, in this case X[167], which corresponds to houses with more than 3 rooms. In this way we get the different attributes that could be important to take into consideration in the rules of change of each of the slum characteristics. We also perform this with the SSI, to get attributes that may not be identified for the single characteristics but whose impact can be seen in the combination of the characteristics.
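The idea of reading the most important attribute from the first split of the tree can be sketched as follows. This is a minimal pure-Python illustration, not the pipeline used in this thesis: it assumes that the split is chosen by minimizing the weighted Gini impurity (a common criterion for decision trees, assumed here), and the attribute values, class labels and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_root_split(rows, labels):
    """Return (attribute_index, threshold, weighted_gini) of the best single split.

    Every midpoint between consecutive distinct values of every attribute is
    tried; the split with the lowest weighted Gini impurity is kept.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[j] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical attributes NOT used in the SSI definition (assumed data).
# Columns: [share of employed adults, flag "more than 3 rooms", persons per house]
rows = [
    [0.2, 1, 5], [0.8, 1, 3], [0.5, 0, 4],
    [0.9, 0, 5], [0.1, 1, 3], [0.7, 0, 4],
]
# Hypothetical class labels derived from one slum characteristic.
labels = ["low", "low", "high", "high", "low", "high"]

root_attribute, root_threshold, root_impurity = best_root_split(rows, labels)
```

In this toy data the second column separates the two classes perfectly, so it is selected for the first split; in a tree diagram it would appear at the root, in the same way that X[167] does in the text.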
