NoSQL databases in Archaeology

(1)

0 FACULTY OF ARCHAEOLOGY

NoSQL databases in

Archaeology

a Funerary Case Study

Rens Cassée;S1228226 15-12-2017

(2)

(3)

2 NoSQL Databases in Archaeology – a Funerary Case Study.

Rens W. Cassée S1228226

Thesis MSc Digital Archaeology 1044CS05H-1718ARCH Dr K. Lambers

Master Digital Archaeology and Archaeology of the Near East Leiden University, Faculty of Archaeology

Leiden, 15-12-2017. Final version

(4)

3

Content

1. Introduction ... 5

2. Case study ... 9

2.1. Introduction ... 9

2.2. The Pre-Pottery Neolithic B... 9

2.2.1. Funerary rites in the PPNB ... 12

2.2.2. The Pre-Pottery Neolithic B dataset... 15

2.3. Funerary data ... 18

2.3.1. Excavation result ... 19

2.3.2. Osteoarchaeology ... 21

2.3.3. Literary & museum studies ... 22

3. Database management systems ... 24

3.2. General history ... 24

3.3. DBMS in Archaeology ... 26

3.4. The Relation Database ... 29

3.4.1. Normalisation and consistency ... 31

3.4.2. Advantages and disadvantages ... 34

4. NoSQL ... 38

4.2. Consistency in NoSQL databases ... 39

4.2.1. Master-Slave and Multi-Master replication ... 39

4.2.2. BASE ... 40

4.2.3. CAP-theorem ... 41

4.2.4. Aggregate ... 42

4.3. NoSQL database management systems ... 43

4.3.1. Document store ... 43

4.3.2. Graph Store ... 46

4.3.3. RDF Triple-stores ... 48

4.3.4. Column store ... 48

4.3.5. Key-value store ... 49

4.4. Differences between RDBMS and NoSQL ... 50

5. RDBMS results ... 52

(5)

4

5.2. Creating the queries ... 55

5.3. query results and file durability ... 64

6. NoSQL results ... 66

6.1. Document store ... 66

6.1.1. Building the database ... 66

6.1.2. Creating the queries ... 70

6.1.3. query results and file durability ... 76

6.2. Graph store... 77

6.2.3. query results and file durability ... 87

7. Discussion of the results ... 89

7.1. Comparison ... 89

7.1.3. Discussing the results ... 92

7.1.4. Durability ... 93

7.1.5. NoSQL vs Relational ... 97

7.2. Case study results ... 99

7.2.1. Answering the questions ... 100

8. conclusion ... 111 Abstract ... 115 Samenvatting (Dutch) ... 116 Bibliography ... 118 Literature ... 118 Web pages ... 124

list of figures, tables, and appendices ... 126

Figures ... 126

Tables ... 135 Appendices ... Fout! Bladwijzer niet gedefinieerd.

(6)

5

1. Introduction

This thesis explores the possible use of NoSQL1_{databases in archaeology, or more} accurate the use of NoSQL databases in the field of funerary archaeology. The collection, analysis and storage of data are essential to any research area. However, the field of archaeology has a special relationship with its data. Archaeology tends to destroy its primary source of information: the archaeological record. Therefore, data observation, collection and preservation are more important in archaeology than in scientific fields in which experiments can be repeated. People have developed a wide variety of hardware and software to accommodate the observation, collection and preservation of research data; some of which are especially for archaeology (Corsi et al. 2013, 120-127; Sobotkova et al. 2013, 80-88; Smith et al. 2013, 89-99). Software programs that automate the storage, manipulation and retrieval of structured bodies of information are called Database Management Systems (DBMS hereafter) (Auer & Kroenke 2011, 9-11; Elmasri & Navathe 2007, 4-6; Lock 2003, 89). Nowadays, archaeologists mostly use the relational DBMS (hereafter RDBMS). This database type has risen to the position of dominance in the 1980s and has held that position ever since (Elmasri & Navathe 2007, 56; Ryan 2004).

When new techniques and applications developed, archaeologists were eager to adopt them in their research. These new technologies and applications resulted in a wider variety of data types. Instead of having only numeral and textual data, archaeologists must deal with digital photographs, 3D models, and georeferenced data. This creates an overwhelming amount of processed, non-relational and schema-less data. RDBMS can process these different types of data. However, this is a complicated and cumbersome process for a relational database. Archaeologists are not the first to encounter this problem. Large internet companies like Google, Amazon, and Facebook have faced the same problem as well. These companies process an enormous amount of different data. Therefore, they are searching for new techniques which help them to store data, retrieve and query data more efficiently. This search has led to the creation of NoSQL databases (“Not only” Structured Query Language) (Foote 2017; Jaffe et al. 2014, 144; Méndez Civieta, 7; Vaish 2013, 7-8). Archaeologists are already using some types of NoSQL databases. Most notably, they use triple-stores like graph store and RDF store (Carver 2013; Charno et al. 2013; Gruber et al. 2013a; Gruber et al. 2013b; May 2013). These authors have used an RDF triple store for their research. They argued that

(7)

6 this format is best to use in an online environment (Carver 2013, 498-500; Charno et al. 2013, 217-218; Gruber et al. 2013a, 257-258; Gruber et al. 2013b, 266-267; May 2013, 515-517). Besides from using the RDF triple stores and graph stores, few other attempts were made for exploring the possibilities of NoSQL. Furthermore, a comparison between the relational database model and different NoSQL databases is yet to be made.

This thesis tries to answer the questions: which NoSQL databases are compatible with archaeological data and what are the advantages and disadvantages of these NoSQL databases in comparison with RDBMS? To answer this question for the whole field of archaeology is a herculean and impossible task. Therefore, the focus of this thesis is on funerary archaeology, this to demarcate the question. The field of funerary archaeology can yield different types of data and enormous amounts of data. Thus, it can be seen as a preliminary field for archaeology. The case study uses a section of the Southern Levantine Pre-Pottery Neolithic B funerary database (Cassée 2016). Of this period, it is thought that the funerary practices are standardised. The database is created to challenge this consensus (Cassée 2016). Therefore, this database holds the data of numerous burials belonging to eleven different sites. This study uses a section of this database to do a comparison study between a relational database and two different NoSQL databases. The thesis compares four different aspects of database usage: the creation of the database and the data entry, the query languages, the query results, and the durability of the files in which the database is stored. These aspects are used as the basis for this research because they are essential for a DBMS to function. In this thesis, different NoSQL databases are compared with the relational database Microsoft Access; this RDBMS serves as a baseline for the comparison. Microsoft Access is one of the user-friendliest and most known DBMS around. It is frequently used by archaeologists to store and analyse their data. Furthermore, it is ranked seventh out of more than three hundred different databases types in DB-Engines2_{most used databases} (https://db-engines.com/).

It is essential to clarify the methodology further. This study has a primary and secondary research question. The primary goal of this study is to assess the suitability of NoSQL databases for funerary archaeology. The earlier databases types need to be discussed first, to understand NoSQL databases. Therefore, the history of database development from the perspectives of both the It-world as for the archaeological

2_{DB-Engines is an initiative to collect and present information on DBMS. They create a ranking of}

DBMS bases on their current popularity. The top ten of this list is dominated by relational databases (https://db-engines.com/).

(8)

7 society, the mechanics of the relational database, the mechanics of NoSQL, and the differences between RDBMS and NoSQL databases must be discussed. However, to assess the suitability of NoSQL for funerary archaeology, a case study is needed. As mentioned, the subject of the case study is the Southern Levantine Pre-pottery Neolithic B funerary rites. The consensus on these funerary rites is tested against a series of queries; the same queries are created in the differenced studied databases. This enables a one-by-one comparison between them. Secondly, the results of these queries are discussed, together with the impact these results have or can have on the consensus.

These questions will be answered within three parts: a theoretical framework, chapters two to four; a presentation and discussion the research results, chapters five to seven; and it will end with a summarising conclusion together with a critical review of the results, chapter eight. Within these parts, this thesis will answer the questions when or if a NoSQL database is more suitable for archaeology than RDBMS. To answer these questions, clear criteria need to be defined to assess the outcome. First, a case study is presented in the study; this case study will help to assess the suitability of NoSQL databases for funerary archaeology. Furthermore, in this chapter, the background of the case study is explored, and the area of funerary archaeology and what kind of different data it can yield. Additionally, it will try to validate the use of funerary data as a preliminary field for archaeology, in database related researches. It is essential to understand where funerary data derive from; this will eventually help with interpreting the query results later in this study. Chapter three summarise the histories of both the development of DBMS and the archaeological DBMS. Also, it explains how relational databases work and what their advantages and disadvantages are. The fourth chapter discusses NoSQL. Important questions that need to be answered are: What is NoSQL? What are the differences between NoSQL and relational databases? What types of NoSQL databases are there, and how do they work? Additionally, NoSQL programmes will be discussed in short, so that an assessment can be made to which programs can be used in this study.

Chapter five discusses the mechanics of Microsoft Access. The studies dataset is entered in Microsoft Access step-by-step. This step-by-step workflow is continued for creating the queries. Furthermore, in this chapter the presentation and analysation of results and the durability of the files in which Microsoft Access stores the data are discussed. Therefore, the results presented in chapter five are the baseline for the comparison. Chapter six presents the study results. It explains how the two NoSQL

(9)

8 databases work, by studying the same aspects and by using the same workflow as chapter five. The comparison is part of chapter seven. The aspects are compared to each other in chapter seven. As well as the results of the queries and their possible meaning an implication for the Southern Levantine Pre-Pottery Neolithic B funerary rites are discussed.

The last chapter, chapter eight provides a summary of the discussed matter, and the research question will be answered in this chapter as well. Furthermore, the chapters describe and discuss, in short, the remaining problems and unanswered questions which will need further investigation. Moreover, it shall answer the question: which NoSQL databases are compatible with funerary archaeological data and what are the advantages and disadvantages of these NoSQL databases in comparison with a relational database? Through this, this thesis is trying to continue the previously developed line in which archaeologists try new types of technological advantages. It hopes to do this by investigating if new and various kinds of database yield the same type of results as traditional RDBMS.

(10)

9

2. Case study

2.1. Introduction

In this chapter, the case study is presented. However, to understand the background and the context in which the case study is situated, a short overview of the Pre-Pottery Neolithic cultural period is provided. In the second part of this chapter, the nature of funerary archaeology is discussed. Questions can be created based on the characteristics of the Pre-pottery Neolithic B funerary rites and the different components which can be studied in a funerary study.

2.2. The Pre-Pottery Neolithic B

The Pre-Pottery Neolithic B (PPNB) is a cultural period in the Fertile Crescent. The Fertile Crescent is in modern-day Turkey, Iraq, Syria, Lebanon, Israel and Jordan. The period ranges from 8.500-6.700 cal. BC and it can be subdivided into three different periods, the early PPNB (8500-8100 BC), the middle PPNB (8100-7250 BC) and the late PPNB (7250-6700 BC) (Banning 1998; Croucher 2012, 30; Kuijt & Goring-Morris 2002; Twiss 2007, 25). The PPNB shall be explained following this periodisation. As with all archaeological periods, the lines between these subdivisions are arbitrary. Nonetheless, they are based upon small changes in the material assemblages (Kuijt and Chesson 2005, 161; Twiss 2007, 25 & 28). During all these periods it is assumed that these people lived in a complex egalitarian society (Twiss 2007, 25).

During the early PPNB people became increasingly more sedentary. They started occupying the intermediate zones in which food was rarely scarce, while they were still highly reliant on hunting and gathering. People were less likely to move. Therefore, they started to invest more time in their houses. As result of this, the architecture changed from round structures to more oval or rectangular constructions (figure 2.1) (Khalaily et al. 2007, 9-10; 32-33: Kuijt & Goring-Morris 2002, 384-387). However, the settlements stayed relatively small.

During the middle PPNB, the size of the settlement increased, and the round architectures disappeared. Plastering the walls and the floor became a standard practice during this period (Kuijt & Goring-Morris 2002, 392; Twiss 2007, 29-31). Furthermore, People started to use different means of producing food. Nonetheless, Hunting and gathering still played a significant role in providing food (Kuijt & Goring-Morris 2002, 399-400; Horwitz et al. 1999, 76-78; Twiss 2007, 30). Therefore, the transition from a mobile

(11)

10

Figure 2.1: On the left a reconstruction of Pre-Pottery Neolithic A dwellings, the period which precedes the PPNB. On the right a reconstruction of PPNB dwelling (Simmons 2007, 96; 134).

to a sedentary society started during this period. The shift in diet is seen as evidence of this transition. Because of this dietary shift, the hunted game and the gathered fruits, vegetables, and plants like barley, wheat, chickpeas, and lintels became increasingly more important (Kuijt & Goring-Morris 2002, 399-400; Horwitz et al. 1999, 76-78; Twiss 2007, 30). The way of living and the diet were not the only things that changed during this period. Zoomorphic and anthropomorphic figurines started to appear during the MPPNB (figure 4.2). These figurines range from small female figures to large humanoid statues (Kuijt & Chesson 2005, 163; Kuijt & Goring-Morris 2002, 369-399; Rollefson 1986, 47-50; Rollefson 1983, 30-35).

As in the middle PPNB, the late PPNB saw an increase in settlement size as well. However, these PPNB settlements did not compose solely out of residential architecture. People began to build structures which were for storage, rituals or community affairs (figure 2.3) (Kuijt & Goring-Morris 2002, 405-406; Twiss 2007, 31). It was in the late PPNB that goats and sheep became entirely domesticated, while cattle started to be domesticated as well (Horwitz 1999, 76-78; Twiss 2007, 31). Together with the increased reliance on domesticated animals, the dependence on domesticated crops increased too. Nevertheless, Hunting and gathering were still as relevant as during the EPPNB (Kuijt & Goring-Morris 2002, 399-400; Horwitz et al. 1999, 76-78; Twiss 2007, 30). Overall, the PPNB was a continuously changing area. There is evidence that during this period sedentarisation began. Hamlets became villages and villages became towns. This increase in settlement size was needed to house the growing population, which grew as result of the ability to store food. The storage of food became needed because more food was produced then consuming. And still, these people were all each other equal

(12)

11 they were still an egalitarian society. So how could this seemingly egalitarian society grow without

Figure 2.2: On the left a middle PPNB clay cattle figurines. On the right a headless female figurine with pendulous breasts and extended abdomen, also from the middle PPNB (Rollefson 2000, 168-169).

Figure 2.3: Architectural reconstruction of a two-story LPPNB building. Note the storage rooms surrounding the first-floor central room, and the open second-floor that likely served as a residential area (Kuijt and Goring-Morris 2002, 409).

(13)

12 becoming hierarchical? It is believed that the funerary rites hold the answer to this question.

2.2.1. Funerary rites in the PPNB

During the PPNB and the preceding Pre-Pottery Neolithic A period, the funerary rites seem to be highly standardised. The same characteristics are found in both these periods. Both adults and young adults were buried either in the courtyard or their houses. While infants are often found underneath or in walls (Rollefson 1986, 50). The deceased was always laid in a shallow grave, either in an upright or flexed position (figure 2.4). Sometimes two or more people share a single grave. Al these people were of adolescent or adult ages, Accidentally, was the deceased accompanied by an infant, or more uncommonly by animal bones. The gift of grave goods was, even more, rarer (Croucher 2012, 35-39; Goring-Morris 2000, 121; Khalaily et al. 2007, 27-28; Kuijt & Goring-Morris 2002, 376; Rollefson 1986, 50). There are few to none mentions on the preferred orientation of the deceased or the dimensions of the grave. Moreover, if these two characteristics are mentioned, author mostly mentioned that there was no preferred orientation or shape (source). Furthermore, it seems that the PPNB society did not differentiate between male, female, young or old.

(14)

13

Figure 2.4: A typical PPNB burial. Placed beneath the lime-plaster floor of a house. After some time had passed to allow for decay of the flesh, the burial pit was reopened, and the skull removed (Rollefson 2000, 170).

After the deceased was buried, he or she was covered with the backfill; the descendants left the grave marked by not covering the backfill with plaster; out of which their floors were made. Therefore, it seems that offspring closely monitored the graves. This can be concluded due to the absence of plaster, which was removed before the inhumation. Reapplying the plaster was an exception rather than the standard (Goring-Morris 2000, 120). The graves were closely monitored because they are to be reopened sometime later. Therefore, it was important for the descendants to locate the grave, which was a necessity during these periods. When the soft tissue was decomposed, the offspring would open de grave to remove the skull of the deceased (Croucher 2012, 94-115; Kuijt 2000, 137-160; Kuijt & Goring-Morris 2002, 376). Subsequently, the offspring removed the remaining soft tissue on the skull, and they covered the face with plaster. This plaster layer was given human characteristics by sculpting or by painting a face on it. After a face was recreated, the community came together to bury multiple plastered skulls in a single grave. Kuijt (2000, 137-160) argued that the retrieval of the skull and the communal rebury of the skulls were necessary to keep the community egalitarian. He claimed that the time that was needed to retrieve the skull; clean, plaster and paint them; and reburying them with other skulls (figure 2.5) helped the society to stay egalitarian. This process ensured that the community did not have time to form

(15)

14 different powerhouses to create a kind of authority (figure 2.6) (Kuijt 2000, 158-159). This ritual is part of the so-called ‘skull cult’. McKinnon (1991 in Kuijt 2000, 144-145) argues that:

Figure 2.5: A skull cache from Tell Aswad. These skulls were obtained by reopening ancestral graves. After which, the skulls were plastered, from this plaster faces were created (Stordeur and Kwawam 2007, 27).

“[In] societies in which skull caching occurs, the secondary burial of the skull is often viewed as an obligation by the deceased’s household and a necessary stage of a multi-year mortuary ceremony, requiring substantial participation from another household in the village and neighbouring communities.”

Following this argumentation, people did not have enough ‘time’ to create power. They were too occupied with day to day activities, including the skull cult ritual. The skull cult and the characteristics named above do indeed describe a highly standardised way of burying the deceased. However, recent excavations at the sites of Motza and Tell Qarassa can contest these standardisations (Ibáñez et al. 2010; Khalaily et al. 2007, 26; Santana et al. 2015, 113). They deviate from the standard as described above, and paint a slightly different picture. These sites allow us to study the absence of description mentioned earlier.

(16)

15

Figure 2.6: Generalised stages of Pre-Pottery Neolithic primary and secondary mortuary practices. Note differences in timing, community participation, and possibilities of participation by multiple households (from Kuijt 2000, 144).

2.2.2. The Pre-Pottery

Neolithic B dataset

The used dataset comprises out of the funerary data of the six Southern Levantine PPNB sites, Tell Aswad, Tell Qarassa, Motza, Beisamoun, Beidha and Abou Gosh (Figure 2.7). The data was gathered to study the funerary practices during this period and was collected by a literary study (Cassée 2016). The current consensus is that the funerary practices of this period are regulated and somewhat standardised, as we have discussed in the previous paragraphs (Croucher 2012, 35-39; Goring-Morris 2000, 121; Khalaily et al. 2007, 27-28; Kuijt & Goring-Morris 2002, 376; Rollefson 1986, 50). A significant problem encountered with assembling the dataset was the different ways in

Figure 2.7: Map of all the Southern Levant Pre-Pottery Neolithic. The sites used in this study are underscored (after Twiss 2007, 25).

(17)

16 which data was published. For example, data is spread across multiple publications, so by missing one article, the data set is incomplete. Alternatively, the way of publishing data differs per article, in one article the data is published within the text, while the other article published the data in a table. The relational database that was created to accommodate the data did not always comply with the published data. Therefore, many alterations to the database were made to accommodate the different ways of data publishing. The rigid nature of relational database does not comply with the flexible way of data publications. A short assessment per site is created to present the dataset. These concise assessments are provided below. Every assessment provides information on the location of the site, by whom it was excavated, to which period the site was dated, how many individuals were found and how the author interpreted the funeral remains. Furthermore, every site has its own appendix in which the data is that is used in this study, is published. However, explaining the metadata – data about data – of each table separately is an inefficient and time-consuming process. Therefore, the metadata is presented in Appendix B. Because the metadata and the layout of the appendices are all the same. This appendix explains the metadata and the layout of Appendix C until M. Tell Aswad

Tell Aswad lays Southeast of Damascus (Syria), and has been excavated in two different excavations. In this thesis, we will make use of the second excavation by Stordeur et al. (2006). The first excavation by Henri de Contenson will not be used for it has unreliable C-14 dating, and the first publications regarding the excavations not as reliable as those of Stordeur et al. (2006). The excavation of Stordeur and colleagues was held from 2001 until 2006. During this time, they discovered that the site had three PPNB occupational phases (Stordeur et al. 2006, 40). Within the LPPNB they have found seventeen burials containing a total of forty-eight individuals (Appendix C). All the found individuals have been described by Stordeur and colleagues (2006, 42-51). However, they failed to mention the sex of many of the adult individuals.

Tell Qarassa

Twenty kilometres west of the city of Sweida (Syria) lays the site of Tell Qarassa (Ibáñez et al. 2010). The site dates to the EPPNB (Ibáñez et al. 2010; Santana et al. 2015, 113), a total of twenty-six individuals buried in nineteen different graves, have been found on the site, most of them follow the characteristics as described above (Appendix D) (Ibáñez et al. 2010). However, the site is far better documented than contemporary

(18)

17 sites. Therefore, it can yield more information on the burial practices than other archaeological sites.

Motza

Motza is located 5 kilometres West of Jerusalem (Khalaily et al. 2007, 26). Of the six layers found at the site, two belong to the PPNB. Within the PPNB layer of Motza, the excavators discovered nine graves containing twelve individuals (Appendix E) (Khalaily et al. 2007, 26). Of these twelve individuals, only five were an adult while the rest were children (Khalaily et al. 2007, 26). The authors provide no clear explanation regarding the interpretation of the funerary remains.

Beisamoun

The site Beisamoun is situated in the southern part of the Hula Basin in Israel. A team led by M. Lechievallier started excavating the site in 1969 (Lechievallier 1978, 147-150). This team was responsible for the excavation up and until 1972. It was not until 2011 that another team started excavating the site. This team, led by H. Khalaily and F. Bocquentin did find the same LPPNB patterns on the site as Lechievallier did (Bocquentin et al. 2011, 197; Lechievallier 1978, 147-150; Stern et al. 1993, 17-178). Together, they discovered eight different burials containing twenty-two individuals (Appendix F). Many of these individuals have been described by Ferembach & Solivères (1978, 179-190).

Beidha

Beidha is one of the earliest excavated sites in this study. The site is situated three kilometres north of Petra (Jordan). Excavations started in 1957 under the supervision of Diana Kirkbride (1966, 22-23; 1967, 9). During the years of excavating the site, a total of thirty-six individuals have been exhumed. These people were buried in thirty-two different burials, all these people are found in the late PPNB layers (Kirkbride 1960, 136-140; 1966, 22-23, 1967, 23). Kirkbride (1960, 24) argued that some of these people were buried for the second time because some were missing hand and feet phalanges. The different burials and their specifics are visible in Appendix G.

Abou Gosh

Abou Gosh lies in proximity to both Motza as the city of Jerusalem. Excavations were held from 1967 until 1971, under the lead of Lechievallier (1978, 9-13). During these years numerous archaeological layers have been defined as well as architectures corresponding features of PPNB. However, only a small number of individuals were

(19)

18 discovered during the four years of excavating. The thirty-one persons were buried in twenty-two separate graves. One of these graves holds the remains of nine individuals (Appendix H) (Arensburg et al. 1978, 95-106; Lechievallier 1978, 35-40; Stern et al.1993, 4-3). Furthermore, the author notes that many of the found individuals are found in a shallow grave and a flexed position lying on their side (Lechievallier 1978, 35).

2.3. Funerary data

The definition of funerary archaeology seems a bit ambiguous, and every researcher has their definition of the subject. Thus, the definition of funerary archaeology can vary from researcher to researcher. However, the core idea of the field is the study of components funerary rites and rituals and what they can say about the beliefs and convictions of the living about the deceased (Cradic 2017; Croucher 2012; Ko 2009; McKinley 2006; Murphy 2008; Parker Pearson 1999, 3-6; White & Folkens 2005). However, Parker Pearson (1999, 3-4) has given a simple explanation, his definition of the field is used in this study. He defined it as the study of the rituals and rites surrounding a funeral, in relation to the deceased’s personhood, gender, social status, age and beliefs. A multitude of authors (Cradic 2017; Croucher 2012; Ko 2009; McKinley 2006; Murphy 2008; White & Folkens 2005) used his definition of the field as the basis for their publications. Cradic (2017) used the components that Parker Pearson (1999, 3-4) used, and she complimented these with diseases, typology of the grave, the positioning of the body and textual evidence to the list of components to study funerary archaeology. The other authors did add some of their ideas to the Parker Pearson’s list of elements as well. For example, White & Folkens (2005, 8) mentioned it briefly before explaining the human anatomy. While Murphy (2008, XI) indicates the gap in Parker Pearson’s study, although she uses the premises as a basis for her study. Therefore, the variation of data funerary research can yield can be enormous. The different types of information are already briefly discussed, although they still need further extrication. In the following paragraphs, the complexity of funerary data is examined, as does the used dataset and its background. To understand the dataset the cultural period should be outlined briefly. However, the theories on the funerary believes and rites of that period should be explained more thoroughly.

What kind of data can we expect from a funerary research? If we follow the explanation of both Parker Pearson (1999, 3-6) and Cradic 2017 (219-222); then we can group funerary data into four different categories. These are:

(20)

19  Osteoarchaeology (description and analyse on the human bones);

 literary research (historical evidence and ethnographic evidence) (Cradic 2017);

 moreover, museum studies.

These categories all yield different kinds of data which can be studied independently as well as in relation to each other. The acquired data is always a result of the type of study and the studied periods because specific attributes are only found in specific periods or cultures. For example, it is possible to create a table of which the subject is ‘coffin,’ the purpose of this table is to describe the coffin in which the individuals are found. This can be a logical and valid research object if a Christion cemetery is being excavated. However, it is illogical if one would use the same table in a study on the PPNB funerary rites because people were not buried in coffins. Therefore, it is possible that not all the possible characteristics and attributes are named within this chapter. The acquired data is always a result of the type of research and the studied periods. For this reason, it is not possible to describe all the possible characteristics and attributes.

2.3.1. Excavation result

Every excavation has its own way of documenting funerary remains. Moreover, the type of documentation is pre-determined by the expectations that are raised in preliminary studies of the region. Therefore, the obtained data will differ per excavation. The acquired data shall mostly comprise out of textual or numerical information. However, it is possible to collect photographs, 3d-imagery, and georeference locations as well as a description of the context, the preservation or different construction techniques. Key things to register during the excavation of funerary remains, following Parker Pearson are:

1. The shape and depth of the grave: which may relate to the social status or gender of the buried person or a degree of formality in the burial rite (Parker Pearson 1999, 5).

2. The orientation of the grave: this can be of importance, as there are religions that bury their deceased in a specific direction. Therefore, orientation of the dead is unlikely to be random (Parker Pearson 1999, 6)

3. The body arrangement there are many ways a body can be arranged; Is the body still articulated or are the bones neatly re-arranged?

(21)

20 4. The presence of grave goods: this includes all items that are present in the grave, ranging from vessels with foods and drinks to the adornment of the body with weaponry, clothes or jewels (Parker Pearson 1999, 7-11).

5. Grave marker: the deceased is buried in a mausoleum or tombstone, which marks the burial and both, can denote the importance of the deceased.

These five different characteristics allow archaeologists to understand the beliefs of the next of kin, for they decide the representation of the deceased in the (possible) afterlife (Parker Pearson 1999, 3-4). The five characteristics yield different types of information; table 4.1 shows the kind of information an excavation can yield based on the five predefined characteristics. This table does not cover the full extent of all the possible data that an excavation can yield. However, the table does show a high number of possible attributes; these different attributes are possible to find on an excavation. In an RDBMS, like Microsoft Access, these attributes are stored in several related tables.

Table 4.1: table containing the five predefined characteristics of a funeral excavation with their corresponding attributes. For every attribute, the possible type of data and a few examples are given.

Characteristic attributes Type of data examples

The burial Shape Textual Oval, round, rectangular, irregular

Measurements Numbers Length, width and depth

Location Textual/numerical Coordinates, buried underneath houses

Type of coffin Textual Materials, no coffin, cloth, tree trunk coffin

Type of burial Textual Single, burial or mass grave

MNI Numerical Minimum number of Individual

Period Textual 500 AD, Late-roman

Orientation Cardinal direction Textual Northeast, South, South-west Body arrangement arrangement Textual

Articulation Yes/No Primary or

secondary

Textual Primary burial or group burial

Disturbed Yes/No No disturbance, disturbed by

looters

Alignment textual Orientation of the cranium: North-east, South, South-west

Grave goods Presence Yes/No

(22)

21

Material Textual Flint, wood, metal

Organic residue Yes/No

Grave Marker Type Textual Tomb, gravestone

Material Textual Marble, granite, wood

Style Textual Classical, Egyptian, baroque

Inscription textual

2.3.2. Osteoarchaeology

An Osteoarchaeological report contains a wide variety of data. This acquired data depends on the quality of the physical remains of an individual. Because human remains are a reoccurring phenomenon, a general guide was created (White & Folkens, 2005, 356) on how to write an osteological report. Most osteological reports cover several different points. These are the introduction; bones present; context and condition of the remains; pathology; anomalies; trauma, age, sex, race, stature, and weight; time and cause of death; individuation; metrics and nonmetrics; and a summary (White & Folkens 2005, 356-357). Of these ten points, seven points yield a specific form of data regarding the individual. The remaining three have data that is already present somewhere in the database, for example, time and cause of death can be explained by trauma, age, sex, race, stature and weight. Furthermore, the introduction and summary present data that has been published in the report already. Therefore, only the following seven points are considered as attributes that hold primary data:

1. Bones present: a list of the analysed skeletal remains. Sometimes with MNI determinations and their explanations included (White & Folkens 2005, 356) 2. Context and condition of the remains: Make notion of the environment in

which the bones were found. However, the context is already discussed in the excavation part. The condition of the bones can provide useful information, especially the presence of soft tissue (White & Folkens 2005, 356). Soft tissue can yield DNA, which can include information regarding sex and race.

3. Pathology: Any evidence of osteological manifestation of diseases and (healed) fractures on the bones (White & Folkens 2005, 356-357)

4. Anomalies: Anything unusual about the skeletal remains should be noted here, like supernumerary digits or other nonmetric traits (White & Folkens 2005, 357).

(23)

22 5. Trauma: report any signs of osteological trauma, ranging from head fractures to excavation-related fracture. Try to determine how recent the fractures are (White & Folkens 2005, 357).

6. Age, sex, race, stature, and weight: For these, be as accurate as possible but do not give estimates whose precision is not warranted (White & Folkens 2005, 357)

7. Metrics and nonmetrics: report standard dental, cranial, and postcranial measurements, as well as observations on nonmetric traits.

These above characteristics provide a variety of different types of data. Some of the data is highly descriptive, like the trauma, anomalies and the pathology while other data can be represented by a binary option, the bones present for example.

2.3.3. Literary & museum studies

Both literary & museum studies can provide the same kind of information as described above. E.g. a researcher can describe the shape and dimensions of a burial together with the position in which the individual was buried. However, the data in publications and at museums is already interpreted. Furthermore, every researcher and museums present their data differently. This different presentation can lead to misinterpretations of the data sets. For example, one author mentions that ninety-nine individuals were buried, without mentioning how they were buried or where, while the other researchers can present their data in tables. Therefore, this data should never be copied one to one. Additionally, the data that both types of studies are often incomplete data sets. The dataset only represents a small part of the complete research. If data derives from either literary or museum studies, then the origin of the data should be mention in the database as well.

In this chapter, the complexity of funerary data has been discussed. It has been shown that the data can derive from multiple sources and that these sources have multiple ways of storing the data. However, it is impossible to note all the possible types, characteristics and attributes of the data. Furthermore, the dataset has been presented. The dataset comprises out of a high number of funerary remains. Of these remains, it is thought that they represent a high standardisation in the funeral rites (Croucher 2012, 35-39; Goring-Morris 2000, 121; Khalaily et al. 2007, 27-28; Kuijt & Goring-Morris 2002, 376; Rollefson 1986, 50). These are:

(24)

23  Infants were often found underneath or in walls

 The grave was shallow; there seems to be no clear shape of the grave  The deceased was put in an upright or flexed position.

 The grave was unmarked

 Skull caching seems to be the norm rather than the exception.

With the presentation of the dataset and the nature of funerary archaeology, it is possible to analyse the dataset and to compare these results with the consensus on the Southern Levantine PPNB funerary rites. However, this is only possible if the data is correctly stored in a database. How one should store data in a relational database correctly is discussed in the next chapter. Furthermore, in the same chapter, the history of database development is discussed. This to create an understanding why databases became necessary, how the relational database became the dominant database type on the market, and why NoSQL databases became necessary. In chapter five the queries that were created to contest the consensus are presented. The results of these queries and their outcome will be discussed in chapter seven

(25)

24

3. Database management systems

3.1. Introduction

This chapter shortly discusses the development made in DBMS evolution for both the perspectives of the IT-world and archaeology. This is needed to create an understanding of how NoSQL is developed, and why it was necessary to develop it. However, this chapter does not solely discuss the developments regarding DBMS; it will explain the mechanics of RDBMS as well. This chapter will discuss how consistency is guaranteed in RDBMS, how RDBMS works, and what die advantages and disadvantages are of the relational database.

3.2. General history

Before the 1950’s, before the first computers were developed, information was stored on so-called punch cards. This changed when the first computers were created. In 1950, the first computers were giant calculators; as opposed to the small and compact ones of today. Data like names and phone numbers were redundant and unnecessary. Therefore, this kind of data was the leftovers of processing information, and therefore, they were not stored (Foote 2017). A decade later, in the 1960’s computers became commercially available. This new availability opened a new way to in which the extra data was perceived. To accommodate this new view, Charles W. Bachman designed in 1960 the Intergraded database system or IDS. The IDS became known as the first DBMS. The IT-company IBM did not want to be left out and created their DBMS, the Information management system or IMS (Auer & Kroenke 2011, 20; Foote 2017; Gehrke & Ramakrishnan 2003, 6). Both the IDS and the IMS were flat file databases, like Microsoft Excel. They reflected the simple hierarchical data structure of the card index. In a flat file database, each record consists of series of logically related fields, and each file contains a series of logically related records. A flat file database has only one table. This makes the flat file system more primitive than current systems; however, for some task, it will suffice. Therefore, the IDS and the IMS are viewed as the precursor of modern-day DBMS. (Elmasri & Navathe 2007, 6; Gehrke & Ramakrishnan 2003, 6; Lock 2003, 89).

By the mid-1960s computers became faster, more flexible, and more popular. Because of the increased popularity, many kinds of general DBMS became available for consumers (Foote 2017; Gehrke & Ramakrishnan 2003, 6). As a result, customers demanded a form of standard DBMS language. This standard became known as the Conference/Committee on Data Systems Languages approach, or the CODASYL

(26)

25 approach. This method was complicated to learn, and it required training before one could use it (Auer & Kroenke 2011, 20; Foote 2017; Gehrke & Ramakrishnan 2003, 6). Edgar Codd (1970, 377-387), an employee at IBM, was unhappy with the lack of a search engine in the CODASYL approach. He outlined his ideas (Codd 1970, 370-387) on how to construct a relational DBMS. At first, IBM did not use the ideas of Codd until the year 1974. So, others used the ideas of Codd, and they created the first Relational DBMS in 1973 (Foote 2017). IBM not wanting to be left behind on the IT-market, created the SQL (Structured Query Language) (Auer & Kroenke 2011, 20-21; Foote 2017; Gehrke & Ramakrishnan 2003, 6). The combination of RDBMS with SQL became the dominant software on the market; which it still is up and to today.

Even though the relational database model dominated the market since the mid-1970s, it was not that no other database types were developed. For example, during the 1990s people wanted to contest the dominance of the RDBMS. These people believed that the in-memory structures, like field-forms, could and should be stored directly on the disk, without taken the forms apart before putting them in different, related tables. E.g. a field form can consist out of different attributes like location, slope, coordinated, present structures and anomalies, these different attributes can be separated and stored access multiple different tables. This led to the creation of the Object-Oriented Databases (OODB). The OODB were considered a competitor to RDBMS (Elmasri & Navathe 2007, 24). They responded to the need to store and share complex-structured objects, like field-forms (Elmasri & Navathe 2007, 24). Nonetheless, the complexity of the database model and the lack of an early standard language contributed to their limited use (Elmasri & Navathe 2007, 24; Sadalage & Fowler 2012, 77%).

RDBMS are an efficient way to store and process structured data. However, computers are continuously evolving and becoming faster and more versatile. Additionally, data like art, photographs and music became more widespread during the 2000s; this created a problem for the RDBMS. Because this kind of data is both non-relational and schema-less (Auer & Kroenke 2011, 20-21; Foote 2017; Gehrke & Ramakrishnan 2003, 6), and RDBMS were not capable handling this kind of information. Therefore, the need for a database type that did not have tables or schemas became a demand. Large internet companies like Google and Amazon started to develop their DBMS (Foote 2017; Vaish 2013, 8-9). These companies created a database type which did not follow the SQL rules so that the issues of scalability ─the larger the database, the slower it gets─, parallelisation ─two or more rela onal databases using the same databases─, and cost

(27)

26 became less of a problem (Vaish 2013, 8-9). The creation of the web-scale databases that companies like Google and Amazon use, eventually led to the NoSQL movement at the start of the 21st_century.

3.3. DBMS in Archaeology

These developments have also had their influence in archaeology. The first known use of computers in archaeology dates to the 1958 or 1959, in France (Cowgill 1967, 17). However, most archaeologists were still reluctant to use these new “computerised” methods. They viewed that the projects which were aided by computers were f uneven quality and sophistication (Cowgill 1967, 17). Therefore, it had taken several years before computers became integrated into to the archaeological research. The programs that were used before the development of the RDBMS, as designed by Codd (1970, 370-387) were mostly created by archaeologists themselves (Cowgill 1967, 17-21; Labrador 2012, 379). Just as with the general development of data storage, the first automated storage of archaeological values was done on punch cards. It was not until the hard drive was widely available in the 1960s that users could access data regardless of their location in the physical storage system (Labrador 2012, 379-380). In the 1970s, after Codd (1970, 370-387) had defined the relational model, archaeologist started to develop their systems for storing and retrieving data. However, these homemade relational databases were less favoured than the commercial and self-contained databases (Labrador 2012, 380).

In 1987, fourteen years after the first RDBMS was developed archaeologists were already experiencing the disadvantages of the first database programs (Stead 1988, 279). Problems concerning inconsistency and the large number of data fields which were needed to satisfy the established requirements (Stead 1988, 279) led to the idea of separating the user from the data structures (Stead 1988, 279). This idea, together with the experience in the development of front-end processors, was what led to the development of the model of an Integrated Archaeological Database (IADB) (Rain 1994, 207; Stead 1988, 279). The purpose of the IADB was to address the data throughout the lifespan of an archaeological excavation project (Rain 1994, 207-208; Stead 1988, 279-284). To accommodate this, the creators of the IADB have created an overall structure that is fundamentally hierarchical. This structure allowed the storage and the accessibility to all original excavation records, including finds, contexts, single context plans, photos and stratigraphic relationships (Clarke et al. 2003, 401). The IADB is continuously altered and improved since 2004 onwards is entirely web-based

(28)

27 (http://www.iadb.co.uk/). Moreover, this is still in development (Clarke et al. 2003, 401). As of 2008, one of the developments IADB is undertaking is allowing the users to explore and make sense of the raw and unpublished data (Clarke 2008; 10-12; Clarke et al. 2003, 402). Since 2008, the IADB can be accessed from anywhere in the world and can contain most of the archaeological information gathered on site (Clarke 2008, 10). The IADB was not the only DBMS specific to archaeology. Andresen & Madsen (1996) did not wonder that computer-based recording systems or DBMS for archaeological excavations have been and are continuously being created all over the world (Andresen & Madsen 1996, 3). However, they did speculate that most of these DBMS are either simple flat file systems or hierarchically organised systems. These hierarchically systems were tree-like databases (Andresen & Madsen 1996, 3). This is what they thought of the earlier version of the IADB. They argued that these flat files and hierarchical DBMS are too simple to accommodate the inherent complexity, diversity, and quantity of archaeological data (Andresen & Madsen 1996, 3). Therefore, they created a general system for the recording, analysis, and presentation of information from archaeological excavations called the Integrated database for excavation analysis or IDEA. They intended to serve as an archive of record from all kinds of archaeological excavations (Andresen & Madsen 1996, 3). The core of their system became Microsoft Access, a relational database. They chose for Microsoft Access because it was easy to learn, and more and more organisations were starting to use it (Andresen & Madsen 1996, 3). Unfortunately, their grand ended before they could finish the promising project.

However, theirs was by far not the only archaeological adaptation of standard RDBMS. There have been, and probably still are, countless attempts to market databases initially developed for a specific project to the wider community, mostly with little success. This little success can be the result of a few factors, like the end of a grant (Andresen & Madsen 1996) or the lack of a wide public which results in the limited use of a database. One of these is called the Archaeological recording kit or ARK. Ark is a web-based toolkit for the collection, storage and dissemination of archaeological data based on MySQL (RDBMS) and it is entirely open source (http://ark.lparchaeology.com; Eve & Hunt 2008). ARK does not dictate what or how the archaeological record should look like, instead it can adapt to any digital or paper-based recording system (http://ark.lparchaeology.com; Eve & Hunt 2008). Another that is worth mentioning is the Sustainable Archaeology database or SA. SA is a web-based database and research interface (http://sustainablearchaeology.org/). The people behind SA have created their database

(29)

28 to serve as the repository for digital site reports, maps, plans, geospatial data, and other archaeological generated data (http://sustainablearchaeology.org/).

Most of the mentioned database management systems have a relational nature. Moreover, different forms of RDBMS dominated the field of archaeology ever since, creating the need for a standard archaeological design concept. Therefore, Anderson et al. (2011) and Ossa & Simon (2012) both have written papers on how to build a reliable structured archaeological database for excavations and surveys. As result of the dominance of the RDBMS in archaeology, few projects were done assessing new DBMS concept. Therefore, it was not until the 2010’s that new projects started called the FAIMS mobile platform (Sobotkova et al. 2013). The Federated Archaeological Information Management Systems or FAIMS is a major eResearch project established to build new tools to collect and share archaeological data (Sobotkova et al. 2013, 80). It allows the users to collect raw data in the field with their tablet or mobile, in an offline environment (Sobotkova et al. 2013, 80-83). This offline environment uses a local server (www.fedarch.org). Additionally, it recognises the location of the user and can georeference this place to the data that is collected in the field. They noted the limitations of RDBMS as well. They argued that RDBMS are not optimal or sustainable within this environment (Sobotkova et al. 2013, 85), this is due to the broad range of research strategies that archaeologists use, each of them requires a customised data structure (Sobotkova et al. 2013, 85). Therefore, they used a relational database scheme that mimics NoSQL.

The general trend that is visible in both the archaeology DBMS usage and the general DBMS usage is that the used systems are becoming increasingly complicated. Beginning with the flat file systems towards the RDBMS and from RDBMS to NoSQL. While, within archaeology, it starts with the IAD, which gradually became a web-based platform, towards FAIMS system. The archaeological DBMS changed, however, these systems stayed relational. Does this relational database model still suffice today, even though archaeologists generate an increased amount of data? From an IT perspective, when the amount of data increased and when the amount of different data types increased, and the time’s data needs to be shared, updated, and analysed increased as well, a new software type required to be developed, this became NoSQL. In archaeology data becomes more complex, it derives from different sources to which the software needs to be able to adapt, and archaeologists are starting to share their research more often. Therefore, the same trend is visible in archaeology as in the DBMS community. Despite

(30)

29

Figure 3.1: A pie chart based on the number of users a database has in percentage. Note the 79% of market share relational databases have (db-engines.com).

this, most archaeologists still use the relational database as their first choice. However, does the relational database model still suffice in today’s field of archaeology, or should there be looking for alternatives outside archaeology? Before, these alternatives can be studied, and before a comparison can be made, the mechanics of RDBMS need to be explained.

3.4. The Relation Database

First, to show how often RDBMS are used, an overview of the top ten most used databases s shown in table 3.1. Of the ten databases present in this table, six are relational databases (db-engines.com). Furthermore, figure 3.1 Is showing the percentage of usage per database type

(db-engines.com). Now it is clear that RDBMS are still the dominant database on the marker, the mechanics of the relational database itself should be explained. Relational databases store data in tables. The tables are connected to each other with so-called joins; these joins express the relations between the tables (Auer & Kroenke 2011, 58-63; Elmasri & Navathe 2007, 183-188; Gehrke & Ramakrishnan 2003, 455-457). They allow us to create compelling queries across different tables. Each table tells us something about the subject of the table (Auer & Kroenke 2011, 58-63; Elmasri & Navathe 2007, 183-188; Gehrke & Ramakrishnan 2003, 455-457). For example, we have a table called “sites”, this table should only have attributes which says something regarding the site, like geographic location, region, and size. These attributes are often predefined. Furthermore, they have limitations. Sometimes it is only possible to describe an attribute with a number, or a small amount of text. E.g. the size of a site in square meters or the name of the region in which a site is situated. If no data is entered in the table, then the table contains ‘NULL’ value. So, it is necessary for RDBMS to follow a fixed table schema (Auer & Kroenke 2011, 404; Elmasri & Navathe 2007, 183-188; Gehrke & Ramakrishnan 2003, 608).

The joins allow the user to add more information to the site table. It allows us the mention and describes the different structure, layer, burials or artefacts that are found during an excavation. The information regarding structures, layers, burials and artefacts

(31)

30 are stored in various other tables. The tables with the information on the structures, layers, burials and artefacts, all have one-to-many relation with the site table. Meaning that every sites entry in the site table can have multiple data entries in the structure table (Auer &

Table 3.1: A list of common DBMS software. The underscored DBMS are relational databases. These database types dominate the market. As for the top seven consist out of six relational databases. This list is based on popularity (db-engines.com).

Ranking Database name Database type

1. Oracle Relational DBMS

2. MySQL Relational DBMS

3. Microsoft SQL Server Relational DBMS

4. PostgreSQL Relational DBMS

5. MongoDB Document store

6. DB2 Relational DBMS

7. Microsoft Access Relational DBMS

8. Redis Key-value store

9. Cassandra Wide column store

10. Elasticsearch Search engine

Kroenke 2011, 147; Gehrke & Ramakrishnan 2003, 33). As for one site can have multiple structures. It is even possible to describe the archaeological finds further, as will briefly be mentioned in Chapter 4.2. The same is possible for the other tables. Another relation type is the many-to-many relation (Auer & Kroenke 2011, 191; Gehrke & Ramakrishnan 2003, 33). This relation expresses that two single data entries in different tables can have multiple data entries in each other table. For example, in a single burial, multiple people can be buried. In the same burial, multiple artefacts can be found. Of some of these artefacts it is clear to which individual they belong; however, some artefacts can be associated with multiple individuals. So multiple artefacts can be associated with one or more individuals and vice versa. This creates a logical web of joins and relations between the different tables.

RDBMS allow us to build queries in which we can combine the different tables so that querying is possible. Queries are the search engine of a DBMS (Auer & Kroenke 2011, 61; 299; Elmasri & Navathe 2007, 549; Gehrke & Ramakrishnan 2003, 16). In all RDBMS these queries are written in SQL (structured query language). This is the standard language for designing a relational database. Many different DMBS software have tools which help the user to create SQL-statement. These SQL-statements are the basis of

(32)

31 querying data (Auer & Kroenke 2011, 6; 31; Elmasri & Navathe 2007, 241; Gehrke & Ramakrishnan 2003, 130). Thus, they allow the user to find, group and calculate the stored data. However, when the amount of data in a relational database is high, and when there are many relations between the different tables and tables, RDBMS tends to get slower. There are two solutions to battle the RDBMS from getting slower. The first is building a bigger computer with better specifications. Nevertheless, there are physical and technological limitations to this. The other solution is to use different servers to store that data. This solution allows the database to use the combined power of the computers to calculate the query results. However, RDBMS were not designed to be spread across multiple servers (Sadalage and Fowler 2012, 11%-12%).

3.4.1. Normalisation and consistency

RDBMS have two important properties: normalisation and consistency (Auer & Kroenke 2011, 79; Elmasri & Navathe 2007, 355-357; Gehrke & Ramakrishnan 2003, 614-616). These two properties ensure the absence of redundant data, data integrity and that all data and data alterations follow certain rules. These properties are used in different stages of using a relational database. Normalisation is used when a database is created, while ACID is used in so-called transactions (Auer & Kroenke 2011, 17; Elmasri & Navathe 2007, 165; Gehrke & Ramakrishnan 2003, 521).

Normalisation

The value of the Relational databases lies in the vast amount of data it can store, and that this data is only stored once, this is achieved because of the so-called normalisation rules (Auer & Kroenke 2011, 17; 95; Elmasri & Navathe 2007, 355-357; Gehrke & Ramakrishnan 2003, 622-623; http://www.teach-ict.com; www.techwalla.com). Normalisation is the process of organising the columns and rows of a relational database. This process is designed to reduce data redundancy and improve the data integrity of a database. Furthermore, normalisation is the process of simplifying the database as well. Relational schemas that do not meet certain conditions are decomposed into smaller schemas that fulfil the conditions (Auer & Kroenke 2011, 19, 95; Elmasri & Navathe 2007, 355-357; Gehrke & Ramakrishnan 2003, 622-623). In the same paper where Codd (1970) proposed the relational database model, he proposed the rules of normalisation as well. He argued for three normal forms (NF). These three are the first NF (1NF), the second NF (2NF), and the third NF (3NF) (Codd 1970, 384; 1972, 36-38) It was in 1974 that a fourth NF was proposed, the Boyce-Codd NF (BCNF) (Codd 1974). The Normal forms do have increasingly restrictive requirements. Meaning

(33)

32 that every database in BCNF is in 3NF as well, every database in 3NF is in 2NF as well, and every database 2NF is in 1NF (Auer & Kroenke 2011, 95; Elmasri & Navathe 2007, 355-357; Gehrke & Ramakrishnan 2003, 614-616). Except for these four different normal forms, there are five more normal forms. These are in order from low to high: fourth NF (4NF), Essential Tuple NF (ETNF), fifth (NF), sixth NF (6NF), and Domain/Key NF (DKNF). These five normal forms and the BCNF are not discussed in this thesis because a database is considered as normalised when the database is in 3NF (Auer & Kroenke 2011, 95-96; Elmasri & Navathe 2007, 355-357; Gehrke & Ramakrishnan 2003, 622-623). Therefore, only the 1NF, 2NF and 3NF are discussed.

Every database starts in the Unnormalized form or UNF. In UNF a database is not yet normalised, and can be structured as seen is in figure 3.2. In this figure, a table is represented which hold a row of different attributes. Furthermore, this figure shows the results of normalisation when the rules of normalisation are applied (Auer & Kroenke 2011, 95; Elmasri & Navathe 2007, 355-357; Gehrke & Ramakrishnan 2003, 614-616). To put this table in 1NF the primary keys need to be defined, and repetitive attributes need to be removed (Auer & Kroenke 2011, 95; Elmasri & Navathe 2007, 395; Gehrke & Ramakrishnan 2003, 622). If we apply these rules, this results in five different primary keys. Now the 1NF is applied to the database; the 2NF can be applied. A database is in 2NF when it complies with the following rule: split data that is depended on a part the primary key (Auer & Kroenke 2011, 95; Elmasri & Navathe 2007, 395; Gehrke & Ramakrishnan 2003, 615). This can only be done if the database is in 1NF already. When

(34)

33

Figure 3.2: This figure shows the different stages of normalisation. In the first box from the top, the unnormalized form is presented in a row with different attributes. In the second box from the top, the 1NF is applied to the attributes, resulting in three different tables (in cursive) and five primary keys (bold and underscored). In the third box from the top, the 2NF is applied. This results in a total of four tables, and six primary keys. Please note, that all the structure attributes, except for the structure_id have been placed in a separate table. In the last box, the 3NF is applied. In here, the country attribute has been split from the sites table. Now the database normalised up and to the 3NF, and can be used as a proper database.

Applied, this results in the creation of four tables. The 3NF can only be applied when the database is in 2NF. The following rule must be applied to ensure that the database is in 3NF: split the attributes that are depended on non-key values (Auer & Kroenke 2011, 95; Elmasri & Navathe 2007, 395; Gehrke & Ramakrishnan 2003, 614-617).

ACID

In a relational database, it is possible to manipulate any row from any tables in a single transaction. Such transactions are called ACID-transactions. An ACID-transaction is an operation that is Atomic Consistent, Isolated and Durable (Auer & Kroenke 2011, 344-355; Elmasri & Navathe 2007, 619-620; Gehrke & Ramakrishnan 2003, 520; Sadalage & Fowler 2012, 17%). Simplifying the ACID properties is possible. However, before ACID can be explained, the meaning of a transaction needs to be defined and explained. Therefore, a thought experiment is created. In this experiment, a hypothetical RDBMS transaction is followed. In this thought experiment, a database is created filled with measurements of human femora. Only, in one of the rows, the measurements are registered as millimetres, and the user wants them to be in centimetres. So, the user creates a SQL-statement, so that the changes that the user wants to make are possible.

(35)

34 The user runs the SQL-statement and saves the changes. This process is a form of transaction. From the outside, two points in time exist: before and after the transaction; every measurement in millimetres (before) and every measurement in centimetres (after). Because of ACID, these transactions occur without any errors and interruptions. ACID is an acronym of atomic, consistency, isolation and durable. Simplified, these words can be described as followed: Atomicity can be denoted as everything in a transaction succeeds, or no alterations are made at all (Auer & Kroenke 2011, 344-355; Elmasri & Navathe 2007, 619-620; Gehrke & Ramakrishnan 2003, 520; Vaish 2013, 10). Consistency as a transaction cannot leave the database in an inconsistent state (Auer & Kroenke 2011, 344-355; Elmasri & Navathe 2007, 619-620; Gehrke & Ramakrishnan 2003, 520; Vaish 2013, 10). Isolation as one transaction cannot interfere with another (Auer & Kroenke 2011, 344-355; Elmasri & Navathe 2007, 619-620; Gehrke & Ramakrishnan 2003, 520; Vaish 2013, 10). And, durability as the completed transaction persists, even after application restarts (Auer & Kroenke 2011, 344-355; Elmasri & Navathe 2007, 619-620; Gehrke & Ramakrishnan 2003, 520; Vaish 2013, 10). If one of these operations fails, then the whole database returns to the state it was before the transaction started. Howsoever indispensable these qualities may seem, they can be quite problematic. For example, if two colleagues work on the same online (open-access) RDBMS. One of the researchers decides to change all the femora measurements, while the other wants to query the data for his research. Both access the database at the same time and try to alter the database simultaneously. This would result in an incredibly slow response time. Additionally, it will lock a part of the database. Resulting in a temporary break in one of the researcher's research.

3.4.2. Advantages and disadvantages

Where normalisation ensures that no redundant data is present in the database, most databases have specific integrity constraints that must hold for the data (Elmasri & Navathe 2007, 20-21, www.techwalla.com). These integrity constraints ensure that the user of the database must follow specific rules regarding data input. For example, let assume the database containing the femora measurements also contains a table which holds information of the excavated individuals, and that the individual table has a one-to-many relation with the femora table. The user Is not allowed to add new femora to the femora table which do not have an entry in the individual table to ensure data integrity. Thus, these rules ensure data integrity, all the data that is filled in into a