Genres in websites

UNIVERSITY OF GRONINGEN – FACULTY OF ECONOMICS AND BUSINESS

Genres in websites

A qualitative research into the genres that divide different kinds of websites

Master thesis Business and ICT

Jan-Harm Boer

29th August 2011

Genres in websites

A qualitative research into the genres that divide different kinds of websites

Master thesis

Name: Jan-Harm Boer
Student number: S1808532
Email: Janharmboer@chello.nl
Date of completion: 29th August 2011
Institute: University of Groningen
Faculty: Economics and Business
Degree program: Master Business Administration
Specialization: Business and ICT
Version: 2.1
Status: Final

Faculty supervision
Supervisor: Dr. Nicolae B. Szirbik
Co-supervisor: Dr. Laura Maruster

External supervision
Supervisor iWink: Simon Wisselink

University of Groningen
Oude Boteringestraat 44, 9700 AB Groningen
The Netherlands

iWink
Laan Corpus den Hoorn 100-1, 9728 JR Groningen

Acknowledgments

This master thesis concludes my study Master of Business and ICT in Business Administration Sciences at the University of Groningen. The master thesis was started around February 2011 and was finished in late August. The master thesis was conducted at the professional website developer iWink in Groningen. I would like to thank Egon Berghout and Martijje Lubbers for their help in acquiring the internship that made it possible for me to graduate.

I would also like to thank everybody involved in the writing of this thesis. I would especially like to thank Simon Wisselink from iWink for providing the opportunity of an internship at iWink and for his support during the process of writing this master thesis. I would also like to thank both of my supervisors of the University of Groningen: Nick Szirbik and Laura Maruster for their support, supervision and valuable insights.

Finally I would like to thank my family and friends for their moral support during the seven months spent conducting this study.

Summary

This thesis describes the investigation into a classification of similar websites into homogeneous functional groups. These functional groups of websites are also known in academic research as web genres: a set of websites with the same style/technique, form/format, content and/or function/purpose (Shepherd & Watters, 1999). The investigation described in this thesis, conducted on behalf of iWink, focuses on how these web genres can be used to streamline and enhance the website development of iWink and other professional website developers. The research question used for investigating this topic is: “What are common functional types of websites and how can iWink utilize a classification of websites in their design process?” This research question is directed at iWink, but it is also applicable to other website developers.

The classification is conducted using a classification framework consisting of three separate phases: 1) the construction of a website collection, 2) an investigation to discover the functionalities these websites offer and 3) the classification of the websites themselves. The website collection is constructed from four separate sources: the 15 most frequently used words in the Dutch and English languages and the 15 most frequently used search terms in the Netherlands and the United Kingdom. The words and search terms from these sources are submitted as search queries to the Google search engine. The websites present in the results of these search queries are extracted and compiled into the website collection, which is used as the source for the classification. In the second phase of the classification framework the 675 websites present in the website collection are analyzed to discover the functionalities these websites offer. In phase three the K-means clustering method is employed to segregate the websites in the website collection into web genres. The input of the K-means clustering method is the 53 functionalities discovered in phase two. Using the classification framework, 19 web genres were found. These web genres are largely consistent with the web genres found in comparable web genre research. The exceptions are the social network website genre and the interactive website genre. This inconsistency is explained by the fact that most of the comparable research was conducted before 2008, while social network and interactive websites only took flight after that period. As a result these genres were not discovered in the comparable research. Another notable result of this study is that many of the discovered web genres are extensions of and/or additions to each other. This is most notable for the web genres consisting of corporate websites, web shops and news websites, and it is a direct result of the popularity of these web genres: together with their extensions and/or additions they account for 8 of the 19 web genres and include over 55% of the websites present in the dataset.
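The clustering step of phase three can be illustrated with a minimal sketch. It assumes that every website is represented as a binary vector that records which functionalities it offers, which is an assumption about the data representation and not a description of the exact procedure used in this study. The five functionality names and the four example websites are hypothetical, and the deterministic farthest-point initialization is chosen here only to make the example reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(vectors, k):
    """Pick k starting centers: the first vector, then repeatedly the
    vector that lies farthest from all centers chosen so far.
    (Illustrative choice; the thesis does not specify an initialization.)"""
    centers = [list(vectors[0])]
    while len(centers) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centers),
        )
        centers.append(list(farthest))
    return centers


def kmeans(vectors, k, max_iterations=100):
    """Plain K-means: alternate assignment and update steps until the
    cluster labels stop changing."""
    centers = farthest_point_init(vectors, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every website to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move every center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical functionalities (columns) and websites (rows).
FUNCTIONALITIES = ["product catalog", "shopping cart", "checkout",
                   "news archive", "comments"]
websites = {
    "shop-a.example": [1, 1, 1, 0, 0],
    "shop-b.example": [1, 1, 1, 0, 1],
    "news-a.example": [0, 0, 0, 1, 1],
    "news-b.example": [0, 0, 0, 1, 0],
}

labels = kmeans(list(websites.values()), k=2)
for site, label in zip(websites, labels):
    print(site, "-> cluster", label)
```

In this toy example the two shop-like websites end up in one cluster and the two news-like websites in the other. Each cluster can then be interpreted as a candidate web genre by inspecting which functionalities dominate it.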

Overview of figures

Figure 1 - Design science methodology
Figure 2 - Summary of global search strategy
Figure 3 - Outline of the website development method of iWink
Figure 4 - Coding phase of the website development method of iWink
Figure 5 - Summary of the classification framework
Figure 6 - Website collection method (phase 1)
Figure 7 - Functionalities identification (phase 2 - part 1)
Figure 8 - Functionalities identification (phase 2 - part 2)
Figure 9 - Web genres identification (phase 3)
Figure 10 - Graphical overview of final website collection
Figure 11 - Relations between web genres
Figure 12 - Example of the web genre integration at iWink

Overview of tables

Table 1 - Literature overview: used website collection and classification method
Table 2 - Collection of web genres used in comparable research
Table 3 - Website crawler (search script) parameters
Table 4 - K-Means clustering method
Table 5 - Discovered websites functionalities
Table 6 - Removals from website collection in the functionalities identification phase
Table 7 - Number of websites in the clusters
Table 8 - Cluster solution and corresponding web site functionalities
Table 9 - Comparison of results between this study and comparable research
Table 10 - Agglomeration table using original variables
Table 11 - Agglomeration table using principal components
Table 12 - General and ANOVA statistics of both cluster-runs

1 Introduction

This chapter provides an introduction to the topic of this study. Section 1.1 introduces the general topic and context of this study. Section 1.2 addresses the problem statement, which includes the research objective and the research question. This leads to a research design, which is outlined in section 1.3. The last section of this chapter, section 1.4, outlines some limitations of this study.

1.1 Topic and context

The World Wide Web, or the Internet as it is often called, is a worldwide phenomenon that has grown in popularity and size with each passing day since its inception in 1991. Currently there are more than 131 million active websites and the number of websites is growing at a rate of around 25 thousand websites a day¹. It is no wonder that there is also an abundant number of companies operating in the field of website development. According to Kiran (2010), in 2005 there were over 25 thousand professional website development companies in the United States alone, and this number was expected to grow by over 20% to around 30 thousand by 2010. It is therefore very important for website developers to distinguish themselves from their competitors in this highly competitive website design market. Besides marketing and/or corporate strategies, the other feature that website developers can use to distinguish themselves is their product: the website itself and the development of that website.

The most important characteristics of a website are the (graphical) design and the available content (Huizingh, 2000). Of those two characteristics the content is often considered the most important: providing information is the basic goal of a website (Angehrn, 1997). The dilemma is that the content is often provided and/or published on the website by the client themselves. As a result the website developer can really only distinguish themselves with the design of the website. An additional dilemma is that a good design only serves as a stage for the content. A phrase often heard among website developers is that visitors are pleased by the design but drawn to the content (Beaird, 2007). Website developers are therefore forced to find other methods to distinguish themselves from their competitors.

The study described in this thesis is conducted on behalf of and under the direct supervision of one of the owners of iWink. iWink is one of the larger website designers in the northern part of the Netherlands. Currently they have around thirty employees, with specialists in the fields of website design and development, business science and website support. They currently have a project portfolio of around 200 completed websites². iWink also struggles constantly with the dilemma outlined above of how they can distinguish themselves from their competitors. One of the methods iWink applies to distinguish themselves is the use of the interaction design method for developing the design of their websites. As the name implies this method focuses on the interaction between the visitors of a website and the website itself. The main goal of interaction design is enhancing the usability of the website, thereby making the content the most important aspect of the website again (Sharp, Rogers, & Preece, 2007). iWink integrates this method in their website development process by creating a sketch of the website at the start of each new project, based upon the expressed wishes and goals of the client. This sketch represents the layout of the different pages, how they are linked together and how the visitor can interact with the website. By designing a website according to the interaction design method iWink streamlines the whole website design before a single line of code has to be written.

¹ http://www.domaintools.com/internet-statistics/ accessed on the 14th of July 2011

Besides the (graphical) design and content there is a third characteristic of a website, one that clients and visitors often deem unimportant: the technology that the website operates upon³. Without solid programming of the technology a website will never function properly. iWink tackles this problem with their in-house developed Content Management System (CMS). A CMS essentially acts as a backbone for the website. Among other features a CMS usually contains the following components: a location where clients can insert and update the content of the website, a location where user rights can be managed and a location where users can upload new media. Additionally a CMS is used to implement the basic technology necessary for a website to work. An additional feature that iWink integrated into their CMS offers their clients a selection of different modules. These modules are used by iWink to integrate standard functionalities into a website without having to design them from scratch for every new website. This enables iWink to focus on the more important characteristics of a website: the content and the design. The combination of the modular content management system and the interaction design is something that distinguishes iWink from a portion of the market for website design. iWink is now searching for a method to expand their CMS to include larger modules which incorporate the common functional types of websites and the corresponding functionalities. An example of a common type of website that everybody is familiar with is the web shop (functional website type). Such a website always contains, among other functionalities, a product catalog, a shopping cart and a checkout section (modules). Following this example, iWink wants to introduce a large module catered to web shops that by default incorporates the product catalog, shopping cart and checkout modules. The web shop is one example, but there are other common and more advanced functional types of websites that iWink also wants to incorporate as generic modules⁴ into their CMS. These generic modules enable iWink to offer their clients all the basic functionalities for the common functional types of websites with one large module. This eliminates much of the design and programming effort needed for a website and, as mentioned before, enables iWink to focus more on the design and content of the website. Additionally this enables iWink to reuse and incorporate much of the knowledge and solutions that went into the development of previous websites and modules.
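The relation between a generic module and the modules it bundles can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class name `GenericModule`, the method `missing_from` and the idea of checking against a set of installed modules are hypothetical and do not describe the actual CMS of iWink. Only the web shop example (product catalog, shopping cart, checkout) is taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GenericModule:
    """A functional website type together with the modules it bundles by default."""
    website_type: str
    modules: Tuple[str, ...]

    def missing_from(self, installed: Iterable[str]) -> List[str]:
        """Return the bundled modules that are not yet present in `installed`."""
        installed_set = set(installed)
        return [m for m in self.modules if m not in installed_set]


# The web shop example from the text: one generic module, three bundled modules.
WEB_SHOP = GenericModule(
    website_type="web shop",
    modules=("product catalog", "shopping cart", "checkout"),
)

# Hypothetical CMS installation that already contains two of the three modules.
print(WEB_SHOP.missing_from({"product catalog", "checkout"}))
```

Representing a generic module as nothing more than a named bundle keeps the existing modules reusable: adding a new functional website type would only require listing the modules it needs.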

This thesis describes the investigation into the aforementioned functional types of websites and the corresponding functionalities. A lot of research has already been conducted on establishing and classifying common types of websites, for example: Mehler, Sharoff, & Santini (2010), Rosso (2008), Santini (2006), Eissen & Stein (2004), Montesi & Navarrete (2008) and Ayanso & Yoogalingam (2009). However, with some exceptions, most of that research focuses on finding the best methods for website classification, creating a complete taxonomy for website classification or finding the features upon which website classification can be based. The study described in this thesis, however, primarily focuses on finding the common types of websites and their corresponding functionalities so that iWink and other website developers can integrate these into their website development process. The primary focus of this research is therefore not to enhance the research field or to create a grounded-theory-based taxonomy of websites, but to help website developers in their website development process.

³ For the rest of the thesis a client is considered to be a company that orders the development of a website from a website designer, and a visitor is someone who accesses a website with their own web browser.

1.2 Problem statement

Based upon the topic and context explained in the previous section the following research objective and research questions are derived.

1.2.1 Research objective

As described above the focus of this research is to help website developers like iWink in their design process by identifying and subsequently classifying the common functional types of websites and their corresponding functionalities. Therefore the objective of this research is to create a framework that subsequently is used to classify the most common functional types of websites. This classification will be based upon the main functionalities that these websites offer. The output of this research is an overview of the common functional types of websites with the corresponding functionalities. Subsequently iWink can adapt these to the modules in their content management system to create the generic modules.

1.2.2 Research question

From the research objective the following main research question is derived:

“What are common functional types of websites and how can iWink utilize a classification of websites in their design process?”

From the main research question the following sub-questions are derived:

1. What is the current design philosophy at iWink and how could a functional classification of websites be incorporated into this design philosophy?

2. Which methods can be used to create an unbiased collection of websites?

3. What are the most conventional functionalities of websites?

4. What are the most common methods to classify websites?

5. How can the outcome of the classification be validated?

6. How can the classification be combined with the existing (or new) CMS modules at iWink?

1.3 Research design

design cycle and an evaluation cycle. The reflective cycle is primarily used for background research on the topic of website classification, different techniques for classifying websites, different techniques for building a collection of websites and an investigation into the design process at iWink. The design cycle is where the results of the reflective cycle are combined into a framework for classifying the websites. The results of the reflective cycle will also be used to form a collection of websites which will be classified with the use of the framework. The result of the classification framework is an overview of the common functional types of websites with their corresponding functionalities. The evaluation cycle is the last stage of the design science methodology and is used to analyze the outcome of this study.

Figure 1 - Design science methodology

The three cycles and the related activities are described in detail below. A schematic overview of these cycles and the corresponding activities can be found in the appendices (appendix 1).

1.3.1 Reflective cycle

The reflective cycle focuses on background research and an investigation into the website development process at iWink. The background research is conducted through a literature research and roughly focuses on three topics: website classification approaches, website collection approaches and the results of comparable research. The organizational investigation into the development process at iWink is performed using a series of interviews with employees of iWink. The different activities of the reflective cycle are further outlined below.

• Literature research

The literature research is used to find (background) information on a number of subjects, of which the most important is comparable (academic) literature on the subject of website classification. This investigation primarily focuses on the findings or results of that research and is used to form an overview of the different views on and results of website classification. The taxonomy used for the groups of similar websites that were classified and the taxonomy used for the website functionalities are also investigated (the appellation and context used for the classification). The second part of the literature research focuses on the techniques that are applied when classifying websites. The last part of the literature research focuses on the techniques used to compile a collection of websites that eventually forms the source for the website classification.

• Organizational investigation

The organizational investigation concentrates on the current development process that iWink employs when developing websites for their clients. This creates a better understanding of where the classification of websites could be used and where the results of this research can make improvements. The organizational investigation is performed using a series of interviews with employees of iWink.

• Results of comparable research

The findings of the literature research are combined into an overview of the most common (functional) website types. The overview is used as a taxonomy source for the classification in this study. Additionally this overview is also used as a means for validation of the classification results.

• Website classification approaches

The most important objective of the reflective cycle is to discover techniques that can be used for classifying websites into unique groups that contain similar websites. These techniques must be usable to classify the websites in unique groups according to their functionalities.

• Website collection approaches

The last objective of the reflective cycle is to discover or design techniques to create a website collection which is used in the website classification. This is an important aspect of the reflective cycle because if a wrong (flawed) approach is used then the most likely outcome is a biased collection of websites. The result of a biased collection of websites is that for example important types of websites can be missed in the classification.

1.3.2 Design cycle

The design cycle is where the results of the reflective cycle will be combined into a framework for classifying the websites. The three main activities in the design cycle are: developing the classification framework, compiling a collection of websites and subsequently using the classification framework to classify the websites present in the website collection into unique groups of websites. These unique groups of websites are then used to formulate the generic modules which iWink will apply in their CMS and their website design process. The different activities of the design cycle are further outlined below.

• Design classification framework

The results of the comparable research and the website classification methods are combined into a classification framework. This framework will form the backbone of the classification. The purpose of the framework is to make it possible to classify the collection of websites in a structured manner.

• Test classification framework

As the research progresses it is safe to assume that new insights or additions will emerge which affect the framework. Therefore the classification framework is constantly tested and updated throughout the research to incorporate these insights and/or additions. This ultimately results in a stronger framework. At a certain point in the research the decision is made to stop updating the framework so that the classification can be carried out properly.

• Compile the collection of websites

The approaches discovered in the reflective cycle are applied to compile the collection of websites which is used in the classification of the websites. The collection of websites has to be generated very carefully to prevent any biases (for example the omission of certain types of websites) in the collection.

• Apply classification framework

• Classification of websites

The results of the previous steps are compiled into a document describing the classification of the websites. This document consists of the different (functional) types of websites and the common functionalities that define these websites.

• Validate with comparable research

The classification is validated using the results of the comparable research found in the reflective cycle. The main activity is to verify if the unique groups of websites found in the classification are comparable to the unique groups of websites found in other website classification research. This step ensures that a thorough classification has been made.

• Form generic modules

In this step the classification of the websites is combined with the CMS modules of iWink to form generic modules. The generic modules will be used by iWink in the future to develop websites based upon the website classification in this study.

• iWink CMS modules (new or existing)

These are the modules in the CMS that iWink designed to build the websites. These existing (or new) modules are used to form the generic modules.
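The activity "Compile the collection of websites" listed above can be illustrated with a minimal sketch. It assumes that the result URLs of each search query have already been retrieved; no search engine is contacted in the example. The queries, URLs and the normalization rule (dropping a leading "www.") are hypothetical and only serve to show how result pages can be reduced to a deduplicated collection of websites.

```python
from urllib.parse import urlsplit


def website_of(url):
    """Reduce a result URL to its website (host name without a leading 'www.')."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def compile_collection(results_per_query):
    """Merge the results of all queries into one collection.

    Returns a dict that maps every website to the list of queries that
    found it, so each website appears only once in the collection.
    """
    collection = {}
    for query, urls in results_per_query.items():
        for url in urls:
            site = website_of(url)
            if site and query not in collection.setdefault(site, []):
                collection[site].append(query)
    return collection


# Hypothetical search results: query -> result URLs.
results = {
    "the": ["https://www.shop.example/cart", "http://news.example/today"],
    "weather": ["http://news.example/weather", "https://forum.example/"],
}

collection = compile_collection(results)
for site, queries in collection.items():
    print(site, "found by", queries)
```

Keeping track of the queries that led to each website makes it possible to check afterwards whether one source dominates the collection, which is one way a bias of the kind described above could be noticed.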

1.3.3 Evaluation cycle

The evaluation cycle is the last cycle in the design science methodology. Due to time constraints the evaluation cycle cannot be completely included in this study. It is up to future research and iWink to perform the activities of the evaluation cycle. The different activities of the evaluation cycle are further outlined below.

• Efficacy

In the efficacy step of the evaluation cycle it is tested whether the outcome of the research is successful in producing the intended result. In other words: is the functional classification of the websites successful in producing the generic modules which can be used in the website development process of iWink? The efficacy of the generic modules can be tested with field experts or a couple of clients using test cases, for example: is a test subject capable of choosing a generic module which suits their company's wishes and goals?

• Utility

In this step the utility of the outcome of the study is tested: does the outcome fulfill the requirements set at the beginning of the research? The utility of the outcome can be discovered through discussions with iWink. This is the only activity of the evaluation cycle that is partly included in this study.

• Quality

The quality step of the evaluation cycle focuses on the quality of the research results. In other words: does the outcome live up to academic standards and the standards at iWink? This is a very difficult and time-consuming activity, and legitimate results on quality can only be measured if the results of this study are incorporated and used by iWink for at least a year.

1.4 Research limitations

• The research will be conducted in the seven-month period between the 1st of February and the 31st of August. Barring exceptional circumstances, the researcher is no longer available after the 31st of August.

• The research will not include the actual design (or coding) of the generic modules. As discussed above, an overview of the common functional types of websites with the corresponding functionalities is the final output of this research, which iWink subsequently has to adapt to the modules in their content management system to create the generic modules.

• The collection of websites that is used for the classification has to be compiled very carefully to prevent it from becoming too large to classify in the time available.

2 Background

This chapter presents background research in the academic literature and mainly focuses on three topics: website genres in general, website collection methods and website classification methods. Section 2.1 outlines the general search strategy used to find the literature. Section 2.2 compares the website collection methods, the classification methods and the results of the comparable research found in section 2.1. Section 2.3 focuses on the website development method used by iWink.

2.1 Search strategy

As discussed in the previous chapter this study focuses on a functional classification of websites. Considering the age and popularity of the internet it is to be expected that a lot of research has been conducted on this subject. A good quantity of research is found when searching for literature with obvious subjects like: “(functional) classification of websites”, “(functional) website classes” and “(functional) categorization of websites”. However, the largest portion of the research discovered with those subjects focuses on a textual classification of the content of a website. This is very different from the method proposed in this study: a functional classification based upon the functionalities of websites. Another large portion of the research on those subjects focuses on automatic classification of existing website collections that contain anywhere from ten thousand to well over a million websites. Although that kind of research is compelling and probably results in a stronger and better grounded classification, it is not directly relevant to this study, because time and resource constraints prevent that kind of mass classification of websites here.

Only a small portion of the research found with the subjects mentioned above actually discusses functional classification based upon the functionalities a website offers. That research refers to the classified groups of websites as genres or web genres, as opposed to classes or categories. For example Xiaoguang & Davison (2009) refer to genre classification when they discuss the functional classification of websites that focuses on the “role a website plays”, which corresponds with the aim of this study. A further examination of (web) genre reveals that there are many different definitions and explanations of what a genre is, mainly due to the range of subjects that genres are used for. The most common subjects where genres are used, and where most people are familiar with them, are literature, movies, music and art. Most dictionary entries reflect this; three examples of definitions from commonly used dictionaries are:

• Oxford dictionary⁵: “A style or category of art, music, or literature”.

• Cambridge dictionary⁶: “A type of literature, art, or music characterized by its particular subject or style”.

• Dictionary.com⁷: “A class or category of artistic endeavor having a particular form, content, technique, or the like”.

Although Wikipedia is not a commonly trusted academic resource, it also provides an interesting and useful definition of what a genre is:

• Wikipedia⁸: “Genre is the term for any category of literature or other forms of art or culture, e.g. music, based on some set of stylistic criteria. Genres are formed by conventions that change over time as new genres are invented and the use of old ones is discontinued. Often, works fit into multiple genres by way of borrowing and recombining these conventions.”

These definitions all have in common that they refer to some style, type or category of subjects characterized by particular characteristics or criteria that make each of these groups (genres) mostly unique. The definition of a genre from Wikipedia also exposes two other interesting topics in genre research. Firstly, discovered genres can change over time when new genres are invented or old ones become obsolete. Secondly, works are not restricted to one genre and often fit into multiple genres. These two problems, especially the first one, become very apparent when genres are used to classify websites, which exist in the fast-changing world of the internet.

The normal definition of a genre cannot directly be applied when classifying websites. The reason is that websites include a third element besides the two elements used in the definition of common genres, form and content. As already discussed in the introduction, a third element is necessary when classifying websites because a website is not a static object (like a text or a movie) but also fulfills a function (Montesi & Navarrete, 2008). Therefore a web genre consists of three elements: form, content and function (Shepherd & Watters, 1999). This early work by Shepherd and Watters (who are regarded as the inventors of the term web genre) has been adopted by the majority of other researchers in this field, which ultimately led to the following definition of a web genre:

• “A set of documents with the same style/technique, form/format, content and/or function/purpose. Style, technique, form and format are all similar things, so it’s assumed that a genre is defined by form, content and purpose.”

This is a definition from the WebGenreWiki⁹, which is a website established and maintained by the leading researchers in the field of web genres. They conclude that a web genre is defined by form, content and purpose. Functionalities offered by a website are usually a direct result of the purpose of a website: it does not make sense to add a checkout counter to a website if it is not possible to purchase any products on that website. Website functionalities therefore explain something about the content and purpose of the website. This is another reason (besides the reasons explained in the introduction) why the functionalities offered by a website are chosen for the identification of the unique groups of websites.

All these definitions suggest that (web) genre is a better subject for finding related literature. The literature found when using the subjects "web genres", "genres in websites" and "classifying websites in genres" is indeed much more relevant to this study than the literature found when using the subjects discussed at the beginning of this section. Most of the research conducted before the year 2000 is discarded because of the dramatic change and evolution in the online world in the last decade. The research considered mostly covers the period between 2000 and 2007. As a result the latest online developments like web 2.0 and the emergence of interactive websites like online word processors (for example Google Docs) are often not covered. A summary of the global search strategy is outlined in figure 2.

⁸ http://en.wikipedia.org/wiki/Genre

Figure 2 - Summary of global search strategy

The website collection methods, website classification methods and results of the research found with the second set of search subjects are discussed in chapter 2.2.

2.2 Analysis of comparable research

The relevant research found in the preceding section is analyzed to discover its results and the website collection and classification methods that were used. This analysis should by no means be regarded as an exhaustive literature review; it is used to highlight some of the more important research conducted in the field of website and/or web genre classification. Section 2.2.1 focuses on the website collection and classification methods used in the analyzed papers. Section 2.2.2 focuses on the results (if applicable) of the analyzed literature.

2.2.1 Website collection and classification method used in comparable research

Table 1 provides an overview of the analyzed research and the corresponding website collection and classification methods that were used. The website collection methods are divided into four possible methods (which are explained in more detail in chapter 3.1): the use of an existing collection, the use of a search engine, a random selection of websites and the use of a website crawler. The classification methods are divided into data mining techniques (which are explained in more detail in chapter 3.3), survey methods like questionnaires and interviews, manual methods and other “unique” methods. These divisions are based upon what was found in the related literature.

Table 1 - Literature overview: used website collection and classification method

Literature | Collection | Classification
(Ayanso & Yoogalingam, 2009) | Existing collection | Data mining: Two step clustering
(Boese, 2005) | Random selection | Data mining: a combination of methods
(Crowston & Williams, 1999) | Search engine | Data mining: Hierarchical clustering
(Crowston & Williams, 2000) | Random selection | Manually by the researchers
(Dewe, Karlgren, & Bretan, 1998) | Search engine | Survey: questionnaires (among a university)
(Eissen & Stein, 2004) | Random selection | Data mining: Bayes theorem and support vector machines
(Huizingh, 2000) | Existing collection | Survey: questionnaires (among students)
(Kanaris & Stamatatos, 2009) | Existing collection | Unique method developed by researchers
(Montesi & Navarrete, 2008) | Random selection | Manually by the researchers
(Rehm, 2002) | Crawler | Manually by the researcher
(Rosso, 2008) | Search engine | Survey: questionnaires and interviews (among students)
(Roussinov et al., 2001) | Search engine | Survey: questionnaires and manually by the researchers
(Santini, 2006) | Existing collection | Data mining: Bayes theorem and support vector machines
(Santini, 2007) | Existing collection | Combination of manual and data mining techniques

[Figure 2 content: 1st set of search subjects ("(functional) classification of websites", "(functional) website classes" and "(functional) categorization of websites"): not relevant, due to a focus on website classification based upon the content of the website instead of its functionality. 2nd set of search subjects ("web genres", "genres in websites" and "classifying websites in genres"): relevant.]

As can be seen from table 1, the collection methods are spread fairly evenly over existing collections, search engines and random selection, while a website crawler is used only once. This implies that the website classification research field does not regard one single method as the best for creating a collection of websites. As a consequence, one or a combination of these methods has to be chosen for this study. The best solution is probably a combination of the different methods. The classification techniques used, however, clearly show that the preferred method for classification is either found in the field of data mining or conducted manually. A completely manual classification would take too long for this study, considering the number of websites and functionalities that are going to be classified. However, constructing a tool that classifies the websites fully automatically is, according to the literature discussed above, a research topic in itself. The best method is probably again a combination of both (explained in more detail in chapters three and four).
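The distribution of methods in table 1 can be checked with a short tally. The sketch below is only an illustration: the rows are transcribed from table 1, and the shortened classification labels (such as "survey + manual") are simplifications introduced here for counting, not terms used by the cited studies.

```python
from collections import Counter

# Rows transcribed from table 1: (study, collection method, classification label).
# The classification labels are shortened for counting; they are an assumption
# of this sketch, not the wording of the cited studies.
TABLE_1 = [
    ("Ayanso & Yoogalingam, 2009", "existing collection", "data mining"),
    ("Boese, 2005", "random selection", "data mining"),
    ("Crowston & Williams, 1999", "search engine", "data mining"),
    ("Crowston & Williams, 2000", "random selection", "manual"),
    ("Dewe, Karlgren, & Bretan, 1998", "search engine", "survey"),
    ("Eissen & Stein, 2004", "random selection", "data mining"),
    ("Huizingh, 2000", "existing collection", "survey"),
    ("Kanaris & Stamatatos, 2009", "existing collection", "other"),
    ("Montesi & Navarrete, 2008", "random selection", "manual"),
    ("Rehm, 2002", "crawler", "manual"),
    ("Rosso, 2008", "search engine", "survey"),
    ("Roussinov et al., 2001", "search engine", "survey + manual"),
    ("Santini, 2006", "existing collection", "data mining"),
    ("Santini, 2007", "existing collection", "manual + data mining"),
]

# Count how often each collection method and classification label occurs.
collection_counts = Counter(method for _, method, _ in TABLE_1)
classification_counts = Counter(label for _, _, label in TABLE_1)

for method, count in collection_counts.most_common():
    print(f"{method}: {count}")
```

Counted this way, existing collections occur five times, random selection and search engines four times each, and a crawler once; pure data mining is the most frequent classification label.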

2.2.2 Comparison of the results in comparable research

When the results of the literature discussed in chapter 2.2.1 are combined, ten web genres emerge that are discovered more frequently than others: help websites (like FAQs), article websites, discussion websites (forums and discussion boards), online shops, online portrayal, both private (personal) and non-private (corporate), link collections (like www.startpagina.nl), download websites, search engines and blogs. This is emphasized by Kanaris & Stamatatos (2009), who mention two web genre collections, or palettes as they call them, that are often used in the field of web genre classification to evaluate classification results: KI-04 by Eissen & Stein (2004) and 7genre by Santini (2007). These two collections contain many of the web genres that are also discovered in the comparable research. Table 2 provides an overview of both web genre collections and compares them with the results found in the comparable research analyzed in this study.

Table 2 - Collection of web genres used in comparable research

Comparable research | KI-04 | 7genre
Article websites | Article | Blog
Blogs | Discussion | E-shop
Discussion websites | Download | FAQs
Download pages | Help pages | Online newspapers
Help websites | Link collections | Personal homepage
Link collections | Online portrayal - non-private | Search page
Online portrayal - non-private | Online portrayal – private | Website listings
Online portrayal – private | Online shop |
Online shops | |

As can be seen in table 2, there is considerable overlap between the different sources. For example, all three sources mention the personal homepage as a web genre: online portrayal - private in the comparable research and KI-04, and personal homepage in 7genre. Online shops are also mentioned in all three sources: online shops in the comparable research and KI-04, and e-shop in 7genre. Other similarities between the sources are article websites (online newspapers in 7genre), blogs (discussion in KI-04), help websites (FAQs in 7genre) and link collections (website listings in 7genre). The web genres mentioned in table 2 will be used to evaluate and validate the classification results of this study.

2.3 Website development method of iWink

As discussed in the introduction, iWink develops its websites using the interaction design method (Sharp et al., 2007), which focuses on the interaction between the visitors of a website and the website itself. iWink integrated the interaction design method into its website development method. This integration is best reflected in a phase iWink aptly named “Interaction Design”. The other four phases that make up the website development method of iWink are “Analysis and advice”, “Web design”, “Coding” and “Web development”. Not all of these phases are completed or necessary for each project. This is mainly due to the scope of a given project: not every web development project needs such an elaborate process. Figure 3 below provides an overview of the sequence of the website development method of iWink. The following subsections outline these five phases and explain where the results of this study will be integrated. As it is not the objective to map out the complete development method of iWink precisely, this section is limited to a general description and therefore does not include flowcharts for each individual phase.

Figure 3 - Outline of the website development method of iWink

2.3.1 Analysis and advice

This phase is initiated by the sales department when they find, or are contacted by, a new client. The phase consists of two major activities. The first is the analysis: the sales department schedules a first meeting with the client, without any obligations for the client. This first meeting mainly serves to familiarize the client with iWink and iWink with the client. After the first meeting a project manager is assigned, who is responsible for the project until it is delivered to the client. The project manager is also the main contact point for the client as long as the website is in development. In a subsequent meeting the wishes and goals of the client are discussed, the scope of the project is defined and sometimes examples of other finished projects are demonstrated. For a large project, further meetings sometimes take place. Based upon the results of the analysis, an advice (the second activity of this phase) is made in the form of a quotation. This offer combines all the agreements and subjects discussed during the analysis and condenses them into a description of the activities and products iWink will deliver, and the price for which this work will be done.

A recent addition that completes the “Analysis and advice” phase is an MDMI-consult (MDMI stands for “Meer doen met internet”, which translates to “Doing more with the internet”). This MDMI-consult goes above and beyond the product an internet developer normally offers. It is mainly offered in an attempt to get the client to think about his own problems or ideas and how to solve them. The MDMI-consult contains a discussion about the main objective the client pursues with the website and a target group analysis, which first focuses on actually defining the target group and secondly on how to anticipate and/or approach them, particularly with regard to usability. The final subject of the MDMI-consult is a checklist of common problems and solutions regarding websites.

The first phase of the development process does not contain many opportunities for incorporating the results of this study. However the generic modules could be used in the meetings with the client to demonstrate functional solutions for the common types of websites.

2.3.2 Interaction design

The second phase is initiated when the client accepts the activities and products defined in the offer. This phase is all about visualizing the concept of the website. This visualization is achieved with a click model, which is in essence an advanced interactive wireframe of a website. The click model contains all the web pages (or templates, as they are called internally at iWink) that together make up the complete website. These templates contain the complete structure of the website: where the different content blocks are located, where the menu and banner are placed, which options the menu offers and where they link to. Almost everything is visually represented in the click model, and the menu options actually work. The only two things the click model lacks are the visual layout (colors etc.) and the actual coding (video content, for example, does not work in the click model).

The interaction design phase consists of two activities, or more if the need arises. In the first activity a designer transforms the wishes of the client into a click model. This model is then made available online to the client. Depending on the scope of the project, the click model is also presented to the client in person. This is an interactive presentation in which the client gets to see and use the click model to get an idea of how the website will work. The client is invited to criticize the click model and is free to suggest changes. In the second activity the click model is redesigned according to the feedback from the client, and the new click model is again presented to the client. Most of the time the client accepts the second click model. If it is accepted, the project moves on to the third phase. If it is not accepted, it is redesigned again until the client is satisfied with the result.

The click models are built according to the rapid prototyping concept, which in short implies that a click model is usually constructed within one or two days. The designer of the click models has constructed a collection of “widgets” for use in the click model. These widgets are frequently used website functionalities, and they considerably shorten the time needed to design a click model. This collection of widgets is well suited for integrating the results of this study: by integrating the generic modules and the accompanying functionalities into the collection of widgets, the construction time of the click model could be shortened even further.

2.3.3 Web design

[...] website. The graphics are created, the color scheme is chosen and sometimes an existing house style is integrated into the website. The designer uses the layout constructed with the click model as a template around which the visual style is designed. The designer first makes templates for only two pages: the main homepage and an important subpage of the website. These templates are presented to the client in person and all choices made in the design are explained. The client gets the opportunity to provide feedback and request changes in the design. If there are major changes, this activity is repeated. If the client accepts the templates, or has no major feedback or requests, the rest of the templates are designed on the basis of the accepted ones. These templates are again presented to the client, who once more gets the opportunity to give feedback or request changes. These will be incorporated until the client accepts them.

The graphics and visual style are different for each newly developed website. The results of this study are therefore not usable in this phase of the website development process of iWink.

2.3.4 Coding

In this phase all the preliminary work is finished by a coder. This phase only includes the basic work: only functions provided by the in-house CMS are added. All custom functions that need to be specially designed for the client are developed in the next phase. This phase consists of three activities. First, the design made in the “web design” phase is converted to a static basic website consisting of only HTML and CSS; in this step only the structure and design of the website are programmed. Next, the static website is linked to the in-house developed CMS (which is used for 99.5% of all the websites developed by iWink). Finally, all functionalities requested by the client that are already provided by one or more existing CMS modules are incorporated.

The coding phase is the main location where the results of this study could be incorporated. This phase largely consists of generating a basic starting website with the common existing functionalities. This is exactly what this study pursues: constructing generic modules based upon the common functional types of websites and the functionalities they offer. This implies that a good portion of this phase could be replaced by the output of this study. The phase would be reduced to enabling the generic module in the CMS that matches the wishes and goals of the client and applying the graphical design of the previous phase to it. Figure 4 provides a schematic overview of the current coding phase and shows which steps (the blocks framed with red dashed lines) could be transformed into a single step in which the generic module is enabled.

Figure 4 - Coding phase of the website development method of iWink

[Figure 4: flowchart of the coding phase, starting at “Client accepts graphical templates (output web design)”]

2.3.5 Web development

This is the last phase in the website development process. All functionalities that require modules custom tailored to a specific client are built in this phase. After the custom modules are developed and integrated into the website, the client is once again involved in the project. If at this point the client wants to incorporate any changes to the website, it is assessed whether there is time left and whether the change is not too large. If this is not the case, the client can only request the change as a separate, new website development project. The website is delivered to the client without content: every requested functionality is present, but the actual content is to be added by the clients themselves. The project manager does, however, equip the website with some basic content and placeholder text to give the client an idea of how the website will look when content is present. This is also the opportunity for the project manager to test the modules and functionalities for errors and to check whether everything works as expected. These checks are largely done with the help of checklists (e.g. for a search module: is there a search button, does it work, are the search results as expected, etc.). These checklists are something iWink is starting to standardize and they will eventually be integrated company wide. After the website is completed and has gone live, the client has three months in which to report small errors to be fixed. The project manager is still in charge during this period. After the three months have expired, the project is handed over to the support department and the website development process is officially finished.


3 Classification Framework

This chapter focuses on the classification framework that is used to identify the web genres. The classification framework is structured into three separate phases: website collection, functionalities identification and web genre identification. The first phase, website collection, focuses on the construction of an unbiased collection of websites which is used in phases two and three. The second phase, functionalities identification, focuses on finding the main functionalities offered by the websites found in phase one. The third and last phase focuses on identifying the web genres. These web genres are grouped according to the common functionalities that are found in phase two. A graphical summary of the classification framework is depicted in figure 5 below. A detailed overview and explanation of the different phases can be found in their corresponding subchapters.

Figure 5 - Summary of the classification framework

3.1 Website collection (phase 1)

The first phase in the classification framework is to construct a collection of websites that will serve as input for phase two and three. It is important that this collection of websites is as unbiased as possible to prevent the results of the classification framework from gravitating towards only a couple of specific web genres. According to Boese (2005), and also demonstrated by the background research described in chapter 2.2, four methods are commonly applied for building such a collection of websites (often called corpora in comparable research). These four methods, including a discussion on the advantages and disadvantages, are explained below:

1. A completely random selection of websites

Description: The first method for building a collection of websites is a completely random selection of websites. An example of this method can be found in the research conducted by Crowston & Williams (2000). They used the “anything goes” option¹⁰ of the AltaVista search engine to obtain a random collection of 100 websites. Another example of a random selection of websites is given by Montesi and Navarrete (2008): in their study, one week of browsing history of a software engineer was used to build a collection of websites.

Discussion: The problem with randomly selecting websites is that the selection is often, although random, still too specific. This is a consequence of the influence that the researchers have on the “random” selection. The collection of websites therefore frequently becomes biased towards a certain genre of websites. The research by Montesi and Navarrete (2008), for example, largely resulted in forum-based discussion websites focused on technical topics, because the browsing history was that of a software engineer.

10 AltaVista’s “anything goes” option is no longer available. It selected websites by generating a random number n, selecting the nth word from an inverted file (i.e., a listing of all the words in all the websites) and then recalling the website corresponding to that word (Crowston & Williams, 2000).


2. A (custom) website crawler

Description: The second method that is often employed for constructing a collection of websites is a (custom) website crawler. A website crawler automatically browses, downloads and categorizes websites according to predetermined parameters. An example of this method is found in research conducted by Rehm (2002), who used a custom website crawler with parameters that limited the collection to websites from German universities.

Discussion: A (custom) website crawler is one of the fastest methods for constructing a large collection of websites, because the approach is fully automated and highly customizable. A downside, however, is that due to the high degree of customizability the crawler often takes a long time to set up. Another possible disadvantage lies in the parameters used to set up the crawler: if these are not chosen correctly, they can introduce a bias into the website collection. The research by Rehm (2002), for example, only crawled university pages in Germany. Although this was by design in that research, such a bias is likely to be introduced in most cases, because the researchers choose their own parameters.

3. Using search engines

Description: A third method is using existing search engines for building a collection of websites. With this method, queries are submitted to a search engine (for example Google) and the results of the queries are combined into a collection of websites. Many examples of this method can be found in comparable research: Roussinov et al. (2001) used search queries of randomly selected persons at a university to build their collection of websites, Crowston and Williams (1999) constructed a collection of FAQ websites using the Yahoo search engine, and Dewe et al. (1998) used three different methods (the Magellan voyeur function, questionnaires and browsing history) to collect random subjects, which were subsequently submitted as queries to a search engine.

Discussion: This method of using search engines is the most desirable because it is free, easy to conduct and highly flexible. However, it also comes with its own problems because, just as with custom crawlers, the researchers choose their own search terms for the search queries. An example is the research of Crowston and Williams (1999), which only used the results of search queries with the search term “FAQ's”. It is therefore important that the search terms used for querying the search engines are chosen carefully, to prevent the introduction of biases into the collection of websites.

4. Using existing collections of websites

Description: The last common method is using one of the existing collections of websites: many collections of websites have already been created for research or commercial purposes. Examples of these existing website collections are the Yahoo! Directory, WebKB, Cade directory, ODP directory and the Engineering Electric Library.

Discussion: The method of using an existing collection of websites is often recommended,


The method used in this study is a combination of three of the four described methods: a random selection of websites (1) is collected by a custom crawler (2) that uses the Google search engine (3). This combination is chosen to reduce the biases introduced into the website collection as much as possible, while at the same time eliminating some of the drawbacks of the individual methods. The random selection of websites is primarily chosen to prevent me from introducing a bias myself. It is combined with random subjects, which ensure that websites from all the common web genres are included. Issues similar to those in the research of Montesi & Navarrete (2008), where the random websites centered on a specific topic, are also prevented with this method. The custom crawler allows the collection to be built automatically. This is considerably faster than the manual counterpart, in which every step has to be done by hand: entering the search queries, extracting the websites from the results and compiling them into a collection. The search engine is chosen for its flexibility: multiple search queries can be submitted at a high rate and the results are easy to extract. The combination of random subjects with the flexibility of a search engine that is automatically queried by a crawler forms a robust method for developing the collection of websites for this study.

The website collection method is summarized in figure 6 below. The letter “n” in the figure represents a variable number that can be changed to find an optimal website collection. This website collection represents only a small sample of an enormous population. However, the sample is expected to be large enough to find the most common web genres, which is sufficient for the goal of this study: finding common functional groups of websites (web genres) that website developers can use in their website development. For a more grounded theory based taxonomy of websites a larger website collection is necessary, as for example in Pierre (2001), who used a sample of almost 30,000 websites. However, such a large collection also involves an almost insurmountable amount of work, which is unnecessary for this study and not desirable given the time and resource limitations.

Figure 6 - Website collection method (phase 1)

As can be seen in figure 6, the first part of the website collection method focuses on achieving a random selection of websites to prevent the introduction of biases into the website collection. This is achieved by combining different sources that contain possible subjects, which subsequently are used as input for the website crawler. Four sources are selected for this purpose:

1. The most frequently used words in the Dutch language
Source: http://wortschatz.uni-leipzig.de/Papers/top100nl.txt

2. The most frequently used words in the English language
Source: http://www.duboislc.org/EducationWatch/First100Words.html

3. The most frequently used search terms in the Netherlands
Source: http://www.google.nl/intl/nl/press/zeitgeist2010/regions/nl.html

4. The most frequently used search terms in the United Kingdom
Source: http://www.google.nl/intl/nl/press/zeitgeist2010/regions/uk.html

The division between the Dutch and English languages and between the Netherlands and the United Kingdom is chosen to maximize the variation in websites, given the different demographic properties of the languages and countries. Additionally, the combination of these four sources is intended to keep the bias in the website collection as small as possible and, in the event that biases are present, to ensure that I did not introduce them myself. The second part of the website collection method is the development of a website crawler. The website crawler used in this study is an automated search script developed by iWink, which submits the subjects from the sources above as queries to the Google search engine. The crawler then automatically generates a list of a predetermined number of unique websites from the search results. The search script requires three parameters as input, which are explained in table 3 below:

Table 3 - Website crawler (search script) parameters

Parameter | Explanation | Input
$MAX_PER_TERM | The number of unique websites that are selected per search term (see $TERMS) in the search result | n number of unique search results
$TERMS | The subjects that are used as search terms | List of subjects
$LANGUAGE | The preferred language of the search results | NL or EN

This search script is executed four times, once for each source discussed above. The outcomes of the four executions are then combined into one large list of websites. This list is screened to remove all double website entries that result from combining the four script executions. The last step in the first phase of the classification framework is to combine the screened list of websites with all the websites designed by iWink, to form the collection of websites that is used as input for phase two of the classification framework. The websites developed by iWink are added because they are excellent examples of websites commonly developed by iWink and other website developers. Adding them ensures that part of the website collection contains websites, and therefore also the corresponding web genres, that are useful for iWink and other professional website developers.
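The steps of phase 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the search script developed by iWink: the function names are hypothetical, the search engine is passed in as a plain function (here a stand-in with canned results) instead of a real Google query, a "website" is identified by its host name, and uniqueness is enforced per search term. Only the three parameters correspond to table 3.

```python
from typing import Callable, Iterable
from urllib.parse import urlparse

# Hypothetical signature: (search term, language) -> result URLs in ranked order.
SearchFn = Callable[[str, str], Iterable[str]]

def site_of(url: str) -> str:
    """Reduce a result URL to its host name, so pages of one site count once."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def run_search_script(terms: list[str], max_per_term: int,
                      language: str, search: SearchFn) -> list[str]:
    """One script execution: up to max_per_term unique websites per term."""
    websites: list[str] = []
    for term in terms:
        seen: set[str] = set()
        for url in search(term, language):
            if len(seen) >= max_per_term:
                break
            site = site_of(url)
            if site and site not in seen:
                seen.add(site)
                websites.append(site)
    return websites

def build_collection(runs: list[list[str]], iwink_urls: list[str]) -> list[str]:
    """Combine all runs, add the iWink websites and remove double entries."""
    combined = [site for run in runs for site in run]
    combined.extend(site_of(url) for url in iwink_urls)
    return list(dict.fromkeys(combined))  # keeps the first occurrence, in order

def fake_search(term: str, language: str) -> list[str]:
    """Stand-in for the search engine; returns canned, made-up result URLs."""
    canned = {
        "de": ["http://www.example.nl/a", "http://www.example.nl/b",
               "http://shop.example.nl/"],
        "the": ["http://example.org/", "http://www.example.nl/c"],
    }
    return canned.get(term, [])

nl_run = run_search_script(["de"], 2, "NL", fake_search)
en_run = run_search_script(["the"], 2, "EN", fake_search)
collection = build_collection([nl_run, en_run], ["http://iwink.example/"])
print(collection)
```

In this sketch the website example.nl is found by both executions but appears only once in the final collection, which mirrors the removal of double entries described above.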

3.2 Functionalities identification (phase 2)


1. If there is any interaction between the visitor and the website, besides using the navigation options on the website. Examples are a catalogue and a checkout function for online stores.

2. Website pages that provide essential information to the visitor and which cannot be removed without seriously affecting the purpose of the website. Examples are an about page and contact information for a corporate website.

Phase two of the classification framework is divided into two separate parts. In the first part a sample of the website collection is analyzed to find the main functionalities. The identified functionalities from part one are subsequently used in part two to analyze the complete website collection (including the websites present in the sample).

This method is chosen after a small-scale test was conducted to find the main functionalities of 75 random websites. The main conclusion of this test was that it is important to have a proper definition of the website functionalities. Without such a definition it is very difficult to determine whether a website uses a certain functionality, to the point that it almost becomes guesswork. Another finding of the test is that, in the absence of properly defined functionalities, analyzing a large collection of websites takes a long time. There are two main reasons for this. First, without a definition of the functionalities, a decision has to be made for each individual website whether a functionality it offers is the same as the one offered by the previously analyzed websites. Second, whenever a new functionality is discovered, all the websites that were already analyzed have to be analyzed again to check whether they also offer this new functionality.

Based upon the findings of the small-scale test, the decision was made to divide phase two of the classification framework into two separate parts. This method is expected to remedy the problems that arose in the small-scale test, because it is flexible enough to allow the most commonly used functionalities to be used in analyzing the complete website collection. At the same time, the method is strict enough to prevent the analysis from taking too long: strictly defined functionalities leave less room for interpretation. The analysis can therefore progress faster and, as an added benefit, the results will be more consistent. The two parts of the functionalities identification phase are described in more detail in the following subchapters.
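The two-part structure can be sketched in code as follows. This is only an illustration under stated assumptions: the analysis in this research is performed by hand, so the callable `offers` stands in for the analyst's judgment, and the function names, the sample size, and the definition texts are hypothetical placeholders rather than the definitions used in this thesis.

```python
import random


def draw_sample(collection, size, seed=0):
    """Part one: draw the sample in which the main functionalities are identified."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(collection, min(size, len(collection)))


def analyze(collection, definitions, offers):
    """Part two: record per website which of the defined functionalities it offers.

    definitions: functionality name -> definition, fixed after part one.
    offers:      callable (site, name) -> bool, a stand-in for the analyst
                 who checks a website against a definition.
    """
    return {
        site: {name for name in definitions if offers(site, name)}
        for site in collection
    }


# Hypothetical definitions; the names follow the examples given in this section.
definitions = {
    "catalogue": "placeholder definition of a catalogue",
    "checkout": "placeholder definition of a checkout function",
}
```

Because the set of definitions is fixed before part two starts, every website is checked against the same list exactly once, which avoids re-analyzing earlier websites when a new functionality turns up.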
