Gaining Insight In A City’s Business Climate By Social Multimedia

Academic year: 2021

Michiel Huizing

University of Amsterdam, Faculty of Science
Science Park 904, 1094XH Amsterdam
M.J.W.Huizing@uva.nl

ABSTRACT

In this article a new way to visualize the attractiveness of metropolitan areas by means of open and social data is presented. The visualization of the attractiveness of (metropolitan) areas has not been studied widely. Even though the attractiveness of areas is well known in the popular press as the business climate, no definition has been coined yet. This study therefore focusses on the definition and visualization of the business climate in a city. Even though there are initiatives to gain insight into local events in cities, there have been no studies into establishing insight into the business climate of a city. The main focus of this study lies on combining open data with social multimedia from Twitter. By using textual and visual algorithms, a visualization is presented on the basis of these data that ultimately provides insight into the business climate of a city.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; H.5 [Information Interfaces and Presentation]: User Interfaces

General Terms

Algorithms, Design, Human Factors and Theory

Keywords

Social Multimedia, Visual Analytics, Information Visualization, Insight, Business Climate

1. INTRODUCTION

In January 2016 the news website nu.nl reported that a record number of foreign companies had settled in the city of Amsterdam [2]. The sectors that attracted the most businesses were the information and communication technology (ICT) sector and the financial services industry. This record number of settlers raises a question: what makes a specific city, or metropolitan area, attractive for (foreign) businesses to settle in? According to the municipality of Amsterdam, the main reason for businesses to come to the capital of the Netherlands is its “convenient location for both physical- and digital accessibility” [27]. Figures on the settlement show that 10,000 new businesses have been helped to settle in the Amsterdam area, an increase of 26% compared to the year before.¹ It appears that some factors are of importance for businesses when settling in a specific area, but which specific factors play a major role in this decision remains the big unknown.

¹ Wederom record nieuwe buitenlandse bedrijven in regio Amsterdam (“Again a record number of new foreign companies in the Amsterdam region”), https://www.amsterdam.nl/gemeente/college/individuele-paginas/kajsa-ollongren/persberichten/persberichten-2016/wederom-record/, accessed on April 24, 2016

Usually, when an indication of the attractiveness of a specific (metropolitan) area is given, an objective measurement is lacking, or the attractiveness is only approximated by means of a proxy concept. One example is given by the municipality of Amsterdam, which indicates its attractiveness for settlement by means of the World Bank report.² This report compares 189 world economies on 11 categories, including business start-up, getting credit, getting electricity and more. Although this seems a good measurement, it raises the question of whether it really utilizes all the information that is key to defining the attractive factors of a specific area. Although there are some measurements that try to capture the attractiveness of a specific (metropolitan) area, there seems to be no consensus on a definition that ultimately measures it.

1.1 Objective Data

One potential way to find what attracts businesses is to focus on the data available in a certain metropolitan area. However, what potential sources of data or information may give an indication of the attractiveness of a specific area? In recent years the term “smart city” has gained prominence in (amongst others) academic literature, of which the definition may include: “policies related to human capital, education, economic development and governance and how they can be enhanced by ICT” [21]. The data that is distributed and released by cities or governments is open data. One example is the city of Amsterdam, which has launched the “Amsterdam Open Data program”.³ The data differs in granularity and precision, as some of it has been provided by the municipality itself and some by third parties such as private corporations (parking statistics, public transportation, etc.). Kitchin stresses the importance of the data generated by a city’s digital infrastructure, and states that the data and its analysis aid in “every day living and decision-making, and empowers alternative visions for city development” [21].

² Ease of Doing Business in Netherlands, http://www.doingbusiness.org/rankings, accessed on April 25, 2016

³ Open Dataset Collection from the municipality of Amsterdam, http://data.amsterdam.nl/, accessed on September 28, 2015

1.2 Subjective Data

Objective data as shared by the government (in this case the city of Amsterdam) shows only one side of the story. Subjective points of view are not taken into account when looking into what happens in a city, as these opinions are not present in open data. One potential way to find a large number of opinions is to look at social media. As the name implies, social media contains uncensored opinions about almost anything. Moreover, social media have become more and more popular over the years. For example, Twitter has 241 million users that are active on the service.⁴ This vast number of active users raises the question of whether Twitter contains interesting data that captures the attractiveness of an area. A study into the motivations of “non-commercial users” to send tweets concluded that the intentions were mainly intrinsic and image-based [33]. At the beginning a user’s intentions are mainly intrinsic, whereas the intentions become more image-related when users gain more followers. A study comparing posts on Sina Weibo and Twitter concluded that 44.6% of the users mentioned a location, 16.0% an organization and 39.4% a person [15]. Given these findings, Twitter may include the subjective views of real persons, together with a real-life depiction of these views by means of multimedia (in this case, pictures). These two kinds of data may potentially depict the subjective views of people, including a rich set of pictures that are posted by more experienced Twitter users.

These conclusions about Twitter are interesting, as these data may contain information that specifies the attractiveness of a specific area for businesses to settle in. However, gaining insight by just looking at the data alone does not seem feasible: information extracted from open data and social media should be visualized to create insight. According to North, the purpose of visualization is insight, which means: “The capacity to discern the true nature of a situation; The act or outcome to grasping the inward or hidden nature of things or of perceiving in an intuitive manner” [25]. However, visualization alone does not give insight per se: a powerful method is needed to discern the factors that provide insight into the attractiveness for businesses, namely visual analytics. According to Keim et al., visual analytics encompasses the “science of analytical reasoning facilitated by interactive visual interfaces”. This means that, using the data and information gained from open data and social media, insight may be given to a specific target audience by means of a visualization. By utilizing visual analytics, the data will be transformed and processed in such a way that insight may be generated for the target audience.

⁴ Number of monthly active international Twitter users from 2nd quarter 2010 to 4th quarter 2015, http://www.statista.com/statistics/274565/monthly-active-international-Twitter-users/, accessed on April 19, 2016

On the basis of the above-mentioned notions, the research question can be stated as: How can social multimedia, in combination with textual, geographical and temporal data, be visualized in such a way that insight can be gained into the business climate of specific areas in a city?

This research question will be answered on the basis of the following sub-questions:

• Question 1: Which factors determine the business climate in a city and what role does social media play?
• Question 2: How to provide an integrated view, by means of a visualization, of the business climate in a city?
• Question 3: How can an integrated visualization provide insight into the business climate for users?
• Question 4: How can the gain in business climate insight be measured?

The research is organized as follows. First, the theory concerning the visualization of data is discussed, including how insight should be gained into the data (such as the attractiveness of an area). Then the related work is discussed. Furthermore, the methods to gain insight into the business climate are described. On the basis of an evaluation with a domain expert, a conclusion is drawn on the research questions. Lastly, a discussion and future work section elaborates on recommendations for future research.

2. THEORY

This section discusses the state of the art of visualization techniques and presents an in-depth study into the definition of the attractiveness of metropolitan areas.

2.1 Business Climate

As this research aims to gain insight into the business climate of cities, the definition of what a business climate actually encompasses is an important aspect. However, a survey of the Internet does not yield any unified results, and it appears that little effort has gone into defining the concept. The concept seems to be used by popular press and media to describe the success of businesses in certain areas, with more or less non-quantifiable indicators. There are some efforts to measure the business climate, such as the German IFO business climate measurement, but this is a survey measuring the perception of 7000 businesses of the business climate in Germany [8], which is merely a subjective view of the business climate. Another interesting indicator of the business climate is that used by the city of Amsterdam. It uses the World Bank’s “ease of doing business” ranking, which measures business start-up, getting credit, getting electricity and trading across borders, as an indicator of the business climate of the city [18]. Plaut and Pluta confirm this observation in their study. They conclude that the business climate is an “all-encompassing” term that covers a wide range of factors, and whose meaning changes with the user who is using the term [28]. Even though the research stems from 1983, it still seems to be contemporary.

Because this research aims to measure the business climate, the ultimate goal is to underpin the factors that measure this undefined concept. Eriksson et al. [14] studied the business and people climate and how it relates to regional performance, and describe two different theories. These two theories build on two already established concepts (or arguments, as they name them), namely the amenity-driven argument and the institutional/evolutionary-driven argument. Before these two arguments are elaborated, some background is necessary. In earlier days, companies were more traditional in the sense that they manufactured physical goods. Two main factors that play a role for these traditional firms are distance to the materials used in production and distance to market. In recent years, however, the knowledge economy has seen explosive growth, diminishing the importance of these traditional factors and increasing the interest in attracting knowledgeable people [14]. The factors that the amenity-driven view and the institutional/evolutionary-driven view entail went along with this shift and are elaborated on below.

Given the shift from an economy focussed on traditional firms producing products (such as the agricultural sector) towards an economy that produces knowledge, the amenity-driven view argues that amenities attract talented and creative people towards a specific region or area. Without going deep into the origins of the theory, which is extensively discussed by Eriksson et al., the amenity-driven view stresses the importance of talent and human capital. This means that the choice of the geographic area where a business wants to settle is driven by the talent already established there. Eriksson et al. furthermore list factors that are deemed to have an appealing effect on talented workers, namely: urban areas, multiculturalism, quasi-anonymity, cafés, theatres, a cosy and authentic urban atmosphere, space, and even a sunny climate [14].

The institutional/evolutionary-driven view mainly focusses on path-dependent and self-reinforcing economic processes [14]. This basically means that businesses already established in a specific area and sector attract more businesses operating in the same sector towards that area. This is further supported by industrial sites, where multiple companies from the same sector settle in the same area and work together.

On the basis of the theory described in this section, the following definition of the business climate is proposed: the business climate “is the attractiveness of an area for businesses, which is driven by those factors that play an important role in the process of doing business and the vitality of their employees”.

2.2 Social Multimedia

Social media is a popular concept; social multimedia is so to a lesser extent. Social multimedia encompasses the “online sources of multimedia content posted in settings that foster significant individual participation and that promote community curation, discussion and re-use of content” [22]. Textual descriptors have been studied extensively, whereas social multimedia (images, audio, video, etc.) have not been taken into account as much. The reason social multimedia are not studied as extensively as textual social media is not that they are overlooked, but that they are harder to process. A technique such as computer vision is necessary to extract meaning from images, and the textual descriptors and metadata are not always reliable. Bulterman [7] stresses the latter point even more.

Figure 1: The visual analytics process, consisting of the parts sources, visualization, hypothesis and insight, including the feedback loop. Taken from Keim et al. [20].

Smeulders et al. highlight in their research the importance of the semantic gap, namely: “The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.” [31]. In many situations it is still a problem that the information at hand does not hold the same information that may be interpreted by a human. This stresses the importance of visualization systems that convey the correct information; in this research, that is the visualization of the business climate in a specific region.

However, even though social media and social multimedia have become more important as sources of information, they are by no means the ideal platform. Naaman describes social media in his study as “by no means a magic pill”. In the same research [22] it becomes clear that social media context and metadata are often noisy, inaccurate, wrong or misleading.

To extend the research on this topic, social multimedia will be used to analyze the contents of social media. This means that in this study the focus lies on utilizing social multimedia, while still including other social media data and open data.

2.3 The Visual Analytics Process

Keim et al. provide in their research the visual analytics process, which is depicted in Figure 1. This research is explained by means of this model, which consists of four different parts, namely:

• Sources: Consist of the datasets that will be used. In this research the two categories of data are open data and social media data. This data includes textual descriptors accompanying social multimedia, as well as metadata and geographic data.
• Visualization: Consists of the visualization of the business climate, based on theories such as providing insight, cognitive models and interaction theories.
• Hypotheses: Consist of the assumptions about the business climate as stated in section 2.1.
• Insight: By means of the datasets, the visualization and the hypotheses, insight (as described in section 2.3) into the business climate will be provided.

The visual analytics process by Keim et al. [20] can be used to create a visualization that provides insight into the business climate of a city or metropolitan area. The visualization is to include the factors as determined in section 2.1, using open data, social media and social multimedia. This means that the data from these datasets will be analyzed and aggregated in a visualization in which a target user is able to input her own preferences, so that insight can be gained.

2.4 Multiple Views in Visualization

Wang Baldonado et al. [34] propose eight guidelines in their study on multi-view systems. Multi-view systems are systems that use two or more distinct views to support the investigation of a single conceptual entity. The design guidelines are based on the notion that multi-view systems are highly challenging to design, because of their need for sophisticated coordination mechanisms and layout. The first four guidelines (diversity, complementarity, parsimony and decomposition) provide the designer with rules for the selection of multiple views. The last four guidelines (space/time resource optimization, self-evidence, consistency and attention management) address presentation and interaction design.

In adherence to this study, the eight guidelines described above are followed in the development of the visualization.

3. RELATED WORK

Data visualization on the basis of open data and social (multi)media is not new; many implementations can be found on the Internet and in scientific research. In this section, an overview of similar applications is given and their differences from this research are elaborated on.

Nakaji and Yanai [23] are among the first to notice the lack of research on utilizing photos that are attached to tweets. They therefore propose a new approach that utilizes these photos in a system that detects events and selects photos that are representative of the detected events in an area. Apart from the photos attached to the tweets, the included geotags are utilized to identify the events. In contrast to this research, which investigates the business climate, the focus of Nakaji and Yanai is on the representative depiction of an event in a specific area.

One effort in which topic modelling was used to visualize data from Twitter was conducted by Ghosh and Guha [16]. In their study, topic modelling, implemented with LDA, was used to extract meaning from tweets. Their main objective was finding relevant public health topics on Twitter; “obesity” was chosen for demonstration purposes. Tweets that originated from the United States of America were harvested from Twitter by means of obesity-related queries. In line with their study, this research uses textual data to investigate the attractiveness of an area within a metropolitan area. However, their study only focusses on the textual part, whereas this research focusses more on the use of social (multi)media.

Xia et al. [36] focussed on the detection of hyper-local events in a city by utilizing social media. The basis for their visualization was a constant social media stream extracting the live flow of images from Twitter, Foursquare and Instagram. One of the most important sources for constructing their hyper-local event detection was the geographical coordinates that were sent along with the tweets. The visualization interface that was created showed the detected hyper-local events and depicted them by means of the images harvested from the different social media providers. This is similar to the approach in this research, where the focus is on local events in, for example, cities rather than on large-scale visualization of tweets. Finally, their primary focus was on informing journalists and giving them insight into the events happening in a city, whereas this research aims to provide a more general view of the city to show the attractiveness of a specific neighbourhood.

Lastly, a technical report by Amsing et al. [1] presents a visualization of the city of Amsterdam. The focus of this report was on the different sources of open data provided by (among others) the city of Amsterdam, and thus on objective open data, its analysis and its transformation into a visualization. The aim was to visualize this data, without the explicit aim of gaining insight into it. As open data was the main (and only) focus of the report, the inclusion of another potential data source such as social (multi)media was not investigated.

4. SYSTEM DESIGN

In this research the focus is on visualizing data from Twitter, with an emphasis on the use of multimedia. Several types of analysis exist to extract meaning from these multimedia-enriched tweets. The main focus lies on the extraction of topics, computer vision techniques to extract concepts, and the sentiment of the tweets by means of text analysis. The methods used to accomplish this, and how these techniques are implemented, are elaborated in this section.

4.1 Data Sources

In this section the different data sources from which data will be used are discussed and motivated, and the methods by which the data were obtained are elaborated. As already mentioned in section 2.1, the business climate is not a uniformly defined term, and its interpretation differs with its use. In this research the model by Eriksson et al. is used to define the business climate, which includes the two main factors that should define it, namely the amenity-driven view and the institutional/evolutionary-driven view. Although multimedia extracted from social media is an important factor within the scope of this research, more objective data should not be overlooked when creating an integrated view of the business climate. One valuable source that will be used in this research is the Amsterdam Open Data initiative, which is a platform on which several sources of open data are collected, including data from the municipality of Amsterdam itself [26]. This collection of datasets includes information released by the city of Amsterdam, but also information released by private corporations, such as information pertaining to the availability of parking spaces throughout the city. On the basis of the concept of the business climate established in section 2.1 and the available sources, the following concepts are considered to be important in this study of the business climate: Recreational Facilities, Amenities, Public Transportation, Safety and Office Space.

It would be naive to assume that people tweet about the business climate explicitly. Therefore, the indicators that relate to the business climate, as described above, are used to analyse the business climate in a specific area. Tweets are extracted from Twitter by means of the Twitter Streaming API. The main reason for doing so is the biased results returned by the Twitter Search API. According to Twitter, the Search API “is focused on relevance and not completeness”, meaning that tweets and users may be missing from the results.⁵

⁵ Twitter Search API, https://dev.twitter.com/rest/public/search, accessed on February 8, 2016

Because the focus is on completeness, it is important to obtain a view of the business climate that is as objective as possible. The tweets that are extracted from Twitter are only filtered on the basis of their location (i.e. was this tweet sent from the Amsterdam area?) and whether they contain an image. No other filters are used, which keeps the view of the business climate of Amsterdam as unbiased as possible. The initial set consisted of 65,436 tweets, which were collected through the Twitter Streaming API over a time span of several weeks. An initial analysis of the tweets’ language showed that the most prevalent languages in the dataset were English and Dutch.

To perform any analysis, the tweets themselves had to be cleaned. This process included the removal of the Twitter-specific “@”s and hashtags (#). Since the “at” (@) sign refers to the name of another user on Twitter and does not add any meaning to the tweet itself, the text following the sign was removed. The hashtag, on the other hand, may include potential information, as it is a means of organizing or sorting tweets; the tag generally refers to an event or a context [9]. The data that is used from a tweet is the tweet itself (the text), its geographic location (the coordinates), and the accompanying piece of multimedia (the picture).
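The filtering and cleaning steps described above could be sketched in Python roughly as follows. This is an illustration only, not the code used in this study: the bounding box, the dictionary keys (`coordinates`, `image_url`) and the choice to drop the “#” sign while keeping the tag word are all assumptions made for the example.

```python
import re

# Hypothetical bounding box around Amsterdam in WGS84 degrees
# (min_lon, min_lat, max_lon, max_lat); the values are an assumption.
AMSTERDAM_BBOX = (4.73, 52.28, 5.07, 52.43)


def in_area(lon, lat, bbox=AMSTERDAM_BBOX):
    """Return True when the coordinate lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def keep_tweet(tweet):
    """Keep only tweets sent from the area that carry an image.

    `tweet` is a hypothetical dict with a (lon, lat) tuple under
    "coordinates" and an image link under "image_url".
    """
    coords = tweet.get("coordinates")
    return coords is not None and in_area(*coords) and bool(tweet.get("image_url"))


def clean_text(text):
    """Remove @-mentions and '#' signs, then normalise whitespace."""
    text = re.sub(r"@\w+", "", text)  # drop "@" and the user name that follows it
    text = text.replace("#", "")      # assumption: keep the tag word, drop the sign
    return re.sub(r"\s+", " ", text).strip()
```

Keeping the tag word reflects the observation that hashtags may carry information about an event or context; a stricter cleaning step could remove the whole tag instead.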

4.2 Data Analysis

As this research is focussed on social multimedia, pictures are the main object of interest in tweets; however, textual clues from tweets play an important role as well. The text includes potential information about topics that may contribute to the definition of the business climate in a specific area. To extract meaning from the text of tweets, topic modelling is applied to the tweets, which can be done with Latent Dirichlet Allocation (LDA) [3]. This method can ultimately be used to identify topics that relate to the business climate indicators mentioned in section 2.1. By identifying the words that are assumed to belong to a specific category, a topic that relates to the business climate (e.g. green areas) can be identified. Identifying the topic of a given tweet makes it possible to infer the content of the tweet and finally assign it to one of the indicators of the business climate.

LDA was presented by Blei et al. [3]. The basic assumption of the algorithm is that a text document is composed of a mixture of n latent topics. The algorithm is trained on all the documents from which the latent topics should be extracted. Given the number of topics n as input, the algorithm tries to find n latent topics in the documents. The latent topics are not named by the LDA algorithm itself, but are represented by a distribution over words, in which each word is given a probability of belonging to the topic. The last step is for the human user to interpret the topics that were generated. One of the challenges of the LDA approach is selecting the number of topics, as this number is not known before extracting topics from a set of documents and there is no single true value for it. Finding a suitable number of topics is therefore based on trial and error. The LDA approach facilitates the categorization of the tweets by the business climate categories, so they can be aggregated and analysed together with the other sources of data. The raw LDA output with the chosen topics can be found in Appendix B.
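To make the idea of latent topics as word distributions concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, using only the Python standard library. It is not the implementation used in this study (which is not specified here); the function name `lda_gibbs`, the hyperparameters `alpha` and `beta` and the toy documents are assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists. Returns (topics, doc_topic), where
    topics[k] maps each vocabulary word to its probability in topic k
    and doc_topic[d][k] counts the tokens of document d assigned to k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)
    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + v * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed word distribution per topic; each sums to 1 over the vocabulary.
    topics = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + v * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return topics, doc_topic
```

Calling `lda_gibbs(docs, n_topics=5)` on tokenized tweets returns one word distribution per topic; sorting each distribution by probability gives the top words that a human then has to interpret and map to a business climate indicator. Trying several values of `n_topics` corresponds to the trial-and-error selection described above.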

The tweet’s meaning alone is not sufficient for determining the business climate of a specific region; it is also important to determine the sentiment of the harvested tweet. For the sentiment analysis the tool Pattern for Python by De Smedt and Daelemans [30] was used. As most harvested tweets are in either English or Dutch, it is necessary to have a tool that is capable of processing the sentiment of both languages. Pattern supports sentiment analysis for both English and Dutch. For Dutch, the Pattern library was trained on Dutch book reports, ultimately resulting in a lexicon of 5,500 words that were given a score on the basis of polarity (i.e. positive or negative), subjectivity (i.e. objective or subjective) and intensity (which is necessary due to the use of adverbs in the Dutch language) [10].
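The role of polarity, subjectivity and intensity in a lexicon-based approach can be illustrated with a toy example. The sketch below is not Pattern’s implementation and does not use its lexicon: the five entries, their scores and the rule that an intensifier scales the next scored word are assumptions made purely for illustration.

```python
# Hypothetical toy lexicon: word -> (polarity, subjectivity, intensity).
# Scores are invented for the example and are not taken from Pattern.
LEXICON = {
    "good": (0.7, 0.6, 1.0),
    "bad": (-0.7, 0.7, 1.0),
    "mooi": (0.8, 1.0, 1.0),  # Dutch: "beautiful"
    "very": (0.0, 0.0, 1.3),  # intensifier
    "erg": (0.0, 0.0, 1.3),   # Dutch intensifier: "very"
}


def sentiment(text):
    """Return (polarity, subjectivity) averaged over the scored words."""
    scored = []
    boost = 1.0
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word not in LEXICON:
            boost = 1.0  # an unknown word breaks the intensifier chain
            continue
        polarity, subjectivity, intensity = LEXICON[word]
        if polarity == 0.0 and subjectivity == 0.0:
            boost = intensity  # intensifier: applies to the next scored word
            continue
        scored.append(
            (max(-1.0, min(1.0, polarity * boost)), min(1.0, subjectivity * boost))
        )
        boost = 1.0
    if not scored:
        return 0.0, 0.0
    n = len(scored)
    return sum(p for p, _ in scored) / n, sum(s for _, s in scored) / n
```

In this sketch a tweet without any lexicon word is scored as neutral, and an adverb such as “erg” only changes the score through the word that follows it, which is the reason the intensity value is stored alongside polarity and subjectivity.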

Using textual clues in tweets will not inherently identify the visual concepts that are present in the images that accompany the tweets. To identify the different visual concepts, it is necessary to utilize some form of concept detection in images. In this project the GoogLeNet algorithm was used, which makes use of a deep convolutional neural network architecture [32]. The objective of the algorithm is to classify and detect the different concepts that are present in an image. However, the presence of a concept in an image is not a matter of true or false: per concept, the probability that this concept is present in the image is returned. The GoogLeNet algorithm used in this project is trained on the ImageNet dataset, which is described by Deng et al. [12].

On the basis of the requirements for business climate measurement stated in section 2.1 and the ImageNet results, the ImageNet categories that closely match these criteria are selected. This means that ImageNet categories that represent the business climate requirements (e.g. a police car may indicate safety) are picked. These image concepts ultimately play a role in the definition of the business climate in a specific area. The concepts that were used to identify the business climate are listed in Appendix C.
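The selection step can be pictured as a lookup from detected concepts to business climate indicators. The following Python sketch assumes that the classifier output is available as a dictionary of concept probabilities; the four mapped concepts, the threshold of 0.5 and the function name are hypothetical and do not reproduce the mapping of Appendix C.

```python
# Hypothetical mapping from image concept labels to business climate
# indicators; the real selection is the one listed in Appendix C.
CONCEPT_TO_INDICATOR = {
    "police van": "Safety",
    "streetcar": "Public Transportation",
    "restaurant": "Amenities",
    "park bench": "Recreational Facilities",
}


def indicators_for_image(concept_probs, threshold=0.5):
    """Map classifier output to indicators.

    concept_probs: dict of concept label -> probability that the concept
    is present. Returns, per indicator, the highest probability among its
    mapped concepts that reach the (assumed) threshold.
    """
    scores = {}
    for concept, prob in concept_probs.items():
        indicator = CONCEPT_TO_INDICATOR.get(concept)
        if indicator is not None and prob >= threshold:
            scores[indicator] = max(scores.get(indicator, 0.0), prob)
    return scores
```

Because the classifier returns probabilities rather than yes/no answers, some cut-off or weighting is needed before an image counts towards an indicator; a fixed threshold is only one possible choice.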

4.3 Data Visualization

The last step is the visualization process, in which the collected data is visualized. This means that the relevant positive indicators that were extracted are visualized, thereby creating insight into the current situation of the business climate of a city. The most promising tool that enables such a visualization is D3.js. D3 is a tool that enables users to create visualizations by means of DOM (document object model) manipulation in a browser, using JavaScript [5]. Moreover, with D3 it is possible to “selectively bind input data to arbitrary document elements, applying dynamic transformation to both generate and modify content” [5]. It is also possible to easily add animation to the elements without using, for example, Flash, which is not supported by mobile devices.

The visualization, in the form of a prototype, is depicted in Figure 3. It is built on the basis of different techniques that are elaborated further in this section.

One of the ways to gain insight in the business climate is to see where tweets containing multimedia are being sent from. Since the quantity of tweets being sent can be quite large, visualizing the concentration of such tweets may give insight in the areas from which is tweeted the most. One way to do this is by means of hexagons. In a report by Nicholas Lewin-Koh [24], the workings and ideas behind hexagonal binning are explained.

The practice of hexagonal binning is that of creating an evenly distributed grid. The method for creating a hexagon distribution is elaborated in the report of Nicholas Lewin-Koh, namely: (1) first, an x-y plane is tessellated with hexagons; (2) second, the number of points falling in each hexagon is counted and stored for that specific hexagon; (3) lastly, the hexagons that have one or more counts are plotted using a color ramp.

Figure 2: Visualization of the intensity of tweets with multimedia in Amsterdam by means of a hexagonal chart
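The three binning steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of Lewin-Koh or of Bostock's D3 plugin: it assumes pointy-top hexagons addressed by axial coordinates, and the names `hexbin` and `_hex_round` are hypothetical.

```python
import math
from collections import Counter


def _hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so that x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hexbin(points, size):
    """Count points per hexagon; `size` is the centre-to-corner distance."""
    counts = Counter()
    for x, y in points:
        # Step 1: locate the point in a plane tessellated with pointy-top hexagons.
        q = (math.sqrt(3) / 3 * x - y / 3) / size
        r = (2 / 3 * y) / size
        # Step 2: count the point in the hexagon it falls into.
        counts[_hex_round(q, r)] += 1
    return counts


bins = hexbin([(0, 0), (0.1, 0.1), (10, 0)], size=1)
# Step 3: only hexagons with one or more points exist in `bins`; a color ramp
# can be driven by the relative intensity of each hexagon.
intensity = {cell: n / max(bins.values()) for cell, n in bins.items()}
print(bins, intensity)
```

Because empty hexagons never enter the counter, only hexagons with at least one point remain to be drawn.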

An advantage of using hexagons over other shapes (well-known examples are squares and circles) is that hexagons have symmetry with their neighbours. According to the study of Nicholas Lewin-Koh [24], the packing of hexagons is 13% more efficient than that of squares. Another benefit of using hexagons is that they are less biased for displaying densities than other regular tessellations (such as squares and circles).

This visualization is adapted from and based on the visualization by Bostock [4]. The major change made to the hexagon visualization is the addition of a geographical implementation of the coordinates: tweets are located in WGS84 format (the World Geodetic System 1984, which defines a standard coordinate system for the Earth) [11], whereas the standard visualization expects simple x and y coordinates. As can be seen in the end result of the visualization in Figure 2, the hexagons are precisely located on the map of Amsterdam and show the intensity of the number of tweets containing multimedia.
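As an illustration of the conversion from WGS84 coordinates to planar x and y coordinates, the sketch below uses the spherical (Web) Mercator formulas. The choice of this projection is an assumption, since the text does not state which projection the prototype uses; the function name `wgs84_to_web_mercator` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis in metres


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project a WGS84 longitude/latitude pair (degrees) to planar x/y in metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Approximate coordinates of the centre of Amsterdam (illustrative values).
x, y = wgs84_to_web_mercator(4.9, 52.37)
print(round(x), round(y))
```

The resulting x and y values can then be scaled to screen coordinates and passed to a binning routine that expects a plain x-y plane.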

The choropleth is a depiction of the suitability of the business climate in cities or (metropolitan) areas (in this case the city of Amsterdam). Choropleth charts show the intensity (in this case, the suitability) of an area by color variation. The chart should convey the important areas of interest with ease, by giving the darker colors the meaning of better suitability. The choropleth is based on a quantitative scale6, which generates colors by means of a color ramp. This color ramp was chosen to be purple to avoid confusion with the selection colors of the areas. This is done in adherence to the advice of Heer et al. [17]. They conclude that the choropleth chart, even though widely used, needs some attention to avoid that the perception of a shaded value in the map is affected by the underlying area of the geographic region. Also, the map is fed with normalized data rather than with the raw data, again in accordance with Heer et al.’s advice. The working of the choropleth is quite simple. It serves as a means of input by which the user can select the area of choice, and can select up to five (5) different areas to compare. The choropleth uses the standard 10 colors of Bostock’s D3. The choropleth is depicted on the right side in Figure 3.

6 https://github.com/mbostock/d3/wiki/Quantitative-Scales
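The combination of normalized input data and a color ramp can be sketched as follows. This is an illustration only: min-max normalization is an assumption about how the data was normalized, and the two purple endpoint colors, the area names and the raw values are made up.

```python
LIGHT = (242, 240, 247)  # hypothetical light end of a purple ramp (RGB)
DARK = (84, 39, 143)     # hypothetical dark end of a purple ramp (RGB)


def normalize(values):
    """Min-max normalize raw values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all areas are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def color_ramp(t):
    """Linearly interpolate between LIGHT (t = 0) and DARK (t = 1)."""
    return tuple(round(a + (b - a) * t) for a, b in zip(LIGHT, DARK))


# Made-up raw suitability values per area.
raw = {"Area A": 10, "Area B": 20, "Area C": 30}
shades = {
    area: color_ramp(t) for area, t in zip(raw, normalize(list(raw.values())))
}
print(shades)
```

Darker shades correspond to higher normalized values, which matches the convention that darker colors mean better suitability.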

Not directly integrated in the choropleth, but interconnected with it, is the picker, which shows the selected areas. The picker uses the same colors as the areas that are selected in the choropleth, which makes it easy to keep an overview of the selected areas. The picker is depicted at the bottom of Figure 3.

A radar chart has been chosen in the visualization prototype because of the ease with which it represents multivariate data. According to Wilkinson [35]: “A radar chart is a line graphic in polar parallel coordinates.” Draper et al. [13] elaborate on the radar chart further: radar charts are closely related to the so-called “star plots” or “Kiviat plots”, although those plots are not filled with a color. These plots are used for viewing multivariate data in a compact form. By mapping each variable on one of the axes, it is possible to draw a line between all the different values. This ultimately creates an area which can be colored. By making the areas partly transparent, it is possible to compare the different areas by means of this chart. The radar chart is shown in Figure 3. On the different axes the business climate factors are mapped, as described in section 2.1.
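The construction of the radar chart area can be sketched as follows: each variable is mapped onto its own axis in polar coordinates, and the resulting points form the polygon that is filled. The function name `radar_polygon` and the example scores are hypothetical; the five factor names are those listed in Appendix B.

```python
import math


def radar_polygon(values, radius=1.0):
    """Return the (x, y) vertices of a radar chart polygon.

    `values` are normalized scores in [0, 1], one per axis; the axes are
    spread evenly over the full circle.
    """
    n = len(values)
    points = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n  # direction of the i-th axis
        points.append((radius * v * math.cos(angle), radius * v * math.sin(angle)))
    return points


factors = ["Recreational Facilities", "Amenities", "Public Transport",
           "Office Space", "Safety"]
scores = [0.8, 0.6, 0.9, 0.4, 0.7]  # made-up scores for one area
for factor, (x, y) in zip(factors, radar_polygon(scores)):
    print(f"{factor}: ({x:.2f}, {y:.2f})")
```

Drawing one such polygon per selected area, each with a partly transparent fill, gives the overlay that allows areas to be compared.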

Keim et al. [20] proposed the visual analytics mantra, of which “details on demand” is one of the elements. In this visualization the details on demand are presented by the details of the tweet data, namely the score per business climate factor.

One of the most important input methods in the visualization is the set of sliders by which the user can select the importance of the business climate factors, including the influence of social media in the visualization and the influence the pictures have on the total score. This means that the user is also able to see the appropriateness of an area without social media data, relying only on open data. The sliders are depicted in the screenshot of the visualization in Figure 3.
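How slider values could be combined into a single score per area is sketched below. The formula is an assumption made for illustration and is not taken from the prototype: each factor score is a blend of an open data score and a social media score, and the factor sliders act as weights. All names and numbers are hypothetical.

```python
def area_score(open_scores, social_scores, weights, social_influence):
    """Weighted suitability score of one area in [0, 1].

    open_scores / social_scores: normalized score per factor.
    weights: slider value per factor (importance).
    social_influence: slider in [0, 1]; 0 means open data only.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:  # all sliders on zero: nothing to show
        return 0.0
    total = 0.0
    for factor, weight in weights.items():
        blended = ((1 - social_influence) * open_scores.get(factor, 0.0)
                   + social_influence * social_scores.get(factor, 0.0))
        total += weight * blended
    return total / total_weight


open_scores = {"Safety": 0.8, "Amenities": 0.4}
social_scores = {"Safety": 0.2, "Amenities": 0.6}
weights = {"Safety": 1, "Amenities": 1}
print(area_score(open_scores, social_scores, weights, social_influence=0.0))
print(area_score(open_scores, social_scores, weights, social_influence=0.5))
```

Setting `social_influence` to zero reproduces the case in which the user relies on open data only.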

5.

EVALUATION

Giving insight into the business climate is not an easy task; it depends on whether the intended user finds the tool helpful in giving insight. Because of this, an evaluation was conducted in which the developed tool was tested. Section 1 mentioned the analysis of insight by North [25]. In his paper about measuring insight in visualizations, he proposes three key points that an evaluation of insight should encompass, while leaving the original benchmark tasks out of the protocol entirely. According to him: “The key advantage of eliminating benchmark tasks is that it reveals what insights visualization users gained.” The three key points North proposes are:

An open-ended protocol, which emphasizes the importance of letting users explore the visualization freely; the users themselves indicate when they think they have learned everything from the visualization.

A qualitative insight analysis, by means of a think-aloud protocol that allows users to verbalize their thoughts. This enables the evaluators to capture the user’s insights.

An emphasis on domain relevance, where evaluators from the target domain are important to review the visualization. North concludes that “Experimenters should pay special attention to cases where the user goes beyond dry data analysis, and makes domain-specific inferences and hypotheses.” [25]

This evaluation into the business climate has been performed as a first evaluation. This means that the evaluation was performed by a domain expert, who was the first one to see the prototype made in this research. The aim of the evaluation is to find out whether the visualization provides insight into the business climate of a city, and which factors should be emphasized more or could be improved.

5.1

The Protocol

As North advises, the think-aloud method was used to conduct the evaluation. Although the think-aloud method seems to be a well-accepted method to test user interfaces, there seems to be, until this day, no unified consensus on how it should actually be conducted. The first mentions of the think-aloud protocol stem from the 1980's, and it is still a popular and widely accepted method in usability testing. The method itself is quite simple. The participant is asked to perform a task on the prototype; while conducting this task, he/she is asked to “think aloud”, i.e. to vocalize his/her thoughts in the process of performing the task.

Afterwards, the System Usability Scale (SUS) is used to elicit the opinion of the participant, who is asked ten (10) questions about the usability of the system. The System Usability Scale was devised by Brooke in the 1980's, who described it as: “a reliable, low-cost usability scale that can be used for global assessments of systems usability” [6].
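For reference, the standard SUS scoring procedure can be expressed in a few lines: odd-numbered items contribute their response minus one, even-numbered items contribute five minus their response, and the sum is multiplied by 2.5 to obtain a score between 0 and 100. The function name `sus_score` and the example responses are hypothetical and are not the responses given in this evaluation.

```python
def sus_score(responses):
    """Compute the SUS score from ten responses on a 1-5 scale (item 1 first)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


print(sus_score([3] * 10))     # a neutral answer on every item
print(sus_score([5, 1] * 5))   # the most favourable answer on every item
```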

5.2

Evaluation Execution

For the evaluation of the prototype in the project, a domain expert from the municipality was asked to evaluate the visualization of the business climate in Amsterdam. This is because a domain expert from the city of Amsterdam is able to provide an unbiased view on the visualization. Moreover, since the expert has knowledge about the city of Amsterdam and about the provided data, potential errors may also be found.

5.3

Evaluation Result

The total length of the open-ended think aloud session was approximately 45 minutes. During the first evaluation round of the prototype developed in this research, the conclusions in the following list can be drawn.

Bias in Data The domain expert concluded that data around the city center of Amsterdam is biased by default; i.e. when metrics are used that calculate the number of facilities in a specific area, the city center will always appear as an area densely populated with facilities, e.g. the city center counts a vast amount of amenities. When this is not taken into account, the visualization always shows better scores around the city center. The domain expert therefore advised to take the more densely populated areas into account, so that they do not bias the other areas. The domain expert did, however, see some interesting patterns he did not initially expect, such as a substantial number of amenities in a specific area.

Figure 3: The final visualization of the business climate

Implicit Information The city of Amsterdam has implicit information about settling areas for businesses, so it can advise businesses on the basis of this knowledge. This implicit information is not represented in the visualization yet, nor in the data that is fed into the visualization. One of the recommendations by the domain expert was to have a dialogue with the target audience for whom the visualization is made. The expert had the feeling that the visualization was driven by data, with a “dashboard” as the end result. He said that, to make an impact for a city such as Amsterdam and to present insight into the business climate of a city, a more user-centered approach would be needed.

Business Climate Representation During the evaluation session of the visualization, some of the tweets that were shown as belonging to a specific business climate factor did not initially create an association with the business climate in the city of Amsterdam. The domain expert said that he expected the photos to give a more concrete depiction of the selected area, i.e. the pictures should show an impression of the selected areas, whereas at the evaluation session the photos included “people travelling to Texel” and “A cat”; according to the domain expert these photos did not contribute to the business climate in Amsterdam. This problem has been solved by incorporating “establishment” photos from the Google Places API7. These photos are added to the visualization, creating a view of the atmosphere.

7 Google Places API Web Service

Input Methods The domain expert indicated that the input controls (i.e. the sliders and the hexagons) were in need of explanation. The domain expert said that a user in the current start position of the visualization, with all the sliders on zero (0), the choropleth white, and the radar chart empty, would not initially understand the workings of the visualization. Also, he indicated that the controls were not explained clearly. This issue was resolved by incorporating information at the sliders, which indicates what they measure. Also, the sliders were set at the center to avoid showing a white choropleth.

Insight in Raw Data One of the drawbacks of the visualization, according to the domain expert, was the lack of insight into the data itself. This means that the normalized values (in percentages) do not tell the complete story, or do not tell enough to understand the data. The domain expert elaborated by saying that having insight in, for example, the number of amenities would add value to the visualization, as it would give him more insight in the data. The remarks of the domain expert can easily be addressed by adding the real numbers to the visualization, as these numbers feed the normalization process of the visualization.

SUS Scoring At the end of the evaluation the domain expert filled in a SUS score form. The total score from this usability questionnaire is 47.5. This means that the visualization needs to be improved, as the current version does not meet the user’s usability expectations. With the above-mentioned improvements, it is expected that the SUS score would improve. The SUS score form can be found in Appendix A.

https://developers.google.com/places/web-service/photos accessed on May 2, 2016

5.4

Evaluation Conclusion

From the first evaluation round for the prototype in this research, it can be seen that some critical remarks have been made about the visualization. One of the most important conclusions that can be drawn from the evaluation session is that visualizing the business climate of a city, in this case Amsterdam, is not an easy task and that the visualization needs improvement. It should, however, be mentioned that the domain expert thought the initial prototype was a nice start for gaining insight in the business climate of a city such as Amsterdam. The most important improvement requirement that can be concluded is the incorporation of implicit data from domain experts and from businesses that want to settle in a specific area. By incorporating this information, the visualization should be able to give better insight in the business climate of a (metropolitan) area. Also, some interesting patterns were found by the domain expert in the visualization, indicating its potential. This means that, even taking into account the domain expert’s criticism, improving the business climate visualization along the lines of the remarks and improvement guidelines should be feasible in future work.

6.

DISCUSSION

On the basis of the LDA technique, tweets were categorized according to the business climate factors that have been established in section 2.1. In theory, the technique should perform the categorization on cleaned tweets; however, it is doubtful whether the current implementation suits the diversity of language that is found on Twitter. This point is underpinned by Schreck and Keim [29], who described the dynamic nature of social media and its creation of new definitions and jargon. As described in section 4.2, the GoogleNet algorithm was used to classify images on the basis of pre-trained concepts from the ImageNet dataset. Even though the performance of the algorithm is very good [32], there are some downsides to the ImageNet set on which the algorithm was trained. A brief overview of the chosen categories in the set revealed that the diversity of origin is limited, as many of the pictures appear to have North American characteristics. Also, the textual, visual and sentiment results do not always contribute equally to the total result, and their calculation should therefore be tuned more finely to show a more representative result.

During the analysis of the tweets, it became apparent that not all tweets contain an exact location. In those cases the tweet was sent from a specific area (e.g. the Amsterdam area), rather than from a specific point. Because the area was collected as an array of points, the center was taken from that set of points. This showed that the center of Amsterdam contains much more activity than other areas. Even though this may be true, tweets sent from other areas were sometimes classified as being in the center as well. During the evaluation the domain expert also noted that a bias for most of the business climate factors in this research exists around the city center. Due to this limitation, together with Twitter data that also concentrates around the city center, a bias is present around the city center, which influences the correctness of the visualization in terms of the suitability of the business climate in an area.
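The way an area is reduced to a single point can be sketched as follows. The sketch assumes that the area is given as a list of (longitude, latitude) corner points and that the center is the arithmetic mean of those points; both the function name `center_of_points` and the example bounding box are hypothetical.

```python
def center_of_points(points):
    """Return the mean (longitude, latitude) of a list of corner points."""
    if not points:
        raise ValueError("at least one point is required")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return sum(lons) / len(lons), sum(lats) / len(lats)


# Made-up bounding box roughly covering a city-sized area.
bounding_box = [(4.7, 52.3), (5.1, 52.3), (5.1, 52.4), (4.7, 52.4)]
print(center_of_points(bounding_box))
```

Every tweet that carries only such an area ends up on the same center point, which illustrates how area-level tweets can inflate the activity measured in the city center.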

It should be said that the hexbin implementation has only incorporated the density of all tweets that were collected for this research. No functionality was built in to change the density on the basis of user-defined inputs, although this functionality was requested during the evaluation session.

The sentiment of the tweets has been calculated with the Pattern Python library. As already outlined in section 4.2, sentiment analysis for English has been widely available, whereas sentiment analysis for Dutch is less common. The Pattern library does include the Dutch language for sentiment analysis. However, the Pattern library was trained on Dutch book reports; as already mentioned, this may pose a problem, since tweets contain a lot of words that do not occur in book reports.

7.

CONCLUSION

In this research, an investigation was done into the possibilities of gaining insight in the business climate of cities. The main objective of the research was to include social multimedia in the analysis, where the focus in this research was on pictures. During the investigation into the business climate, it was found that not much research has gone into the definition of a business climate. Methods such as LDA and GoogleNet are utilized to extract concepts and meaning from open and social media data. LDA is used to elicit business climate indicator topics from the textual data from Twitter, whereas GoogleNet is used to extract business climate indicators from the images posted to Twitter. In total, 65,436 tweets were collected from Twitter, which included the location of the tweet, the tweet text and the accompanying image. All the elicited topics and business climate data were aggregated in a visualization, which was developed in the form of a prototype.

From the first evaluation round, it can be concluded that the visualization shows potential to give insight in the business climate of a city. However, since only one evaluation round was conducted, some improvements should still be implemented to gain an even better insight in the business climate of Amsterdam. Another conclusion that can be drawn is that current technologies may limit the full potential of the gathered data to give insight; this applies to both open data and social (multi-)media. Since the latter is very unstructured, it is hard with current technologies to obtain the full potential from these unstructured forms of data.

8.

FUTURE WORK

In the evaluation in section 5.3 it was concluded that the domain expert missed implicit data. Because no target users were involved in the initial design of the visualization, this implicit information is not included. The other factor that contributes to this limitation is the lack of implicit information in the open or social (multi-)media data, so no implicit data can be extracted from it either.

In future work, therefore, the focus should be on the improvement of the visualization and analysis on the basis of implicit data. Because of the limitation in this research of performing only one (primary) evaluation round, the focus should be on more interaction and evaluation with the target users to elicit their (implicit) preferences. This should be done by improving the collaboration with the target users, and by eliciting important implicit information on their motives to settle in a specific area.

9.

ACKNOWLEDGMENTS

First of all, I would like to thank my supervisor Marcel Worring for his excellent guidance in this thesis project. Also, I would like to thank the city of Amsterdam, and especially Jasper Soetendaal, for taking the time to evaluate the prototype in this research. Lastly, I would like to thank everybody else who has guided me mentally through the months I worked on this project.

References

[1] Amsing, T., van Alphen, E., Haitsma Mullier, L., Huizing, M., and Cuijpers, K. (2015). Visualization of the Business Climate in Amsterdam.

[2] ANP, N. (2016). Recordaantal buitenlandse bedrijven naar Amsterdam.

[3] Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022.

[4] Bostock, M. (2012). Hexagonal Binning - bl.ocks.org.

[5] Bostock, M., Ogievetsky, V., and Heer, J. (2011). D3 data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309.

[6] Brooke, J. (1996). SUS - A quick and dirty usability scale. Usability evaluation in industry, 189(194):4–7.

[7] Bulterman, D. C. A. (2004). Is it time for a moratorium on metadata? IEEE Multimedia, 11(4):10–17.

[8] CES (2015). CESifo Group Munich - Renewed Increase in the Ifo Business Climate Index.

[9] Chang, H.-C. (2010). A new perspective on Twitter hashtag use: Diffusion of innovation theory. Proceedings of the American Society for Information Science and Technology, 47(1):1–4.

[10] De Smedt, T. and Daelemans, W. (2012). “Vreselijk mooi!” (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 3568–3572.

[11] Decker, B. L. (1986). World Geodetic System 1984.

[12] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 2–9.

[13] Draper, G. M., Livnat, Y., and Riesenfeld, R. F. (2009). A survey of radial methods for information visualization. In IEEE Transactions on Visualization and Computer Graphics, volume 15, pages 759–776.

[14] Eriksson, R. H., Hansen, H. K., and Lindgren, U. (2014). The Importance of Business Climate and People Climate on Regional Performance. Regional Studies.

[15] Gao, Q., Abel, F., Houben, G. J., and Yu, Y. (2012). A comparative study of users’ microblogging behavior on Sina Weibo and Twitter. In Lecture Notes in Computer Science, volume 7379 LNCS, pages 88–101.

[16] Ghosh, D. D. and Guha, R. (2013). What are we ’tweeting’ about obesity? Mapping tweets with Topic Modeling and Geographic Information System. Cartography and geographic information science, 40(2):90–102.

[17] Heer, J., Bostock, M., and Ogievetsky, V. (2010). A Tour through the Visualization Zoo. Communications of the ACM, 53(6):59–67.

[18] Iamsterdam.com (2016). The Netherlands attractive business climate | I amsterdam.

[19] Keim, D., Andrienko, G., Fekete, J. D., Görg, C., Kohlhammer, J., and Melançon, G. (2008a). Visual analytics: Definition, process, and challenges. In Lecture Notes in Computer Science, volume 4950 LNCS, pages 154–175.

[20] Keim, D. A., Mansmann, F., Schneidewind, J., Thomas, J., and Ziegler, H. (2008b). Visual analytics: Scope and challenges. In Lecture Notes in Computer Science, volume 4404 LNCS, pages 76–90.

[21] Kitchin, R. (2014). The real-time city? Big data and smart urbanism. GeoJournal, 79(1):1–14.

[22] Naaman, M. (2012). Social multimedia: Highlighting opportunities for search and mining of multimedia data in social media applications. Multimedia Tools and Applications, 56(1):9–34.

[23] Nakaji, Y. and Yanai, K. (2012). Visualization of real-world events with geotagged tweet photos. In Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, pages 272–277.

[24] Nicholas Lewin-Koh (2011). Hexagon Binning: an Overview. Technical report.

[25] North, C. (2006). Toward measuring visualization insight. IEEE Computer Graphics and Applications, 26(3):6–9.

[26] Municipality of Amsterdam (2015). Amsterdam Open Data - Amsterdam Open Data.

[27] Municipality of Amsterdam (2016). Wederom record nieuwe buitenlandse bedrijven in regio Amsterdam - Gemeente Amsterdam.

[28] Plaut, T. R. and Pluta, J. E. (1983). Business Climate, Taxes and Expenditures, and State Industrial Growth in the United States. Southern Economic Journal, 50(1):99–119.

[29] Schreck, T. and Keim, D. (2013). Visual analysis of social media data. Computer, 46(5):68–75.

[30] Smedt, T. D. and Daelemans, W. (2012). Pattern for Python. Journal of Machine Learning Research, 13:2063–2067.

[31] Smeulders, A., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12).

[32] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going Deeper with Convolutions. arXiv preprint arXiv:1409.4842, pages 1–12.

[33] Toubia, O. and Stephen, A. T. (2013). Intrinsic vs. Image-Related Utility in Social Media: Why Do People Contribute Content to Twitter? Marketing Science, 32(3):368–392.

[34] Wang Baldonado, M. Q., Woodruff, A., and Kuchinsky, A. (2000). Guidelines for using multiple views in information visualization. Proceedings of the working conference on Advanced visual interfaces (AVI), (MAY):110–119.

[35] Wilkinson, L. (2005). The Grammar of Graphics. Statistics and Computing. Springer-Verlag, New York.

[36] Xia, C., Schwartz, R., Xie, K., and Krebs, A. (2014). CityBeat: real-time social media visualization of hyper-local city data. Proceedings of the International World Wide Web Conference Committee, pages 167–170.

APPENDIX

A.

SUS SCORE

Figure 4 depicts the SUS score form, which includes the scores given by the domain expert of the city of Amsterdam.

B.

LDA CATEGORIES AS DETERMINED BY MODEL

In Figure 5 the raw output of the LDA analysis is shown. The business climate factors were chosen from these topics:

Recreational Facilities Topics: 15, 17, 23, 25, 27

Amenities Topics: 1, 3, 4, 8, 22, 28, 29, 38

Public Transport Topics: 12, 20, 30

Office Space Topics: 2, 6

Safety Topics: 7, 14, 31, 39
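The mapping above can be written down as a lookup table, as in the sketch below. Assigning a tweet to the factor of its most probable topic is an assumption made for illustration; the text does not state how a tweet's topic distribution was turned into a factor. The function name `factor_for_tweet` and the example distributions are hypothetical.

```python
# Topic numbers per business climate factor, as listed in this appendix.
FACTOR_TOPICS = {
    "Recreational Facilities": [15, 17, 23, 25, 27],
    "Amenities": [1, 3, 4, 8, 22, 28, 29, 38],
    "Public Transport": [12, 20, 30],
    "Office Space": [2, 6],
    "Safety": [7, 14, 31, 39],
}

# Inverted lookup: topic number -> factor.
TOPIC_TO_FACTOR = {
    topic: factor for factor, topics in FACTOR_TOPICS.items() for topic in topics
}


def factor_for_tweet(topic_distribution):
    """Return the factor of the most probable topic, or None if it is unmapped.

    `topic_distribution` is a list of (topic_id, probability) pairs.
    """
    topic_id, _ = max(topic_distribution, key=lambda pair: pair[1])
    return TOPIC_TO_FACTOR.get(topic_id)


print(factor_for_tweet([(7, 0.6), (10, 0.3)]))  # dominant topic 7
print(factor_for_tweet([(10, 0.9), (7, 0.1)]))  # dominant topic 10 is unmapped
```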

C.

IMAGENET VISUAL CONCEPTS

Table 6 lists the ImageNet concepts that are used to define the business climate.

2016-04-05 11:22:18,001 : INFO : topic #0 (0.025): 0.268*trndnl + 0.115*trends + 0.072*trend + 0.060*alert + 0.012*verified + 0.012*noordzee + 0.012*rts + 0.012*omg + 0.010*zijlstra + 0.009*elke + 0.009*cuba + 0.009*abnamro + 0.009*top20 + 0.008*referendum + 0.008*oekrane + 0.008*more + 0.007*vrouwen + 0.007*at + 0.007*kill + 0.006*studiovoetbal

2016-04-05 11:22:18,003 : INFO : topic #1 (0.025): 0.111*Twitter + 0.032*best + 0.032*dag + 0.029*vandaag + 0.027*geld + 0.020*steeds + 0.020*make + 0.020*top3apps + 0.018*april + 0.017*accounts + 0.016*we + 0.015*gefeliciteerd + 0.015*a4 + 0.015*leven + 0.014*beter + 0.013*doet + 0.012*wachten + 0.012*hoop + 0.012*werk + 0.011*sales

2016-04-05 11:22:18,008 : INFO : topic #2 (0.025): 0.042*ga + 0.032*say + 0.030*nooit + 0.025*x + 0.025*client + 0.025*40 + 0.020*beetje + 0.016*hahaha + 0.014*haarlemmermeer + 0.014*geweldig + 0.013*32 + 0.012*went + 0.012*hiring + 0.011*lovely + 0.011*samenleving + 0.010*worst + 0.010*tutorial + 0.009*to + 0.009*proberen + 0.009*careerarc

2016-04-05 11:22:18,011 : INFO : topic #3 (0.025): 0.194*netherlands + 0.071*7 + 0.060*v + 0.050*10 + 0.036*dank + 0.036*21 + 0.035*amsterdam + 0.032*really + 0.028*hours + 0.021*h + 0.012*app + 0.012*2016 + 0.011*watching + 0.010*game + 0.009*22 + 0.008*gedaan + 0.008*rijksmuseum + 0.007*rest + 0.007*welkom + 0.006*number

2016-04-05 11:22:18,015 : INFO : topic #4 (0.025): 0.058*day + 0.052*happy + 0.049*wilders + 0.035*the + 0.028*gt + 0.024*red + 0.023*eigen + 0.022*light + 0.022*birthday + 0.021*food + 0.019*link + 0.019*full + 0.018*spring + 0.016*girl + 0.015*loveTwitter + 0.013*knoops + 0.011*ones + 0.011*district + 0.010*heerlijk + 0.010*lead

2016-04-05 11:22:18,018 : INFO : topic #5 (0.025): 0.056*great + 0.039*nederland + 0.027*world + 0.027*kijken + 0.020*af + 0.018*friend + 0.017*inc + 0.016*news + 0.016*cc + 0.016*found + 0.015*tonight + 0.014*bijna + 0.013*house + 0.013*gelukkig + 0.012*vaak + 0.012*zaanse + 0.011*schans + 0.011*deuren + 0.011*zaak + 0.010*point

2016-04-05 11:22:18,021 : INFO : topic #6 (0.025): 0.050*time + 0.047*photo + 0.033*going + 0.029*thank + 0.026*always + 0.026*13 + 0.024*alarm + 0.019*half + 0.019*twee + 0.019*wanna + 0.018*ll + 0.017*keep + 0.016*must + 0.016*enjoy + 0.014*business + 0.013*guys + 0.012*krijgt + 0.011*45 + 0.009*kept + 0.009*glad

2016-04-05 11:22:18,023 : INFO : topic #7 (0.025): 0.074*a1 + 0.050*p2000 + 0.026*come + 0.023*morgen + 0.022*mka + 0.020*dam + 0.018*elandsgracht + 0.018*117 + 0.017*1016tt + 0.017*b2 + 0.017*ambulancedienst + 0.017*meldkamer + 0.017*zien + 0.015*willen + 0.012*word + 0.011*join + 0.010*ended + 0.009*mannen + 0.009*leren + 0.008*50

2016-04-05 11:22:18,026 : INFO : topic #8 (0.025): 0.047*made + 0.037*komen + 0.031*nou + 0.029*ongeval + 0.025*zit + 0.023*moeten + 0.018*misschien + 0.016*ff + 0.016*makes + 0.016*staan + 0.015*drinking + 0.012*jaren + 0.012*awesome + 0.012*yesterday + 0.011*thats + 0.011*hold + 0.010*bam + 0.010*aangehouden + 0.010*room + 0.009*kleine

2016-04-05 11:22:18,029 : INFO : topic #9 (0.025): 0.065*nl + 0.035*next + 0.029*25 + 0.027*mag + 0.027*iphone + 0.019*things + 0.018*vindt + 0.017*set + 0.017*maakt + 0.016*blijven + 0.016*vooral + 0.015*year + 0.015*anne + 0.014*gratis + 0.014*amp + 0.014*meteen + 0.013*wakker + 0.011*frank + 0.010*bestaat + 0.009*abdeslam

2016-04-05 11:22:18,032 : INFO : topic #10 (0.025): 0.360*amsterdam + 0.103*m + 0.069*holland + 0.051*noord + 0.041*airport + 0.027*w + 0.024*schiphol + 0.023*i + 0.020*at + 0.018*ams + 0.011*amazing + 0.009*us + 0.006*pretty + 0.006*steur + 0.005*rijsenhout + 0.005*trouwens + 0.004*juist + 0.004*canal + 0.004*time + 0.004*geven

2016-04-05 11:22:18,035 : INFO : topic #11 (0.025): 0.078*20 + 0.066*know + 0.049*19 + 0.037*15 + 0.033*00 + 0.027*17 + 0.025*uur + 0.025*2016 + 0.024*hashtag + 0.020*iedereen + 0.018*14 + 0.016*school + 0.016*top20 + 0.015*katwijk + 0.012*10 + 0.012*opening + 0.010*jaar + 0.009*internet + 0.009*may + 0.009*wo

2016-04-05 11:22:18,037 : INFO : topic #12 (0.025): 0.044*10 + 0.038*weet + 0.032*work + 0.028*stijgt + 0.023*auto + 0.022*long + 0.020*experience + 0.020*goede + 0.019*weg + 0.017*heineken + 0.017*vanaf + 0.013*looks + 0.013*020 + 0.011*media + 0.010*000 + 0.010*trein + 0.010*jongen + 0.010*truffles + 0.010*benieuwd + 0.010*vrachtwagen

2016-04-05 11:22:18,040 : INFO : topic #13 (0.025): 0.034*ve + 0.034*s + 0.032*almere + 0.024*already + 0.020*eerste + 0.020*never + 0.019*keer + 0.019*foto + 0.019*vanavond + 0.019*alle + 0.018*volgende + 0.018*another + 0.016*android + 0.016*wait + 0.016*wij + 0.014*tijdens + 0.013*weer + 0.013*anderen + 0.013*onze + 0.011*90

2016-04-05 11:22:18,043 : INFO : topic #14 (0.025): 0.054*mention + 0.036*since + 0.030*nice + 0.028*video + 0.025*waarom + 0.023*film + 0.020*politie + 0.019*start + 0.017*meet + 0.016*goedemorgen + 0.016*bad + 0.013*c + 0.012*kinda + 0.012*funny + 0.012*moment + 0.011*nieuw + 0.011*niks + 0.011*zaterdag + 0.010*brand + 0.009*worked

2016-04-05 11:22:18,045 : INFO : topic #15 (0.025): 0.073*good + 0.050*week + 0.044*morning + 0.036*ajax + 0.028*psv + 0.028*fuck + 0.028*trip + 0.027*saturday + 0.025*amsterdam + 0.022*mooie + 0.020*travel + 0.014*van + 0.014*coming + 0.013*geworden + 0.012*2nd + 0.012*follow + 0.011*amp + 0.010*succes + 0.009*verdedigen + 0.009*bro

2016-04-05 11:22:18,048 : INFO : topic #16 (0.025): 0.057*18 + 0.056*back + 0.056*re + 0.036*man + 0.024*end + 0.019*r

2016-04-05 11:22:18,052 : INFO : topic #17 (0.025): 0.089*9 + 0.046*4 + 0.037*got + 0.030*jaar + 0.026*spelen + 0.023*tl + 0.023*b1 + 0.022*11 + 0.019*women + 0.019*web + 0.018*wow + 0.017*olympische + 0.016*looking + 0.012*getting + 0.012*said + 0.012*jou + 0.011*zuid + 0.011*angie + 0.010*p2000 + 0.009*gehad

2016-04-05 11:22:18,056 : INFO : topic #18 (0.025): 0.111*rain + 0.064*amsterdam + 0.048*11 + 0.042*jan + 0.024*de + 0.020*tip + 0.018*2c + 0.016*eten + 0.014*verder + 0.013*voorzitter + 0.012*friday + 0.011*congratulations + 0.011*5c + 0.011*higgins + 0.010*tussen + 0.010*hello + 0.010*quick + 0.009*3c + 0.008*waanzinnig + 0.008*zaandam

2016-04-05 11:22:18,059 : INFO : topic #19 (0.025): 0.050*lol + 0.046*lekker + 0.028*haha + 0.027*beste + 0.025*soon + 0.025*job + 0.024*something + 0.018*erg + 0.016*anders + 0.015*amp + 0.015*beer + 0.014*two + 0.013*intermediair + 0.012*working + 0.012*zeker + 0.012*33 + 0.011*alexander + 0.011*flight + 0.009*new + 0.009*drugs

2016-04-05 11:22:18,063 : INFO : topic #20 (0.025): 0.063*n + 0.059*gaat + 0.047*minuten + 0.047*echt + 0.042*vertraagd + 0.030*mee + 0.028*55 + 0.025*12 + 0.023*city + 0.019*e + 0.018*vind + 0.016*icdirect + 0.016*doe + 0.015*hard + 0.015*ken + 0.014*help + 0.012*z + 0.011*tickets + 0.010*times + 0.009*important

2016-04-05 11:22:18,066 : INFO : topic #21 (0.025): 0.114*3 + 0.065*temp + 0.060*5 + 0.058*6 + 0.057*hum + 0.053*8 + 0.049*2 + 0.042*1 + 0.036*love + 0.018*tijd + 0.017*life + 0.015*need + 0.012*lt + 0.010*down + 0.010*1021 + 0.009*service + 0.009*naam + 0.008*done + 0.008*partij + 0.007*slecht

2016-04-05 11:22:18,070 : INFO : topic #22 (0.025): 0.060*cet + 0.052*2016 + 0.048*t + 0.047*mensen + 0.029*freebiafra + 0.028*freennamdikanu + 0.028*16 + 0.025*open + 0.018*leuke + 0.013*museum + 0.012*euro + 0.011*vs + 0.011*jammer + 0.011*bezig + 0.011*around + 0.010*baby + 0.010*watch + 0.008*vinden + 0.008*dog + 0.008*ligt

2016-04-05 11:22:18,073 : INFO : topic #23 (0.025): 0.248*0 + 0.094*0kph + 0.047*hetweerinwestzaan + 0.047*baro + 0.047*0mm + 0.047*0mb + 0.047*gust + 0.024*even + 0.020*stabiel + 0.016*1019 + 0.014*wij + 0.014*check + 0.012*daalt + 0.010*tweet + 0.009*4 + 0.008*gek + 0.006*1c + 0.006*1020 + 0.006*0c + 0.006*1018

2016-04-05 11:22:18,077 : INFO : topic #24 (0.025): 0.043*d + 0.039*took + 0.038*want + 0.033*denk + 0.022*use + 0.021*ur + 0.021*xd + 0.020*vd + 0.017*turkey + 0.016*bc + 0.015*koffie + 0.015*ok + 0.014*making + 0.011*istanbul + 0.010*dont + 0.010*amp + 0.009*thee + 0.009*send + 0.008*st + 0.008*koster

2016-04-05 11:22:18,080 : INFO : topic #25 (0.025): 0.069*u + 0.051*tweets + 0.039*staat + 0.033*gewoon + 0.025*users + 0.022*maart + 0.021*rts + 0.021*kind + 0.020*lente + 0.017*free + 0.017*fucking + 0.013*maak + 0.013*many + 0.012*little + 0.011*tiel + 0.010*kinderen + 0.010*wish + 0.010*lijkt + 0.009*bijzonder + 0.009*fat

2016-04-05 11:22:18,084 : INFO : topic #26 (0.025): 0.105*topic + 0.047*could + 0.041*stop + 0.040*see + 0.032*first + 0.030*killing + 0.029*started + 0.024*100 + 0.023*please + 0.021*look + 0.020*let + 0.015*tomorrow + 0.015*he + 0.014*kijk + 0.013*win + 0.013*sinds + 0.010*club + 0.008*turkije + 0.007*buharikillingdemocracy + 0.006*bijzondere

2016-04-05 11:22:18,087 : INFO : topic #27 (0.025): 0.117*uv + 0.039*psvaja + 0.029*yes + 0.026*eu + 0.025*super + 0.020*hey + 0.019*friends + 0.019*fijn + 0.017*zonder + 0.015*eerst + 0.014*laatste + 0.014*a9 + 0.013*allemaal + 0.012*zondag + 0.011*2 + 0.011*gezien + 0.010*didnt + 0.010*kun + 0.009*burger + 0.008*maand

2016-04-05 11:22:18,091 : INFO : topic #28 (0.025): 0.126*like + 0.059*people + 0.044*place + 0.037*much + 0.030*would + 0.026*komt + 0.024*take + 0.022*hele + 0.019*art + 0.018*materieel + 0.016*feel + 0.011*training + 0.009*latest + 0.009*picture + 0.009*reading + 0.008*zon + 0.008*party + 0.008*refugees + 0.007*mean + 0.007*even

2016-04-05 11:22:18,094 : INFO : topic #29 (0.025): 0.111*wind + 0.037*today + 0.024*appears + 0.023*o + 0.023*a + 0.020*hotel + 0.019*wereld + 0.017*better + 0.017*sure + 0.015*hi + 0.014*im + 0.013*zeg + 0.013*every + 0.012*didn + 0.012*centrum + 0.011*helaas + 0.010*vriendelijke + 0.010*kans + 0.009*doesn + 0.009*afternoon

2016-04-05 11:22:18,098 : INFO : topic #30 (0.025): 0.070*centraal + 0.049*rotterdam + 0.041*6 + 0.036*icdirect + 0.032*way + 0.029*dutch + 0.023*station + 0.023*real + 0.012*serieus + 0.011*used + 0.010*west + 0.010*9c + 0.010*restaurant + 0.009*brengen + 0.009*yee + 0.008*inderdaad + 0.008*stream + 0.008*words + 0.007*goodmorning + 0.007*mix

2016-04-05 11:22:18,102 : INFO : topic #31 (0.025): 0.082*rit + 0.055*weer + 0.050*get + 0.044*p2000 + 0.044*we + 0.039*via + 0.034*thanks + 0.032*gaan + 0.015*sorry + 0.015*alleen + 0.014*hear + 0.014*j + 0.013*everyone + 0.012*lang + 0.011*bed + 0.010*stoppen + 0.009*denken + 0.008*geloof + 0.008*verdachten + 0.008*mooiste

2016-04-05 11:22:18,105 : INFO : topic #32 (0.025): 0.134*trending + 0.117*last + 0.042*langzaam + 0.031*waar + 0.027*weekend + 0.026*hoor + 0.024*30 + 0.022*night + 0.019*became + 0.017*europe + 0.016*someone + 0.014*fail + 0.010*hum + 0.010*story + 0.010*klaar + 0.009*miljoen + 0.008*thinking + 0.008*per + 0.008*pm + 0.007*ziet

2016-04-05 11:22:18,109 : INFO : topic #33 (0.025): 0.045*zie + 0.043*net + 0.043*go + 0.038*jullie + 0.033*leuk + 0.030*badhoevedorp + 0.028*reflection + 0.028*home + 0.026*terug + 0.023*shit + 0.022*cool + 0.019*b + 0.018*2018 + 0.015*daarom + 0.013*project + 0.013*graag + 0.011*39 + 0.010*bar + 0.009*dear + 0.009*spreken

2016-04-05 11:22:18,112 : INFO : topic #34 (0.025): 0.125*24h + 0.049*sunday + 0.029*music + 0.027*amp + 0.021*show + 0.021*terug + 0.021*fun + 0.015*samen + 0.015*anyone + 0.015*hand + 0.013*breda + 0.013*big + 0.012*future + 0.012*23 + 0.012*helped + 0.011*body + 0.010*rt + 0.010*stad + 0.010*stay + 0.009*2014

2016-04-05 11:22:18,116 : INFO : topic #35 (0.025): 0.101*wel + 0.045*1st + 0.042*goed + 0.035*coffee + 0.024*jij + 0.024*right + 0.023*posted + 0.022*live + 0.022*nee + 0.020*helemaal + 0.019*top + 0.018*contact + 0.018*miss + 0.015*volgens + 0.014*bent + 0.014*online + 0.013*coz + 0.011*blijft + 0.010*everything + 0.009*stuk

2016-04-05 11:22:18,119 : INFO : topic #36 (0.025): 0.063*one + 0.051*oh + 0.049*biafrans + 0.038*2 + 0.036*god + 0.027*seen + 0.025*prio + 0.025*days + 0.025*democracy + 0.024*think + 0.023*beautiful + 0.022*also + 0.019*1034 + 0.018*thing + 0.014*pics + 0.012*love + 0.011*horen + 0.011*qturn + 0.011*remember + 0.010*published

2016-04-05 11:22:18,123 : INFO : topic #37 (0.025): 0.051*heel + 0.041*zegt + 0.033*mooi + 0.024*kom + 0.022*laat + 0.018*maken + 0.017*true + 0.016*ie + 0.015*35 + 0.014*we + 0.012*jong + 0.012*moest + 0.011*fact + 0.011*vol + 0.010*bit + 0.009*gesprekken + 0.009*higgins + 0.008*klopt + 0.008*line + 0.008*paar

2016-04-05 11:22:18,126 : INFO : topic #38 (0.025): 0.040*nieuwe + 0.020*old + 0.018*vrijdag + 0.018*grote + 0.017*test + 0.017*festival + 0.016*laten + 0.015*minder + 0.015*trying + 0.014*weten + 0.014*nemen + 0.013*kreeg + 0.013*vraag + 0.013*mn + 0.011*ging + 0.011*buitenhof + 0.011*post + 0.011*zat + 0.010*boat + 0.009*verhaal

2016-04-05 11:22:18,130 : INFO : topic #39 (0.025): 0.099*1 + 0.038*p + 0.029*p2000 + 0.027*still + 0.025*letsel + 0.023*well + 0.023*l + 0.019*a2 + 0.015*games + 0.015*pas + 0.014*monday + 0.014*years + 0.013*thuis + 0.013*yeah + 0.011*kas + 0.010*achter + 0.010*might + 0.010*zeggen + 0.009*toe + 0.008*aml

Figure 5: Output generated by the LDA algorithm, showing the topic distributions found in the loaded Twitter corpus.
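To work with the listing in Figure 5 programmatically, the logged lines can be turned into a mapping from topic number to weighted terms. The following is a minimal sketch, not part of the original study: the function name `parse_topics` and the regular expressions are illustrative assumptions based only on the line format shown above, and the value in parentheses after the topic number is skipped rather than interpreted.

```python
import re

# Start of one logged topic entry: timestamp, log level, topic number, and the
# value in parentheses (not interpreted here). Pattern inferred from Figure 5.
ENTRY_RE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} : INFO : topic #(\d+) \([\d.]+\): "
)
# One "weight*term" pair inside an entry.
PAIR_RE = re.compile(r"(\d+\.\d+)\*(\S+)")


def parse_topics(log_text):
    """Return {topic_number: [(term, weight), ...]} for each entry in log_text."""
    topics = {}
    matches = list(ENTRY_RE.finditer(log_text))
    for i, match in enumerate(matches):
        # An entry's body runs until the next entry starts, so entries that
        # ended up on the same physical line are still separated correctly.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        body = log_text[match.end():end]
        topics[int(match.group(1))] = [
            (term, float(weight)) for weight, term in PAIR_RE.findall(body)
        ]
    return topics


# Shortened excerpts of topics #18 and #19 from Figure 5, deliberately joined
# on one line to show that fused entries are handled.
sample = (
    "2016-04-05 11:22:18,056 : INFO : topic #18 (0.025): "
    "0.111*rain + 0.064*amsterdam + 0.048*11 "
    "2016-04-05 11:22:18,059 : INFO : topic #19 (0.025): "
    "0.050*lol + 0.046*lekker + 0.028*haha"
)
topics = parse_topics(sample)
print(topics[18])
```

Sorting each list by weight, or filtering out purely numeric terms such as `11`, would be natural next steps when inspecting which topics relate to the business climate.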

Figure 6: Table of the ImageNet concepts used to categorize the images that are used to define the business climate.
