Lexical Networks in Psychology

Research Master Psychology 2015, Internship

Simon Stuber Student nr. 10507299

Daily Supervisor: Sacha Epskamp Supervisor: Denny Borsboom


Content

Introduction

Part one: A Lexical Symptom Network of Psychopathology

Abstract

Methods

Results

Discussion

Part two: An R Package for the Construction of Lexical Co-occurrence Networks

Description

Usage

Arguments

Examples

Final Comments

Appendix A

References


Introduction

The Diagnostic and Statistical Manual of Mental Disorders (DSM) facilitates diagnosis in clinical psychology and is also widely used as a classification tool in psychopathological research. However, the high comorbidity rates that are commonly observed in psychology still raise questions about the true symptom structures of disorders (Sanislow et al., 2010) and about the polythetic nature of the DSM, in which different disorders share many, but not all, symptoms (Krueger & Bezdjian, 2009). Methodological investigations of symptom structures that aim to explain comorbidity are often based on latent variable models, in which the symptoms of a specific psychopathological disorder are hypothesized to be caused by a corresponding latent variable which, in some models, is in turn caused by higher-order factors. Caspi et al. (2014), for example, proposed that comorbidity rates in psychopathology could be explained by a general latent factor, p, underlying pathologies, analogous to the well-established g factor that is hypothesized to underlie intelligence.

Latent variable models, however, depend on the assumption of conditional independence. That is, by assuming a common cause variable, one also assumes that the measured variables follow an implied covariance structure and are conditionally independent given the latent variable. This assumption may conflict with observations in clinical psychology, where direct relationships between symptoms offer plausible explanations for the development of disorders and their comorbidity. Unhealthy eating patterns, for instance, can have a direct influence on depressed mood, and sleeping problems can increase irritability. Such relationships, however, violate the assumption of conditional independence.

Network analysis avoids these problems and allows for interpretations that do not depend on latent variables and their associated assumptions. From the network perspective, a disorder is a cluster of connected symptoms that are causally dependent on each other in a dynamic way (Borsboom, Cramer, Schmittmann, Epskamp, & Waldorp, 2011). That is, one symptom can activate, reinforce or weaken adjacent symptoms (Borsboom & Cramer, 2013). Borsboom et al. (2011) constructed a symptom network based on the DSM-IV. In this network model, symptoms are represented by nodes, which are connected by an edge if they co-occur in a disorder (see figure 1).
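The construction rule described above (two symptoms are connected if they are listed in the same disorder) can be sketched in a few lines of R. The membership matrix below is a hypothetical toy example, not data from the DSM-IV or from Borsboom et al. (2011), and the object names are chosen for illustration only.

```r
# Hypothetical toy data: rows are disorders, columns are symptoms (1 = symptom listed).
membership <- matrix(
  c(1, 1, 1, 0,
    0, 1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("disorderA", "disorderB"),
                  c("sad", "insomnia", "fatigue", "worry"))
)

# crossprod(membership) counts, for each pair of symptoms,
# the number of disorders in which both symptoms are listed.
shared <- crossprod(membership)

# An edge is present when two distinct symptoms share at least one disorder.
adjacency <- (shared > 0) * 1
diag(adjacency) <- 0
adjacency
```

In this toy example "sad" and "worry" are the only pair of symptoms without an edge, because they never appear in the same disorder.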

Among other findings, Borsboom et al. (2011) showed that the DSM network model can predict empirical statistics regarding comorbidity. High empirical correlations between disorders correspond to short average shortest path lengths (APL; Fronczak, Fronczak, & Hołyst, 2004) between disorders in the DSM network.

Figure 1: The DSM network model as presented in the described study from Borsboom et al. (2011). Every node represents a symptom. Edges connecting the nodes represent the co-occurrence of symptoms in at least one disorder. The colors of the nodes refer to the DSM chapter in which these symptoms occur most often.

To further investigate symptom structures and comorbidity rates, we propose two network analysis methods to detect and analyze lexical symptom structures in textual data. Publicly accessible online text, such as webpages, social media data, blogs and forums, is promising for this purpose. Human language is the most common and reliable way to share thoughts, emotions and cognitive processes (Tausczik & Pennebaker, 2010). Accordingly, research shows that self-narratives contain useful information for the detection of disorders (Smyth, 1998; Franklin & Thompson, 2005). However, textual data, in contrast to structured quantitative data, have no observable structure, so conventional statistical methods cannot be applied directly (Qiwei He, 2013). Instead, text analysis is used, which can be seen as a collection of tools that transform plain text into quantitative data to which statistical techniques can be applied. To date, it is widely used by for-profit organizations in, for instance, marketing or customer satisfaction research (Pestian et al., 2012). In psychology, however, the analysis of textual data is relatively new, and the existing literature contains only a few examples of text mining (e.g. Leech & Onwuegbuzie, 2008; Tausczik & Pennebaker, 2010; Qiwei He, 2013). The main challenge, therefore, is to fine-tune and reshape existing text mining methods for the construction of lexical symptom networks.

Standardized methods for the analysis of textual data, such as diary data or patients' case files, could improve our understanding of symptom structures.

During my internship, two studies were conducted in which network analysis methods were explored as a tool to investigate presumed symptom structures hidden in human language. In 2006, the Linguistic Data Consortium, in collaboration with Google, released frequencies of word patterns, called n-grams, calculated from approximately one trillion word tokens of text from publicly accessible web pages. In the first study, the n-grams data are used to construct a co-occurrence network of symptom-related words, i.e. a lexical symptom network, as a visualization of the putative structure of symptoms in the population. We show that the network structure of the lexical symptom network is moderately to highly correlated with empirical comorbidity rates. In the second study, a first version of a software package for R (R Development Core Team, 2012), txtnet, is developed for the construction of lexical co-occurrence networks. Computer science has developed a variety of text mining techniques that are essential in web search engines, spam filters and sentiment analysis (e.g. Landeghem et al., 2012), and co-occurrence structures are used, for instance, to extract relevant information from scientific literature (Landeghem et al., 2012). Even though different text mining methods perform well in these fields, they might not be applicable to textual data in psychology, where complex underlying classes, such as symptoms and disorders, are characteristic (Qiwei He, 2013). Nevertheless, in the second part of my internship we used txtnet to construct co-occurrence networks based on user comments on depression-related topics from a well-known social news website (reddit.com), and show that network analysis facilitates the analysis of the semantic relatedness of text material and, in the context of psychological research, might offer insights into the relationships between symptoms.


Part one: A Lexical Symptom Network of Psychopathology

Abstract

In the present explorative study, the Google LDC Web 1T 5-gram (2006) dataset is used to construct a co-occurrence network of 175 symptom-related words. Average shortest path lengths (APL) between disorder-related lexical groups in the network are compared to empirical comorbidity rates, but this analysis revealed no meaningful relationship (r = -.43). However, the relative APLs between disorders proposed here are moderately to highly correlated with empirical comorbidity rates (r = 0.71), suggesting that co-occurrence networks might be a feasible way to investigate symptom structures in the population. The n-grams dataset used has limitations, however, and the predictive capabilities of lexical networks should be investigated further in future research.

Methods

The DSM network model by Borsboom et al. (2011) (see figure 1) is based on the co-occurrence of symptoms in disorders of the DSM-IV. That is, symptoms are visualized by nodes that are connected by an edge when they are part of the same disorder. In the present study we constructed a lexical counterpart of that model based on textual data. The expectation was that words describing psychological disorders with high comorbidity rates would co-occur more often than words describing unrelated disorders, and would thus form disorder-related clusters. The lexical symptom network is based on the Google LDC Web 1T 5-gram Version 1 dataset (2006), which was created from approximately one trillion word tokens of text from openly accessible web pages. The dataset comprises n-grams, ranging in length from single words to five-grams, and their observed frequencies. It is important to note, however, that a five-word window is not large enough to contain complete symptom descriptions, let alone their co-occurrences. For that reason, a simplified operationalization is used in which symptoms are defined by a set of 175 symptom-related stemmed words. Consequently, not all symptoms of the DSM-IV can be described, and the described symptoms might vary in accuracy. For example, some words are a direct symptom description, e.g. “insomnia”, whereas others are only closely related to symptoms, e.g. “sad”. The symptom-related words are used to create a subset of relevant five-grams, from which a word co-occurrence matrix was created. A full list of the words used can be found in the appendix. For the construction of the network, the frequencies were standardized by dividing the co-occurrence frequency of words i and j by the smaller of the two words' single occurrence frequencies: $w_{ij} = f_{ij} / \min(f_i, f_j)$, where $f_{ij}$ is the co-occurrence frequency of words i and j, and $f_i$ and $f_j$ are their single occurrence frequencies. This enabled us to analyze the relationship between words while taking their single occurrences into account.
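As an illustration of this frequency correction, the following R sketch divides each co-occurrence count by the smaller of the two single occurrence counts. The counts and object names are hypothetical and are not taken from the n-grams dataset; the sketch only assumes that the correction is applied element-wise to a symmetric co-occurrence matrix.

```r
# Hypothetical counts, not taken from the n-grams dataset.
words <- c("anxi", "panic", "sleep")
occurrence <- c(anxi = 1000, panic = 200, sleep = 5000)
cooccurrence <- matrix(
  c(0, 50, 100,
    50, 0, 10,
    100, 10, 0),
  nrow = 3, byrow = TRUE, dimnames = list(words, words)
)

# outer(..., pmin) gives the smaller single occurrence count for every word pair.
smallest <- outer(occurrence, occurrence, pmin)

# Corrected co-occurrence: co-occurrence count divided by the smaller occurrence count.
corrected <- cooccurrence / smallest
round(corrected, 3)
```

With these toy counts, the rare word "panic" is strongly tied to "anxi" (50 of its 200 occurrences), even though the raw count for "anxi" and "sleep" is twice as high.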

More common estimation methods for analyzing lexical relationships, such as similarity measures (e.g. Seifoddini & Djassemi, 1991), mutual information measures (e.g. Church & Hanks, 1989) or the Ising model (e.g. van Borkulo et al., 2014), were unsuited to the data, as they produced unreliable estimates. This is because the dataset has a high percentage of zero co-occurrences (49%) and extremely high variation in occurrence frequencies between common words, such as anger (f = 3953400), and rare words, such as hypervigilance (f = 1340), which leads to biased estimates. The frequency correction described above, however, allowed us to construct a network with unbiased estimates.

To facilitate the analysis of relationships between words at the disorder level, groups of disorder-related words were created. These groups are a lexical representation of nine disorders for which comorbidity rates are reported (Slade & Watson, 2006). Each group contains 9 to 18 words describing the respective disorder. The lexical disorder-related group for generalized anxiety disorder (GAD), for instance, comprises the stemmed words anxi(-ety, -ous), avoid, concern(-ed), worry, distress, muscle, tension, rumin(-ate), irrit(-ability), fatigue, sleep and restless(-ness) (see appendix A for a full list of the words used).


Figure 2: The lexical symptom network based on five-grams data. Nodes represent words, edges between nodes represent their corrected co-occurrences (see appendix A for the list of used words).

Results

In the resulting network (see figure 2), symptom-related words are represented by nodes. Edges between nodes represent the corrected co-occurrences. Even though small symptom clusters can be detected visually, e.g. the cluster formed by the words anxiety, phobia and panic (nodes 9, 128 and 122), the network has a dense structure and visual interpretation of clusters does not provide much clarity.

To analyze the network quantitatively, APLs within and between lexical disorder-related groups were calculated. The APL between two groups, $APL_{ij}$, is the average number of edges that have to be traversed to reach any node in group j from any node in group i (Fronczak, Fronczak, & Hołyst, 2004). As Borsboom et al. (2011) showed, APLs between disorders in the DSM network model (see figure 1) are negatively related to comorbidity rates in the population. To investigate whether this relationship with empirical data can also be found for the lexical symptom network, we compared APLs to empirical comorbidity rates. However, the lexical symptom structure has a weak negative correlation with comorbidity rates (r = -0.43), which is in contrast to expectations based on the DSM network model. This is because within-disorder APLs in the DSM network model are by definition equal to 1, whereas disorder-related words in the lexical symptom network do not form perfect disorder clusters. Instead, words in the lexical symptom network cluster in smaller groups or pairs, causing APLs within disorders to exceed one.
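A minimal base R sketch of a between-group APL is given below. It assumes an unweighted network and uses the Floyd-Warshall algorithm for the shortest paths; the toy network and the function names shortest_paths() and group_apl() are hypothetical and are not the code used in the study.

```r
# Hypothetical unweighted toy network with five nodes on a line: a - b - c - d - e.
nodes <- c("a", "b", "c", "d", "e")
adjacency <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
for (k in 1:4) {
  adjacency[k, k + 1] <- 1
  adjacency[k + 1, k] <- 1
}

# Floyd-Warshall: shortest path length (number of edges) between all node pairs.
shortest_paths <- function(adj) {
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(nrow(d))) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Mean shortest path length over all pairs of distinct nodes, one from each group.
group_apl <- function(d, group_i, group_j) {
  block <- d[group_i, group_j, drop = FALSE]
  distinct <- outer(group_i, group_j, "!=")
  mean(block[distinct])
}

d <- shortest_paths(adjacency)
group_apl(d, c("a", "b"), c("d", "e"))  # APL between two groups
group_apl(d, c("a", "b"), c("a", "b"))  # APL within one group
```

In a disconnected network some path lengths are infinite, so a real analysis would have to decide how to treat unreachable pairs before averaging.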

The APLs within disorders can, however, be fixed at one by dividing the APLs between disorders by the mean APL within disorders: $APL^{rel}_{ij} = APL_{ij} / \overline{APL}_{within}$. By comparing the relative APLs between disorders to the observed comorbidity rates in the population, we can investigate the relationship between lexical APLs and comorbidity rates while assuming that symptom-related words follow the DSM structure. Figure 3 shows the relative APLs between disorders on a reversed scale in blue and the comorbidity rates in red. As can be seen, the relative APLs between lexical symptom groups are indeed moderately to highly correlated with the empirical data (r = 0.71).
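The rescaling step can be sketched as follows. All numbers are hypothetical, and the sketch makes one explicit assumption that the text does not settle: each between-disorder APL is divided by the mean of the two within-disorder APLs involved. Other readings of "mean APL within disorders" (for example, a single mean over all disorders) are equally possible.

```r
# Hypothetical group-level APL matrix for three disorders:
# diagonal = APL within a disorder, off-diagonal = APL between disorders.
disorders <- c("MDE", "GAD", "PD")
apl <- matrix(
  c(1.5, 1.8, 2.4,
    1.8, 1.2, 2.1,
    2.4, 2.1, 1.6),
  nrow = 3, byrow = TRUE, dimnames = list(disorders, disorders)
)

# Assumption: divide by the mean of the two within-disorder APLs involved,
# so that every within-disorder value becomes exactly 1.
within <- diag(apl)
relative_apl <- apl / outer(within, within, function(a, b) (a + b) / 2)

# Hypothetical comorbidity correlations, in the order MDE-GAD, MDE-PD, GAD-PD.
comorbidity <- c(MDE_GAD = 0.6, MDE_PD = 0.4, GAD_PD = 0.5)
pairs <- relative_apl[upper.tri(relative_apl)]

# Reversed scale: a shorter relative APL should go together with higher comorbidity.
cor(-pairs, comorbidity)
```

Reversing the sign of the relative APLs mirrors the reversed scale used in figure 3, so that a positive correlation indicates agreement with the comorbidity data.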


Figure 3: In blue: the relative average shortest path lengths between disorders on a reversed scale. In red: the empirical correlations. Abbreviations: MDE=Major Depressive Episode, DYS=Dysthymic Disorder, GAD=Generalized Anxiety Disorder, PTSD=Posttraumatic Stress Disorder, SOP=Social Phobia, PD=Panic Disorder, AGPH=Agoraphobia, OCD=Obsessive-Compulsive Disorder, DD=Drug Dependence

Discussion

In the present explorative study, the Google LDC Web 1T 5-gram (2006) dataset is used to construct a co-occurrence network of 175 symptom related words. Lexical disorder-related groups were created for nine distinct disorders and the relative average shortest path lengths between these disorder groups were calculated and compared to empirically obtained comorbidity rates. In


relation with comorbidity rates.

Based on the results of study 1, it seems reasonable to assume that lexical symptom networks offer novel ways to investigate symptom structures in the population. However, due to the relatively small window of five words, nodes in the network only represent single words, leading to a network structure that prohibits clear interpretations at the symptom level. Furthermore, it remains unclear to what degree lexical groups are a feasible operationalization of psychological disorders. Nevertheless, as the lexical content of the network was chosen carefully, the results suggest that the network approach might open up new pathways to examine psychological research questions based on co-occurrence networks of textual data. For future research along these lines, we suggest using data with a different structure and suitable standardized methods for the selection of the lexical content. Document-term analysis or text classification methods (e.g. Kim, Han, Rim, & Myaeng, 2006; Tan, 2006) might be a good starting point for selecting lexical content that represents the disorders of interest in accordance with the DSM.

Part two: An R Package for the Construction of Lexical Co-occurrence Networks

In the second part of my internship I developed a software package, txtnet, for the statistical programming environment R (R Development Core Team, 2012). The package offers a variety of functions to prepare textual data for network analysis and create lexical co-occurrence network structures based on textual data. For the visualization of the networks, txtnet uses the R package qgraph (Epskamp, Costantini, Cramer, Waldorp, & Borsboom, 2015). In this chapter, the functions and possibilities of txtnet will be explained in detail and practical examples will be given.

Description

The main function of txtnet, txt.quantify(), produces and plots co-occurrence networks based on textual data.


Usage

txt.quantify(input, reference,
             nodes = c("words", "sentences"),
             edges = c("shared.words", "shared.sentences",
                       "shared.reference.words", "shared.reference.sentences"),
             grouping = c("none", "words.by.sentences", "words.by.reference",
                          "sentences.by.reference.words",
                          "sentences.by.reference.sentences", "sentences.by.words"),
             weights = c("co-occurances", "binary", "odds-ratio", "jaccard"),
             cut.off = (...), adjacency.matrix = TRUE, filter = FALSE)

Arguments

input

The input object should either be a character string, or multiple character strings stored in a list.

reference

The reference object can either be a character string of reference words, or multiple character strings stored in a list.

nodes

The nodes argument allows users to specify whether nodes in the network should be based on sentences or words of the input object.


edges

With the edges argument, different methods can be chosen to draw edges between nodes. “shared.words” draws edges between sentences with shared words. “shared.sentences” draws edges between words that appear in the same sentences. “shared.reference.words” draws edges between sentences that share words from the reference input. “shared.reference.sentences” draws edges between words that share reference sentences.

grouping

The grouping argument specifies whether colors should be assigned to words based on their co-occurrence in sentences, “words.by.sentences”, or their co-occurrence on the reference list, “words.by.reference”. When nodes represent sentences, colors can be assigned based on shared reference words, “sentences.by.reference.words”, shared reference sentences, “sentences.by.reference.sentences”, or simply by shared words, “sentences.by.words”.

weights

With the weights argument, the (estimation method of the) weights can be specified. When the weights argument is set to “co-occurances”, no estimation is done and edges are based on co-occurrence frequencies. Binary weights can be obtained with the value “binary”. Weights can also be based on odds ratios, “odds-ratio”, or the Jaccard similarity measure, “jaccard”.

cut.off

Co-occurrences below the provided cut.off value are omitted.

adjacency.matrix


filter

When filter is set to TRUE, nodes that are unconnected will be omitted. The default is FALSE.
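The four weighting options described under the weights argument can be illustrated with a small base R sketch. This is not the implementation used in txtnet; it only shows, under the assumption of a binary sentence-by-word incidence matrix, how raw co-occurrences, binary weights, the Jaccard measure and odds ratios can be derived. The data and object names are hypothetical.

```r
# Hypothetical incidence matrix: rows are sentences, columns are words (1 = word present).
incidence <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 0,
    0, 0, 1,
    0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("sad", "lonely", "sleep"))
)

both <- crossprod(incidence)               # sentences containing both words
count <- diag(both)                        # sentences containing each word
either <- outer(count, count, "+") - both  # sentences containing at least one of the two

cooccurrence <- both                       # raw co-occurrence frequencies
binary <- (both > 0) * 1                   # edge present or absent
jaccard <- both / either                   # shared sentences / sentences with either word

# Odds ratio from the 2 x 2 presence/absence table of each word pair.
n <- nrow(incidence)
only_row <- count - both                   # row word present, column word absent
only_col <- t(only_row)                    # column word present, row word absent
neither <- n - either
odds_ratio <- (both * neither) / (only_row * only_col)

# Self-loops are not meaningful here, so the diagonals are set to zero.
diag(cooccurrence) <- 0
diag(binary) <- 0
diag(jaccard) <- 0
diag(odds_ratio) <- 0
```

Empty cells in the 2 x 2 table make the odds ratio zero or infinite, which is one reason why such estimates can become unstable for rare words.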

Examples

The package can be installed with the install_github() function from the devtools package (Wickham & Chang, 2015):

library(devtools)

install_github("SimonStuber/txtnet")

The main function of txtnet, txt.quantify(), allows users to construct and plot co-occurrence networks. This function has two input arguments, input and reference, and a variety of arguments to specify characteristics of the network.

The input argument is mandatory and should contain the textual data on which the network will be based. That is, nodes in the network structure represent either words or sentences from the input object. The nodes argument allows users to choose between these two representation methods. The reference input argument should be a list with reference words. This object is not mandatory. When no reference object is provided, the edges argument can take the values “shared.words” or “shared.sentences”. That is, edges between nodes can be based on the degree to which nodes share either words or sentences.

Consider for example the following input object:

xinput <- c("I feel sad", "I am lonely", "Are you sad")

When the nodes are chosen to represent sentences, the edges argument can be used to draw edges between sentences that share words:

network <- txt.quantify(xinput, nodes="sentences", edges = "shared.words", weights="co-occurances",grouping="none")


Likewise, nodes that represent words can be connected in the network by edges representing the degree to which words appear in the same sentences.

network <- txt.quantify(xinput, nodes="words", edges = "shared.sentences", weights="co-occurances",grouping="none")
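To make the two representations concrete, the following base R sketch computes both kinds of co-occurrence counts for the example input by hand. It is an illustration of the idea only and does not reproduce the internals of txt.quantify(); the object names incidence, sentence_net and word_net are hypothetical.

```r
xinput <- c("I feel sad", "I am lonely", "Are you sad")

# Split each sentence into lowercase words.
tokens <- strsplit(tolower(xinput), " ", fixed = TRUE)
vocabulary <- sort(unique(unlist(tokens)))

# Incidence matrix: rows are sentences, columns are words (1 = word present).
incidence <- t(sapply(tokens, function(words) as.numeric(vocabulary %in% words)))
dimnames(incidence) <- list(xinput, vocabulary)

# Nodes are sentences: number of words shared by each pair of sentences.
sentence_net <- tcrossprod(incidence)
diag(sentence_net) <- 0

# Nodes are words: number of sentences shared by each pair of words.
word_net <- crossprod(incidence)
diag(word_net) <- 0

sentence_net
```

Here "I feel sad" is linked to "I am lonely" through the shared word "i" and to "Are you sad" through "sad", whereas the second and third sentence share no words.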

Connections between nodes can also be based on the degree to which, for example, two sentences share words from a reference list. Consider for instance the following list of reference words:

yinput <- c("sad", "lonely")

When providing a reference object to the function, the edges argument can take the additional values “shared.reference.words” and “shared.reference.sentences”. Furthermore, with the grouping argument, colors can be assigned to nodes based on the total co-occurrences per

network <- txt.quantify(xinput, yinput, nodes="sentences", edges = "shared.reference.words", weights="co-occurances",grouping="sentences.by.reference.words")

To present the functions of the package in a broader context, we downloaded user comments from a well-known social news website, reddit.com (https://www.reddit.com/r/depression, 2015), from three different topics related to depression: sleeping problems (1), self-reproach or feelings of worthlessness (2) and suicidal ideation (3). Comments with fewer than 50 characters were not included in the analysis, as these comments mostly contain responses to other users that do not represent the semantic topic of the thread. The comments were stored in a list of length three, called rawtext. With the clean.txt() function of txtnet, comments were stemmed and transformed to lowercase letters, and stop words, special characters and numbers were removed. The clean.txt() function uses the stemmer() function of the package qdap (Goodrich, Kurkiewicz, & Rinker, 2015) and the removeWords() function from the package tm (Feinerer & Hornik, 2015).

xinput <- clean.txt(rawtext,lowercase = TRUE, punctuation="radical", stem=TRUE, stopwords = "radical")
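The kind of preprocessing performed by clean.txt() can be approximated in base R as sketched below. This is not the package's implementation: the function clean_sketch(), the example comments and the very short stop word list are hypothetical, and stemming is left out because it relies on an external stemmer.

```r
# Hypothetical raw comments; the stop word list is a small illustrative subset.
rawtext <- c("I cannot sleep at night!!", "Slept 3 hours, and I feel worthless.")
stop_words <- c("i", "at", "and", "the", "a")

clean_sketch <- function(text, stop_words) {
  text <- tolower(text)
  # Replace punctuation, digits and other special characters with spaces.
  text <- gsub("[^a-z ]", " ", text)
  words <- strsplit(text, " +")
  vapply(words, function(w) {
    # Drop empty strings and stop words, then rebuild the comment.
    w <- w[nzchar(w) & !(w %in% stop_words)]
    paste(w, collapse = " ")
  }, character(1))
}

clean_sketch(rawtext, stop_words)
```

Stripping everything except the letters a to z is a deliberately crude choice that would also remove accented characters, so a real pipeline would need a more careful character filter.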


For each of the nine depression symptoms, a corresponding list of reference words was created, based on the symptom checklists of the DSM-IV. The reference words are stored in yinput, a list object of length nine that represents the nine symptoms from the DSM-IV. Then, a co-occurrence matrix was created in the same way as in the example above.

dep.net <- txt.quantify(xinput, yinput, nodes = "sentences", edges = "shared.reference.words", grouping = "sentences.by.reference.words", weights="co-occurances")

The resulting network, stored in dep.net, shows three main clusters of user comments: “Insomnia/Hypersomnia” (1), “Worthlessness” (2) and “Suicidal Ideation” (3). These clusters are thus in line with the three reddit depression topics on which the network is based. Furthermore, nodes that are classified as “Depressed Mood” form a central cluster in the network, indicating that words from this class are present in all three topics. Nodes that do not contain any reference words are unconnected and are not assigned to any of the nine symptom classes. With the filter argument, such nodes can be omitted.


> names(dep.net)

[1] "qgraphobject" "matrix" "groups" "labels"

The qgraphobject contains graph-specific data and can be used to further investigate the network with qgraph (Epskamp, Costantini, Cramer, Waldorp, & Borsboom, 2015). The matrix element contains the weights matrix, in this example the co-occurrence matrix. Besides co-occurrences, the weights argument of the txt.quantify() function can also be used to estimate the relationships between words with odds ratios or the Jaccard similarity measure. It is also possible to visualize an unweighted network by setting the weights argument to “binary”.

As mentioned above, the qgraphobject can be further analyzed with qgraph in order to, for example, quantify the clustering of the network. For instance, an analysis of the average degree and the average betweenness per reference group shows that user comments related to the three topics, together with comments related to depressed mood, indeed form the most central clusters.
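A group-level centrality summary of this kind can be sketched in base R as follows. The weights matrix and group labels are hypothetical stand-ins for the matrix and groups elements of the returned object, whose exact structure is assumed here; only degree and strength are shown, since betweenness additionally requires shortest paths.

```r
# Hypothetical weights matrix for five comments and their assigned symptom groups.
comments <- paste0("c", 1:5)
weights <- matrix(0, 5, 5, dimnames = list(comments, comments))
weights["c1", "c2"] <- weights["c2", "c1"] <- 2
weights["c1", "c3"] <- weights["c3", "c1"] <- 1
weights["c2", "c3"] <- weights["c3", "c2"] <- 1
weights["c4", "c5"] <- weights["c5", "c4"] <- 1
groups <- c(c1 = "sleep", c2 = "sleep", c3 = "mood", c4 = "suicidal", c5 = "mood")

degree <- rowSums(weights > 0)   # number of neighbours per node
strength <- rowSums(weights)     # summed edge weights per node

# Average centrality per reference group.
avg_degree <- tapply(degree, groups, mean)
avg_strength <- tapply(strength, groups, mean)
avg_degree
```

Groups with a higher average degree or strength contain nodes that are, on average, more densely connected to the rest of the network.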


Final Comments

During my internship, I was able to explore text mining techniques and their usefulness for psychological research questions in the context of the network approach. The methods presented here are by no means final and leave room for improvement in future research. However, the first results are promising and suggest that lexical co-occurrence networks might indeed allow for the analysis of unstructured textual data in psychology. Such methods are especially interesting for the analysis of patients' diaries or case files, considering that standardized methods for the analysis of such data are rare. The txtnet package might be a first step towards an intuitive and easy-to-use software package for the construction of such networks. Augmentation with other measures of semantic similarity, such as the mutual information criterion (e.g. Kraskov, Stögbauer, Andrzejak, & Grassberger, 2003), could increase flexibility and extend the field of application. Furthermore, the assignment of nodes to reference groups could possibly be enhanced by applying text classification techniques such as the naive Bayes classifier (e.g. Kim, Han, Rim, & Myaeng, 2006). With such improvements, the package could be useful in a variety of research fields in which a structured method is needed for the analysis of text material.


Appendix A

1=addict, 2=affect, 3=aggress, 4=agitation, 5=agora, 6=amnesia, 7=anger, 8=anorexia, 9=anxi, 10=aphasia, 11=appetite, 12=arous, 13=arrhythmia, 14=attack, 15=attention, 16=autist, 17=aversion, 18=avoid, 19=behavior, 20=bladder, 21=body, 22=borderline, 23=bowel, 24=bulimi, 25=cardial, 26=cessation, 27=chronic, 28=circadian, 29=cognit, 30=coma, 31=communicat, 32=concentrat, 33=concern, 34=conform, 35=confus, 36=conscious, 37=critic, 38=crowd, 39=cruel, 40=cut, 41=danger, 42=death, 43=deceiving, 44=dependent, 45=depersonali, 46=depress, 47=detach, 48=disorgan, 49=dissociative, 50=distract, 51=distress, 52=doubt, 53=dream, 54=drug, 55=dyskinesia, 56=dysphor, 57=dyston, 58=eat, 59=energy, 60=erectile, 61=escape, 62=euphor, 63=exhibitionist, 64=failur, 65=fantas, 66=fatigue, 67=fear, 68=feign, 69=forget, 70=fraud, 71=fright, 72=frustrat, 73=gambl, 74=gender, 75=genit, 76=grandiosity, 77=guilt, 78=hallucinat, 79=harm, 80=helpless, 81=humiliat, 82=hyperactiv, 83=hypersomnia, 84=hypervigil, 85=hypoactive, 86=idealiz, 87=ideation, 88=identit, 89=illegal, 90=imagin, 91=imitat, 92=impuls, 93=indecisiv, 94=inflexib, 95=ingest, 96=inject, 97=insomnia, 98=instruct, 99=intercourse, 100=intimidat, 101=intoxic, 102=involuntary, 103=irritat, 104=kill, 105=language, 106=maladapt, 107=malicious, 108=masochist, 109=medicat, 110=memor, 111=mood, 112=muscle, 113=narcisstic, 114=narco, 115=nervosa, 116=neurologic, 117=norms, 118=obes, 119=object, 120=obsess, 121=pain, 122=panic, 123=paranoid, 124=paresthes, 125=people, 126=performance, 127=perspirat, 128=phobi, 129=places, 130=psychomotor, 131=pupil, 132=reexperienc, 133=reflex, 134=reject, 135=relationship, 136=repetit, 137=respect, 138=respons, 139=restless, 140=retardation, 141=rules, 142=rumin, 143=sad, 144=schizofren, 145=season, 146=self, 147=self-esteem, 148=sentiment, 149=sexual, 150=shame, 151=simulat, 152=situation, 153=sleep, 154=social, 155=sociopath, 156=spasm, 157=specific, 158=speech, 159=stereotyp, 160=stress, 161=substanc, 162=suicid, 163=suspicio, 164=sweat, 165=tension, 166=theft, 167=therap, 168=thought, 169=threat, 170=trauma, 171=trembl, 172=unjustifi, 173=weight, 174=worry, 175=worthless


References

Borkulo, C. D. van, Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers, R. A., & Waldorp, L. J. (2014). A new method for constructing networks from binary data. Scientific Reports, 4. http://doi.org/10.1038/srep05918

Borsboom, D., & Cramer, A. O. J. (2013). Network Analysis: An Integrative Approach to the Structure of Psychopathology. Annual Review of Clinical Psychology, 9(1), 91–121. http://doi.org/10.1146/annurev-clinpsy-050212-185608

Borsboom, D., Cramer, A. O. J., Schmittmann, V. D., Epskamp, S., & Waldorp, L. J. (2011). The Small World of Psychopathology. PLoS ONE, 6(11), e27407. http://doi.org/10.1371/journal.pone.0027407

Caspi, A., Houts, R. M., Belsky, D. W., Goldman-Mellor, S. J., Harrington, H., Israel, S., … Moffitt, T. E. (2014). The p Factor: One General Psychopathology Factor in the Structure of Psychiatric Disorders? Clinical Psychological Science, 2(2), 119–137. http://doi.org/10.1177/2167702613497473

Church, K. W., & Hanks, P. (1989). Word Association Norms, Mutual Information, and Lexicography. In Proceedings of the 27th Annual Meeting on Association for Computational Linguistics (pp. 76–83). Stroudsburg, PA, USA: Association for Computational Linguistics. http://doi.org/10.3115/981623.981633

Epskamp, S., Costantini, G., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2015). qgraph: Graph Plotting Methods, Psychometric Data Visualization and Graphical Model Estimation (Version 1.3.1). Retrieved from http://cran.r-project.org/web/packages/qgraph/index.html

Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network Visualizations of Relationships in Psychometric Data. Journal of Statistical


Feinerer, I., & Hornik, K. (2015). tm: Text Mining Package (Version 0.6-2). Retrieved from http://cran.r-project.org/web/packages/tm/index.html

Franklin, C. L., & Thompson, K. E. (2005). Response style and posttraumatic stress disorder (PTSD): a review. Journal of Trauma & Dissociation: The Official Journal of the International Society for the Study of Dissociation (ISSD), 6(3), 105–123. http://doi.org/10.1300/J229v06n03_05

Fronczak, A., Fronczak, P., & Hołyst, J. A. (2004). Average path length in random networks. Physical Review E, 70(5), 056110. http://doi.org/10.1103/PhysRevE.70.056110

Goodrich, B., Kurkiewicz, D., & Rinker, T. (2015). qdap: Bridging the Gap Between Qualitative Data and Quantitative Analysis (Version 2.2.2). Retrieved from http://cran.r-project.org/web/packages/qdap/index.html

Kim, S.-B., Han, K.-S., Rim, H.-C., & Myaeng, S. H. (2006). Some Effective Techniques for Naive Bayes Text Classification. IEEE Transactions on Knowledge and Data Engineering, 18(11), 1457–1466. http://doi.org/10.1109/TKDE.2006.180

Krueger, R. F., & Bezdjian, S. (2009). Enhancing research and treatment of mental disorders with dimensional concepts: toward DSM-V and ICD-11. World Psychiatry, 8(1), 3–6.

Landeghem, S. V., Björne, J., Abeel, T., Baets, B. D., Salakoski, T., & Peer, Y. V. de. (2012). Semantically linking molecular entities in literature through entity relationships. BMC Bioinformatics, 13(Suppl 11), S6. http://doi.org/10.1186/1471-2105-13-S11-S6

Leech, N. L., & Onwuegbuzie, A. J. (2008). Qualitative data analysis: A compendium of techniques and a framework for selection for school psychology research and beyond. School Psychology Quarterly, 23(4), 587–604. http://doi.org/10.1037/1045-3830.23.4.587

Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245–251.


Pestian, J., Matykiewicz, P., South, B., Uzuner, O., & Hurdle, J. (2012). Sentiment Analysis of Suicide Notes: A Shared Task. Biomedical Informatics Insights, 3. http://doi.org/10.4137/BII.S9042

He, Q. (2013, March 10). Text mining and IRT for psychiatric and psychological assessment. University of Twente.

/r/depression, because nobody should be alone in a dark place. (n.d.). Retrieved July 5, 2015, from https://www.reddit.com/r/depression

R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org

Sanislow, C. A., Pine, D. S., Quinn, K. J., Kozak, M. J., Garvey, M. A., Heinssen, R. K., … Cuthbert, B. N. (2010). Developing constructs for psychopathology research: Research domain criteria. Journal of Abnormal Psychology, 119(4), 631–639. http://doi.org/10.1037/a0020909

Seifoddini, H., & Djassemi, M. (1991). The production data-based similarity coefficient versus Jaccard’s similarity coefficient. Computers & Industrial Engineering, 21(1–4), 263–266. http://doi.org/10.1016/0360-8352(91)90099-R

Smyth, J. M. (1998). Written emotional expression: effect sizes, outcome types, and moderating variables. Journal of Consulting and Clinical Psychology, 66(1), 174–184.

Tausczik, Y. R., & Pennebaker, J. W. (2010). The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29(1), 24–54. http://doi.org/10.1177/0261927X09351676


Wickham, H., & Chang, W. (2015). devtools: Tools to Make Developing R Packages Easier (Version 1.8.0). Retrieved from http://cran.r-project.org/web/packages/devtools/index.html