Automated semantic speech analysis in clinical contexts

Literature thesis

Alban Voppel

11149051

Supervisor - Drs. J. de Boer

Co-assessor - Dr. J.E. Rispens

MSc Brain and Cognitive Sciences, Cognitive Science track

University of Amsterdam

Abstract

Automated analysis of semantic space measures can give insight into cognitive mechanisms associated with language, memory and executive functions, as well as impairments thereof. This capacity is especially useful in clinical areas, where semantic analysis can provide information regarding diagnosis, prognosis and assessment for a variety of psychiatric and neurological disorders. The advent of automated semantic analysis has enabled the quantification of subtle characteristics present in produced language, transforming speech or writing into data suitable for further analysis and use.

We review current methods and applications of automated analysis of subject-produced semantic content in clinical contexts - whether speech, writing or output of semantic fluency as measured in cognitive assessments. We review a broad range of recent research and the associated challenges and opportunities in the context of a variety of psychiatric and neurological disorders in this fast-developing field.

We note technical developments in adjoining fields of research usable for future applications in the clinical contexts of diagnosis, prognosis, and research of cognitive functions. We conclude that through automated semantic analysis complex and subtle characteristics of language can be successfully quantified using a variety of methods with great prospective value, and that further research and improvements in this young and promising field are to be expected and can be used to further improve automated semantic analysis in clinical contexts.

1 Introduction

In clinical contexts the use of language for diagnosis is of paramount importance. Whether through clinical interviews or specific tests, it gives us insight into mental processes and underlying mechanisms which can be impaired in clinical subject groups. However, language is also complicated and difficult to quantify. Recently, through the advent of automated linguistic and semantic analysis, it has become increasingly possible to research semantics in an objective and quantifiable way. Here, we examine the application of these techniques in a clinical context: what are the recent findings, what areas need further research and what promising developments can help overcome hurdles?

A person’s use of language can hold information regarding the underlying emotional, mental and cognitive processes. Examining language can thus, in a sense, give us a window into their world (Tausczik and Pennebaker, 2010; Bedi et al., 2014). The products of language can take the form of unstructured, relatively undirected language, whether in writing or in the vocal modality, or arise in more specialized contexts, such as clinical settings where specific tasks are given. These tasks range from retelling a story to the best of one’s ability, to producing as many examples as possible of a given category, such as animals, in a given time-period, to altogether undirected or even spontaneous speech (Clark et al., 2016; Bedi et al., 2015).

Speech - whether undirected or in structured tasks - as well as any form of writing are modalities of language, a fundamental aspect of human cognition in general and social interaction in particular (Elvevag et al., 2007). The capacity for language serves a multitude of complex roles in a variety of contexts, and as such carries a wealth of information about the subject who produces it. However, due to its wide range of applications and nuances, it is difficult to analyze the derived information and distill this wealth of information into the measures required (Hoffman et al., 2013). Retrieving and processing information from language is especially hard if complex and subtle features of language are used to elucidate underlying cognitive processes. Various aspects of speech and writing have been, and continue to be, used in various contexts. These include measures such as speed of speech, focus on recurrent content, type-token ratio of word use, emotional tone and affect as measured with sound frequency power spectra - a sign of emotional content with an assumed connection to the experienced emotions of the speaker (Cohen et al., 2008) - pronunciation, and syntactic complexity. In this review we limit ourselves to semantics - the content or meaning of words and phrases.

Abnormalities in the semantic aspects of language are used as a clinical marker for a variety of psychiatric disorders and pathologies, including but not limited to autism, schizophrenia and Alzheimer’s disease (Luo et al., 2016; Elvevag et al., 2010; Appell et al., 1982). Depending on the disorder and the cognitive functions it affects, aspects of language such as semantic fluency, word recall and correct usage of syntax can be impaired. Predating these specialized measures of language use, the assessment of subjects using interviews has a long and rich history as a tool in psychiatry (Spitzer et al., 1992).

Analysis and measurement of language to diagnose psychiatric disorders, assess their severity and serve as a prognostic marker typically use restricted versions of semantic products, such as the number of instances of a specific category named in a certain time-period - the so-called semantic verbal fluency (SVF) task (Benton et al., 1994). More complex use of language, such as free speech or semi-structured interviews, presents bigger obstacles to quantitative, reliable analysis but is more comparable to normal language use. The amount of labor required to analyze complex speech through tagging, as well as the manual, subjective nature of categorizing words before analysis, was a large obstacle for detailed research and application of findings. However, recent developments in computational language analysis such as Latent Semantic Analysis (LSA) have provided a way to incorporate and use these information-rich sources of language in a quantitative way (Landauer et al., 1998). Working with these techniques, large corpora of text, transcribed to a format suited for automated computational analysis, have been combined with algorithms and techniques from the field of machine learning to classify cases and recognize underlying patterns (Bedi et al., 2015).

Quick, validated, quantitative automated analysis of complex aspects of semantic content in a clinical context provides the opportunity to diagnose, assess and prognose rapidly and unobtrusively, at low cost. Information about underlying cognitive functions and defects thereof can be distilled from the quantified characteristics. Here we review this field of research and point to promises as well as pitfalls in this multidisciplinary field. Underlying assumptions of research are investigated, as are new techniques from the field of automated semantic analysis which show promise for application in clinical contexts.

2 Semantic analysis in clinical contexts

Automated semantic analysis techniques make use of assumptions regarding the meaning of words (or the semantics) in language. Through the use of automated analysis - as opposed to, for example, manual rating of semantic content by an interviewer - various measures can be derived. Changes in these measures have been found in various clinically diagnosed subject populations.

In order to provide an overview of the current applications of automated semantic analysis in clinical contexts, we will review the three main components in the following order. Firstly, we review the necessary steps in preprocessing and transforming raw language into a form suitable for analysis. Before any automated semantic method can be applied, language produced by subjects needs to be processed to be usable; depending on the modality of the produced language, transcription or other preprocessing is required before analysis can take place. Secondly, we discuss semantic analysis methods, including the associated processing steps and techniques such as LSA - examining the meaning of words based on the context of other words in which they are used - and semantic clustering - where words with similar meaning are grouped into categories such as ‘farm animals’ and ‘jungle animals’. Thirdly, we review specific applications and findings of the use of these techniques in a variety of clinical contexts. For all three components, we note promising results, techniques and findings but also aspects where further improvement is needed.

Semantic correlates of disorders in a clinical context for possible future treatment will be reviewed as well as more abstract scientific reasoning regarding cognitive mechanisms. This distinction presents itself as some of the research reviewed here focuses on explaining underlying aspects of cognition - thus providing an explanation of cognitive or mental mechanisms and capabilities - while other research mainly focuses on the ability to distinguish groups of subjects, diagnose, or prognose without connecting this directly to explaining the causal mechanisms. The capability to distinguish, diagnose, or prognose has great value when applied in clinical settings, where the ability to perform these tasks does not necessarily need to be connected to knowledge about underlying cognitive mechanisms.

Here, we focus on current applications of semantic analysis in a clinical context, where language research meets clinical populations and the associated possible applications. The overview is intended to clarify how the different parts of the field interact. From there, promises and pitfalls of current research as well as future research in this young, fast-developing field will be discussed.

2.1 Processing language for analysis

The raw language output of subjects, whether in speech or in writing, needs to be transformed before it is suitable for analysis using computational, automated linguistic measures. In the research reviewed, language is produced in various modalities as a response to a certain task. If this modality is speech, the recorded audio needs to be transcribed, a time-consuming process. By having multiple blinded observers transcribe the recordings, the overlap between transcriptions can be estimated and individual bias filtered out (e.g. Losh and Gordon, 2014; Lee et al., 2017). Language produced in a written modality does not need to be transcribed from recorded audio, but sometimes needs to be typed out if subjects wrote with pen and paper (e.g. Pakhomov et al., 2016), or corrected for spelling errors (e.g. Clark et al., 2014).

Once the text is in an electronic form, various forms of processing, tagging or labeling can be applied, based on previously defined methodology. For example, words can be assigned a label of semantic, functional, or emotional nature. A functional label can contain information such as noun or verb, according to the role the word plays in a sentence. Labeling is sometimes also used to single out certain words of interest - for example, usage of words referring to the self has been found to differ in subjects with schizophrenia compared to the healthy population (Gallagher, 2000; Fineberg et al., 2016). Additionally, in phrases containing a self-reference an increase in words referring to internal mental phenomenology is observed. Accordingly, during transcription and analysis, these words are marked as belonging to a category of interest for further analysis (Fineberg et al., 2016). For semantic verbal fluency tasks, previously validated databases exist that label categories of animals, enabling later analysis to estimate semantic category cluster-switching (Troyer, 2000).

For statistical analysis using automated measures, the complexities of grammar - the singular and plural versions of words, or the different inflected forms of a verb - present a conundrum. Commonly used contractions such as “isn’t” or “I’m” can be dealt with by algorithms that either automatically replace the contraction with its full form or ‘stem’ the suffixes. Various standard programs and rule-sets exist to perform this stemming in an automated manner (Porter, 1980). Stemming aims to reduce forms such as “hallucinatory”, “hallucinating” and “hallucination” to a shared stem, for example “hallucinat”, which can be used in further analysis, capturing all of the associated forms; the exact stem depends on the rule-set used. Depending on the specificity of the required semantic aspects, these preprocessing steps can reduce complexities by simplifying the contexts (e.g. He et al., 2017).
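The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any study reviewed here: the contraction table and the suffix list are hypothetical and far smaller than a real rule-set, and the stemmer is a naive suffix stripper rather than the Porter (1980) algorithm, whose output for these words may differ.

```python
import re

# Hypothetical, deliberately tiny lookup table; real pipelines use larger rule-sets.
CONTRACTIONS = {"isn't": "is not", "i'm": "i am"}

# Illustrative suffix list chosen for this example, NOT the Porter (1980) rules.
SUFFIXES = ("ory", "ing", "ion")


def expand_contractions(text):
    """Lower-case the text and replace known contractions with their full forms."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Expand contractions, tokenise on letters, and stem every token."""
    tokens = re.findall(r"[a-z]+", expand_contractions(text))
    return [naive_stem(token) for token in tokens]
```

With these assumed rules, `naive_stem` maps “hallucinatory”, “hallucinating” and “hallucination” to the same stem, while short words such as “king” are left untouched by the minimum-length check.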

For context-based semantic measures, fillers such as “ah” or “uhm” can be removed from the transcripts, as well as very commonly used words such as “the” or “I” (Prud’hommeaux and Roark, 2015). On the other side of the spectrum, semantic analysis that focuses on the context of word use also removes words that occur very rarely in a given text or the associated corpus - for example by filtering out words that occur fewer than a set number of times in a corpus (Quesada, 2007; Hoffman et al., 2013). This discards available semantic information, but because the automated analysis methods find semantic patterns and connections through the contexts in which words are used, enough information about those contexts needs to be present. The rarity of a word, and thus the sparsity of information regarding its context, makes context-derived semantic analysis unreliable (Landauer et al., 1998).
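A possible implementation of this filtering step is sketched below. The filler and stop-word lists and the threshold `min_count` are assumptions made for illustration; the studies cited use their own lists and cut-offs.

```python
from collections import Counter

# Hypothetical word lists for illustration; published studies use longer lists.
FILLERS = {"ah", "uhm"}
STOP_WORDS = {"the", "i"}


def filter_tokens(documents, min_count=2):
    """Drop fillers, stop words, and words rarer than min_count in the corpus.

    documents: list of token lists, one list per transcript.
    """
    kept = [
        [tok for tok in doc if tok not in FILLERS and tok not in STOP_WORDS]
        for doc in documents
    ]
    # Corpus-wide frequencies, counted after filler and stop-word removal.
    counts = Counter(tok for doc in kept for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in kept]
```

Raising `min_count` trades vocabulary coverage for more reliable context statistics, which is the trade-off described in the paragraph above.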

The amount of speech or text produced can also differ markedly between subjects in clinical research. In the case of SVF, the number of correct animals or other exemplars named is usually the primary outcome measure. However, for semantic clustering, switching between related categories is of interest. Switching from the category ‘farm animals’ to the category ‘jungle animals’ with 10 total exemplars carries different information than making the same switch with 40 named exemplars. Results of switching from one category to another thus need to take into account the total amount of information. For context-based semantic assessment, the amount of produced text plays a similar role in regard to ease of classification. In the case of cluster-switching, examining the frequency of category switching, defined relative to the total number of produced exemplars, can alleviate this problem (Pakhomov and Hemmy, 2014; Clark et al., 2016). In more free-form responses, using only a set amount of spoken language or a fixed number of words per subject response can alleviate inequality in the total number of words produced. A trade-off has to be made between using all available information from single subjects and enabling unbiased analysis by including equal amounts of information per subject.

Finally, corpus creation has to be taken into account. To create a suitable corpus as ‘ground truth’ for assessment of characteristics or as comparison standard, the sources of language - whether books, transcribed radio or newspaper articles - have to be suitable for comparison. The choice of corpus is based on the research question considered and the language produced by the clinical subjects; a corpus based on spoken text - for example, transcripts of radio programs - will have different characteristics from a corpus based on literary works (Brysbaert and New, 2009). In cases where spoken language is analyzed, a corpus based on spoken text will be better suited because of its closer similarity to natural speech (Lee et al., 2017).

2.1.1 Developments in processing language

Technical developments that can reduce the amount of labor required for research using automated semantic analysis are ongoing in the field of natural language processing. Recently, various automated voice recognition features have been developed, among others for applications in smartphones to enable complex vocal commands. Through the use of natural language processing and the training of artificial neural networks to recognize spoken language in a variety of settings and quickly transform it to text, fast transcription has become possible, developed for commercial applications such as Apple’s Siri personal assistant or Google’s Speech techniques (Assefi et al., 2015; Aleksic et al., 2015). These types of developments show promise in greatly reducing the amount of time and work required for transcribing large amounts of spoken data. For patients, speaking rather than writing can also lessen the burden of participating in research; as He et al. (2017) note, “With the use of speech recognition techniques, the potential PTSD patients would not have to write down their stories but could speak them out.” Additionally, subjects with clinical diagnoses often possess a below-average level of education, which can manifest in difficulties in writing - thus placing extra load on subjects and changing behavior (Kessler et al., 1995). If speech can easily and accurately be transcribed, these effects can be lessened. Future developments in accuracy, validated by comparison to established methods, also promise an increase in the available data to be analyzed. Other sources of information such as gestures and emotions, quantified in similar ways by using machine vision, could be incorporated as sources of data for later analysis (Clark et al., 2014; He et al., 2017).

In regard to the collection of more data, the specific processing steps that can be performed automatically could make it easier to reach sufficient accuracy for analyzing aspects of semantics without increasing workload; for example, some of the research reviewed removed any inflection of verbs by cutting off everything except the stem (e.g. He et al., 2017). In effect, this translates to a reduction in the transcript accuracy required for a specific analysis, and could also increase the accuracy of automated transcription through removal of error-prone verb forms.

A similar, more familiar use of technology for collecting semantic content is the rise in usage of digital environments for interaction with clinical groups. This ranges from digital applications on tablets used for in-person testing, to virtual environments created as support groups for clinical groups, with or without interaction or treatment by a clinician (Mezei et al., 2015). Here, the digital form in which the semantic content is provided, and the large quantities which can easily be copied and transformed for analysis, lead to a larger amount of data suitable for analysis (Karmen et al., 2015). With larger datasets available, the creation of corpora for specific clinical groups becomes possible, allowing differentiation or assessment of pathological severity or prognosis by comparing within-group differences derived from semantic measures.

2.2 Semantic analysis techniques

Once language products have been transformed into usable form, whether using transcription, stemming or labeling, various techniques from the field of computational linguistics and natural language processing can be employed to examine semantic measures. For analysis of semantic aspects of language, computational linguistic and statistical semantics methods have found widespread use in a variety of contexts. Examples range from recognizing semantic aspects of spam for filtering purposes (Zhang et al., 2004) and author classification based on textual analysis (Gamon, 2004) to optimizing search engine retrieval (Manning et al., 2008). The computational, automated or statistical part of the techniques refers to the manner in which the input - various semantic objects such as words, phrases, sentences or whole texts - is analyzed. This is done in a highly automated, algorithmic way, in contrast to manual classification. Various techniques have been developed that target different features of produced language to derive measures of interest.

Approaches to quantifying language from a clinical perspective can be roughly divided into three groups (Elvevag et al., 2010). The first examines descriptive statistics of any produced language, such as word length or mean speech rate. The second group is composed of information-theoretical measures such as n-gram likelihoods, reflecting how likely a word is to be present given the previous n-1 words in a given text. These techniques make use of large corpora to build up likelihood factors for words to appear in conjunction with each other. These word likelihood factors can then be used to assess the similarity of a given text or word to a standard word corpus (Manning et al., 1999, chapter 6). The third group of techniques uses comparable underlying statistical measures of language but focuses more on general context. A commonly used technique of the third category is Latent Semantic Analysis (LSA), a method that derives semantic information based on the context in which words, phrases or texts are found (Landauer et al., 1998; Landauer, 2006).
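To make the second group of measures concrete, the following sketch estimates bigram (n = 2) likelihoods from a reference corpus and scores a new sentence against them. It is an illustration only: the add-one smoothing, the boundary markers `<s>` and `</s>`, and all function names are assumptions chosen for brevity and are not taken from the work cited above.

```python
import math
from collections import Counter


def train_bigram_model(corpus_sentences):
    """Count unigrams and bigrams in a list of tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        # Boundary markers let the model score sentence starts and ends.
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing, so unseen pairs are not zero."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def mean_log_likelihood(sentence, unigrams, bigrams):
    """Average log-probability per transition of a sentence under the model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence + ["</s>"]
    logs = [
        math.log(bigram_probability(p, w, unigrams, bigrams, vocab_size))
        for p, w in zip(tokens, tokens[1:])
    ]
    return sum(logs) / len(logs)
```

A sentence whose word order resembles the reference corpus receives a higher mean log-likelihood than a scrambled version of the same words, which is the property these measures exploit.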

A second technique in this group, semantic clustering, makes use of the concept of clusters of related words, often words produced in the semantic verbal fluency task. Here the size of these clusters of semantically related words, as well as the frequency of switching between them, conveys information regarding underlying cognitive processes and the associated impairments found in clinical subject groups. These two main automated semantic techniques used in clinical contexts, latent semantic analysis and semantic clustering, will be reviewed in turn. Additionally, we will look at the use of machine learning to classify and analyze the data resulting from automated semantic analysis methods.

2.2.1 Latent Semantic Analysis

Latent Semantic Analysis is a method that creates a representation of the conceptual associations between words (Landauer et al., 1998; Landauer, 2006). Words that often occur in the same context are assumed to carry similar semantic values. As input, LSA requires a large corpus divided into different contexts - with each context being a specific sample of text. This sample can have various sizes - from books to paragraphs to sentences, depending on the required specificity. From this collection of contexts, a co-occurrence matrix is created, containing information regarding which word appears in which context. Here, each word is represented as a vector, each element of which contains the frequency with which the word occurs in one specific context, alongside the other words occurring in that same context (Landauer et al., 1998; Landauer, 2006).
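The construction of such a matrix can be sketched as follows, assuming each context is already tokenised. The function name and the plain list-of-lists representation are illustrative choices; practical LSA implementations typically use sparse matrices and apply a weighting scheme to the raw counts.

```python
def build_term_context_matrix(contexts):
    """Return (vocabulary, matrix), where matrix[i][j] counts how often
    vocabulary[i] occurs in contexts[j].

    contexts: list of token lists, one per context (e.g. one per sentence).
    """
    vocabulary = sorted({tok for ctx in contexts for tok in ctx})
    row_of = {word: i for i, word in enumerate(vocabulary)}
    matrix = [[0] * len(contexts) for _ in vocabulary]
    for j, ctx in enumerate(contexts):
        for tok in ctx:
            matrix[row_of[tok]][j] += 1
    return vocabulary, matrix
```

Each row of the returned matrix is the raw context vector of one word, before any dimensionality reduction is applied.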

The co-occurrence matrix is then reduced in dimensionality, while retaining information about the relative pattern of similarity between different vectors (for example, word occurrence in various spoken sentences). To achieve this dimensionality reduction, the most commonly used technique is Singular Value Decomposition (SVD). Through the use of this technique, a lower-dimensional per-word vector is returned - typically about 300 elements long. Within this vector the similarity structure containing information about co-occurrence of words is retained, and can be used to compare latent, underlying aspects of word meaning of the associated vector (Golub and Van Loan, 1989; Landauer, 2006).
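In matrix notation, the reduction can be written as a truncated SVD. The symbols below are introduced here for illustration and do not appear in the sources cited: X is the word-by-context matrix with m words and n contexts, and k is the number of retained dimensions.

```latex
X \;\approx\; X_k \;=\; U_k \, \Sigma_k \, V_k^{\top},
\qquad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{n \times k},\quad
k \approx 300
```

Under this notation, one common convention takes row i of U_k Σ_k as the reduced vector of word i; Σ_k is diagonal and holds the k largest singular values, so X_k is the best rank-k approximation of X in the least-squares sense.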

The word vector of approximately 300 elements carries semantic information and locates the word in a high-dimensional space; this semantic space represents the meaning of a word by its corresponding vector. Words can be compared in regard to semantic similarity - similarity of meaning - by examining how close they are in this space, usually expressed as the cosine of the angle between their vectors. The underlying patterns of co-occurrence in word use can thus be examined using large-scale corpora of texts, on the premise that words used in the same contexts share a similar meaning (Landauer et al., 1998).

LSA is a technique that can be adapted in various ways. The contexts which are analyzed can be varied, depending on the available language data and the research question. Additionally, by taking the mean, median or another summary of the vectors of two or more words in high-dimensional semantic space, the distance between sentences, not just words, can be estimated (Bedi et al., 2015). This allows more fine-grained examination of semantic productions. The possibility of creating a semantic space and characterizing words based on semantic similarity lends itself to unstructured language productions, as long as a well-defined context can be extracted from the corpus. However, LSA is a ‘bag-of-words’ method, meaning that the words in a specific context are stripped of their order and thrown together for further analysis - the ‘bag’. While the context is taken into account, information carried by word order is not retrievable using standard similarity analysis.
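Both the cosine comparison of word vectors and the averaging of word vectors into a sentence vector can be sketched briefly. The two-dimensional toy vectors in the usage note are made up for illustration; real LSA vectors have on the order of 300 elements and come from the reduced matrix. Returning 0.0 for an all-zero vector is a convention chosen here, not part of the method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def sentence_vector(tokens, word_vectors):
    """Element-wise mean of the vectors of all tokens found in word_vectors."""
    found = [word_vectors[t] for t in tokens if t in word_vectors]
    if not found:
        raise ValueError("no token has a vector")
    return [sum(column) / len(found) for column in zip(*found)]
```

For example, with the hypothetical vectors `{"radius": [1.0, 0.0], "music": [0.0, 1.0]}`, the two words have a cosine similarity of 0, and the sentence vector of both words together is `[0.5, 0.5]`. Because the mean ignores token order, this sketch also makes the bag-of-words limitation visible.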

The strength of LSA lies in its capacity to capture latent connections between semantic productions rather than superficial similarities. For example, two pairs of phrases can be compared using latent semantic analysis: “The radius of spheres” and “A circle’s diameter” are quite similar (cosine similarity of 0.55), whereas “The radius of spheres” and “The music of spheres” have a cosine similarity of 0.01, even though they share the word “spheres” (Elvevag et al., 2007). Simple word-word overlap - the use of the word “spheres” in both phrases - does not drive semantic similarity, as the phrases occur in different contexts: the first pair carries meaning associated with geometry, while the second pair combines phrases from unrelated subjects.

2.2.2 Semantic clustering analysis

The concept of clustering language in a semantic space, based on the shared semantic category of words, is often used in analyzing a well-defined, clinically relevant semantic task - semantic verbal fluency or SVF (Tombaugh et al., 1999). In this task, the participant has to produce as many exemplars of a given category - commonly, animals or items one might find in a supermarket - as possible within a limited timespan. Participants were originally scored based on the number of unique, correct exemplars named in the defined period, usually one minute (Troyer et al., 1997). The task is used to provide an assessment of general verbal ability, which is associated with inhibition, executive functioning, memory access as well as vocabulary size, and as such is used as an assessment tool in a variety of psychological and psychiatric conditions (Tombaugh et al., 1999; Shao et al., 2014).

Through examination of the semantic characteristics of the produced exemplars, more information than just the number of correct responses can be retrieved. Participants often respond to this task in clustered subgroups - such as ‘pet animals’ (dogs, cats), varieties of birds, or otherwise linked clusters, e.g. ‘jungle animals’ or ‘desert animals’ (Troyer et al., 1997). Through assessing the clusters of words, clues about the underlying nature of the semantic system can be derived. For this, labeling of the produced exemplars can be used, or, through the previously discussed LSA, semantic similarity can be compared via distance in high-dimensional semantic space.

Using labels, the sequence “dog-cat-elephant-tiger” would show a difference based on the labels ‘pets’ to ‘wild’ and would indicate a switch of categories taking place between the “cat-elephant” exemplars. A similar result would be found with LSA used to assess semantic similarity (or distance, in the LSA vernacular), with the similarity within the “dog-cat” and “elephant-tiger” pairs significantly greater than the similarity between “cat-elephant”. From the series of exemplars thus named, a representation can be made showing the clusters the exemplars can be divided into as well as the switches or transitions between them. Thus, cluster size, frequency of cluster switching, semantic distance between clusters and other measures can be derived (Troyer et al., 1997). Various correlates of cluster measures have been found, including neuroanatomical differences (Rich et al., 1999), as well as severity of several neurodegenerative diseases such as Alzheimer’s disease (Clark et al., 2016) and Parkinson’s disease (Raskin et al., 1992).
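The label-based variant of this analysis can be sketched as follows. The label table is hypothetical and covers only the four animals of the example; the definition of a cluster as a run of consecutive exemplars with the same label, and of cluster size as the plain number of exemplars in that run, is one simple convention among several used in the literature.

```python
from itertools import groupby

# Hypothetical label table; validated category databases are used in practice.
CATEGORY = {"dog": "pets", "cat": "pets", "elephant": "wild", "tiger": "wild"}


def cluster_measures(exemplars, category=CATEGORY):
    """Summarise clusters in an ordered list of SVF responses.

    Consecutive exemplars sharing a label form one cluster; a switch is a
    transition between two adjacent clusters.
    """
    labels = [category[word] for word in exemplars]
    cluster_sizes = [len(list(group)) for _, group in groupby(labels)]
    switches = max(len(cluster_sizes) - 1, 0)
    mean_size = sum(cluster_sizes) / len(cluster_sizes) if cluster_sizes else 0.0
    return {
        "cluster_sizes": cluster_sizes,
        "switches": switches,
        "mean_cluster_size": mean_size,
        # Switch frequency relative to the total number of exemplars produced.
        "switch_rate": switches / len(exemplars) if exemplars else 0.0,
    }
```

For the sequence “dog-cat-elephant-tiger” this yields two clusters of size two and a single switch. Dividing the number of switches by the number of exemplars gives a rate that can be compared between subjects who produced different totals.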

The traditional approaches to clustering and labeling relied on manual definitions of clusters and on judging by hand whether a switch occurs. This is problematic and labor-intensive, as exemplars can have various relationships to each other. A cat can for example be classified as belonging to any of the nested cluster categories of ‘mammals’, ‘house pets’, or ‘hunters’ (Pakhomov and Hemmy, 2014). More automated assessments of categories based on corpora alleviate the severity of these issues. LSA and semantic clustering make use of a semantic space derived from large corpora, providing a quantitative way to work with language data.

Since the semantic verbal fluency task is already in use as a clinically validated tool and responses of large groups of subjects have been collected, the semantic analysis approach represents a promising method to extract further fine-grained information for analysis. Here, it can serve as a tool to elucidate specific underlying cognitive impairments in domains such as semantic retrieval, possibly leading to improved diagnostic accuracy based on an affordable, non-invasive test (e.g. Clark et al., 2016). A limitation to the widespread application of semantic clustering is the relative sparsity of a constrained verbal task compared to the rich and complex use of language in daily life.

2.2.3 Machine learning

Once language data has been transformed into a quantitative form consisting of a number of semantic measures, for example through LSA or semantic clustering analysis, classification with the use of machine learning becomes possible. Classification, the assignment of an exemplar (or case) to a group (or class), is done based on various characteristics, and can make use of a multitude of aspects or combinations thereof - a particular strength of machine learning algorithms (Witten et al., 2016). A classifier is trained on a subset of cases with the correct associated outcomes, adjusting its internal parameters (for example, the weights of the connections between input and output nodes) until the correct output (the class) is given for a case. The accuracy of this learned association between a case and its label is then tested by feeding the algorithm another example on which it has not been trained. This serves as validation, to prevent over-fitting on a small subset sample. Without validation on a test-set, the algorithm may generalize poorly to new cases and the reported accuracy can be unrealistically high (Joachims, 1998).

To validate the learned associations, k-fold cross-validation or leave-one-out validation is used to make optimal use of the often sparse data available (Refaeilzadeh et al., 2009). The most rigorous approach is to use two independent cohorts, where the machine-learning classifier is trained on one cohort only - with or without a form of cross-validation, but only ever within the training cohort - and is then rated on its accuracy in the validation cohort (Kotsiantis et al., 2007).
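
As an illustration of cross-validation within a single cohort, the following sketch rotates which cases are held out so that every case is used for testing exactly once. The 1-nearest-neighbour classifier, the invented feature values and the function names are assumptions made for the example only.

```python
def k_fold_indices(n_cases, k):
    """Split case indices 0..n_cases-1 into k folds of near-equal size."""
    return [list(range(i, n_cases, k)) for i in range(k)]

def nearest_neighbour(train_x, train_y, case):
    """Label a case with the label of its closest training case."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], case))
    return train_y[best]

def cross_validate(cases, labels, k):
    """Pooled accuracy over all held-out cases; k == len(cases) is leave-one-out."""
    correct = 0
    for fold in k_fold_indices(len(cases), k):
        held_out = set(fold)
        # Train only on cases outside the current fold.
        train_x = [c for i, c in enumerate(cases) if i not in held_out]
        train_y = [l for i, l in enumerate(labels) if i not in held_out]
        correct += sum(
            nearest_neighbour(train_x, train_y, cases[i]) == labels[i]
            for i in fold
        )
    return correct / len(cases)

# Invented feature vectors: [coherence score, mean cluster size].
cases = [[0.82, 4.1], [0.79, 3.8], [0.85, 4.4],
         [0.55, 2.2], [0.60, 2.5], [0.52, 2.0]]
labels = ["control", "control", "control", "patient", "patient", "patient"]

three_fold_accuracy = cross_validate(cases, labels, k=3)
leave_one_out_accuracy = cross_validate(cases, labels, k=len(cases))
```

In a two-cohort design, a routine like this would be run on the training cohort only, with the second cohort kept aside for a single final evaluation.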

To our knowledge, this two-cohort approach has so far not been used in any clinical applications of automated semantic analysis. Possible reasons for the absence of this ‘gold standard’ validation include the limited amounts of data available, in particular due to the previously mentioned problem of transcribing produced speech into a format suitable for machine learning classification; the heterogeneity of particular disorders such as schizophrenia, leading to heterogeneity of displayed symptoms; and problems in collecting large, representative samples. Properly executed cross-validation is the best available replacement if two cohorts are not available, but it needs to be tightly controlled to prevent over-fitting and the associated poor generalizability (Bishop, 2006).

Classification of groups using machine learning, either through linear discrimination or through other, more advanced forms of machine learning such as support vector machines (SVM) or convex hull classifiers, can make use of complex sources of data to find hidden patterns which accurately classify the data (Lanzi and Wilson, 2006). The capacities afforded include selection of the most informative sources of data for classification from a large number of input measures through use of random forest classifiers (Breiman, 2001; Liaw et al., 2002). Applications of machine learning techniques have found a growing niche in the clinical context by analyzing the large amounts of data created by methods of automated semantic analysis. This niche includes research in clinical populations and the detection of subtle group differences within and between diagnosed populations. Through use of automated semantic analysis, complex semantic measures in large datasets can be assessed in a quantitative way. The large, fine-grained amounts of data thus produced are suited for classification and analysis using the fast-developing tools of machine learning. Next, we will review the application of workflows and analysis pipelines such as these in a clinical context.

2.3 Clinical use of language and semantics

Thus far we have reviewed various established and promising techniques in the field of computational semantics. In clinical application, however, specific semantic measures show specific deficits or changes depending on the characteristics of the psychiatric disorder, as they are connected to specific postulated underlying cognitive mechanisms. In this context Bedi et al. (2014) note that “..the capacity of psychiatry to diagnose and treat serious mental illness has been hampered by the absence of objective clinical tests of the type routinely used in other fields of medicine.” This missing capacity can at least partially be provided by using the earlier reviewed quantitative methods to assess and analyze semantic content.

As previously mentioned, various aspects of language and speech - both semantic and not - have been used in the past and are used today in the diagnosis, prognosis, and assessment of various psychiatric disorders (Spitzer et al., 1992). Produced language can be used to derive measures such as similarity (compared to a standard corpus, previous text of the same group, or another clinical diagnosis), speed, pressure, scarcity, coherence, and others. The content of words spoken as replies to questions regarding mood, convictions and beliefs can be leading in the process of diagnosis. An interview with a clinical psychologist or psychiatrist remains an important, if not the most important, medium to understand a patient and come to the correct diagnosis and associated treatment (Spitzer et al., 1992; He et al., 2017).

Not only the content of answers to questions is used in clinical settings; aspects of the use of language such as the relative disjointedness, coherence or incoherence, tempo of speech, and beliefs expressed can serve as indicators of the severity of, for example, schizophrenia (Kay et al., 1987). Through the application of computational and automated semantic analysis, these changes can be quantified, and rich, unstructured sources of language produced by subjects can be put to use. The unobtrusive manner in which speech or writing can be assessed is another reason why semantic analysis has been performed (Prud’hommeaux and Roark, 2015).

In research with clinically diagnosed populations, two differing populations of subjects are often compared. One of these populations is the diagnosed group of interest; the other is usually a healthy control group, but it can also be a less severely affected subgroup with the same disorder, or a psychiatric group with a different diagnosis. By contrasting the group of interest with such a comparison group, specific aspects can be compared, whether to differentiate for diagnosis or to investigate longitudinal developments of disorders.

Findings related to various psychiatric disorders on which research has recently been performed using semantic analysis will be reviewed, as will the associated cognitive mechanisms where these were postulated. This overview of recent research using automated semantic measures in clinical contexts illustrates the breadth of applications that automated semantic analysis enables.

2.3.1 Recent applications of semantic analysis in clinical contexts

White and Shah (2016) explored the (cognitive) capacity for innovative thinking in Attention Deficit Hyperactivity Disorder (ADHD). In ADHD, a neurodevelopmental disorder, subjects show divergent thinking on standard measures compared to healthy controls. While negative effects of ADHD exist - postulated by some as caused by a deficit in inhibitory capability (Nigg, 2001) - ADHD has recently also been associated with an increase in creativity (Fugate et al., 2013). Capabilities such as the ability to innovate, associated with creative thinking, have been shown to differ in adults with ADHD. Latent semantic analysis was used to measure semantic distance within cue-associate pairs on the word association task (White and Shah, 2016). This semantic distance between cue-associate word pairs, reflecting originality of associations, was found to be significantly greater on average for subjects with ADHD. White and Shah associated this greater semantic distance with an increase in innovative thinking. Semantic distance thus served as a measure of a positive, abstract cognitive correlate present in a clinical subject group.
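
To make the notion of semantic distance within cue-associate pairs concrete, the sketch below computes it as one minus the cosine similarity of two word vectors. The three-dimensional toy vectors are invented stand-ins for vectors from a real semantic space such as LSA, and the function names are hypothetical; this is not the scoring procedure of White and Shah (2016).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_semantic_distance(pairs, vectors):
    """Average (1 - cosine similarity) over cue-associate pairs."""
    distances = [
        1 - cosine_similarity(vectors[cue], vectors[associate])
        for cue, associate in pairs
    ]
    return sum(distances) / len(distances)

# Invented toy vectors; a real analysis would look these up in a trained space.
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "rocket": [0.0, 0.2, 0.9],
}

common_association = mean_semantic_distance([("dog", "cat")], toy_vectors)
remote_association = mean_semantic_distance([("dog", "rocket")], toy_vectors)
```

A larger mean distance for a subject or group would, under this simplification, correspond to more remote and thus more original associations.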

In the case of autism, various aspects of cognition are affected, notably the social use of often emotionally laden language and associated social behavior (American Psychiatric Association, 2013). These impairments present themselves in a variety of ways, one of which is associated with storytelling. Two studies used latent semantic analysis to assess narrative ability in subjects with a diagnosis of autism (Losh and Gordon, 2014; Luo et al., 2016). Here, tasks which required describing or retelling complex situations and weighting the importance of various characteristics of a situation were performed, and the resulting speech was transcribed and analyzed. Diminished narrative ability was evident not so much in the amount of content as in the quality thereof during a demanding narrative recall task (Losh and Gordon, 2014). Similarly, descriptions of a visual scene presented in the Thematic Apperception Test (Murray, 1943) showed significantly decreased narrative competency scores for subjects with a clinical diagnosis of autism (Lee et al., 2017). Using these measures of narrative competency, subjects with autism can be distinguished from healthy controls.

In a more limited, less free-form task of semantic production, emotionally positive and negative descriptions of friends, produced by subjects diagnosed with autism, were analyzed by creating semantic space comparisons (Luo et al., 2016). Differences were found between subjects with autism and healthy controls, and a support vector machine (SVM) was able to distinguish subjects with autism from controls by analyzing semantic similarity measures derived using LSA. Thus, latent semantic characteristics of descriptions could be used for training an SVM to automatically distinguish cases from controls with high accuracy (Luo et al., 2016).

Bipolar disorder is often characterized as a psychiatric disorder concerning mood; however, effects on general cognitive functions such as executive functioning and memory have been found (e.g. Dickerson et al., 2004). Using semantic verbal fluency, cognitive deficits in regard to semantic functioning were examined (Sung et al., 2013). Subjects with bipolar disorder showed less coherent clusters of animal names occurring with a lower frequency, and overall productivity was decreased as well compared to healthy controls (Sung et al., 2013). In this research, an established semantic task was examined to provide insight into cognitive impairments to a finer degree - the specific impairment regarding rare animal names, associated with a concept retrieval/access deficit - instead of a general decrease in production. More knowledge about impaired underlying cognitive mechanisms was thus obtained from a standard task using automated semantic analysis.

Impaired cognitive functioning is similarly associated with schizophrenia. In this disorder, cognitive impairment occurs together with positive symptoms such as hallucinations and delusions. This clinical group presents a rich target for the application of semantic techniques for analysis, since language production in general and associated measures of coherence and incoherence differ, as do measures of semantic clustering and switching (Elvevag et al., 2007; Sung et al., 2012). As a measure for the symptom of formal thought disorder, semantic coherence is calculated as a derivative of semantic similarity using LSA. A larger number of dissimilar words in a context, compared to healthy subjects, serves as a measure of incoherence.
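
A minimal sketch of how coherence can be derived from semantic similarity is given below. It assumes the simplest possible variant, the mean cosine similarity between each word and the next; the reviewed studies use their own, more elaborate definitions (for instance over phrases or windows of words). The toy vectors and function names are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def first_order_coherence(words, vectors):
    """Mean cosine similarity between each known word and the next known word."""
    known = [vectors[w] for w in words if w in vectors]  # skip out-of-vocabulary words
    sims = [cosine(a, b) for a, b in zip(known, known[1:])]
    return sum(sims) / len(sims) if sims else float("nan")

# Invented toy vectors standing in for vectors from a trained semantic space.
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "horse": [0.85, 0.15, 0.05],
    "rocket": [0.0, 0.2, 0.9],
}

coherent = first_order_coherence(["dog", "cat", "horse"], toy_vectors)
tangential = first_order_coherence(["dog", "rocket", "cat"], toy_vectors)
```

In this sketch a sequence that jumps to a semantically distant word receives a lower score, which is the property that makes coherence usable as a proxy for incoherent speech.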

Incoherence as reflected by speech is thought to partially reflect disorganization and incoherence in thought, as well as being a disorder of language capability in general (Holshausen et al., 2014). The measure is especially interesting as it has shown its use as a marker not only for distinguishing healthy controls from subjects, but also as a measure which, when combined with machine learning, can predict episodes of psychosis with remarkable accuracy in an at-risk group (Elvevag et al., 2007). The measure thus provides potentially very valuable warning signs for early intervention in a pre-clinical subject group, a prediction which psychiatrists, the current gold standard, are thus far unable to make reliably.

Additionally, various deficits regarding semantic clustering and switching between semantic clusters have been found in schizophrenia (Sung et al., 2012). Subjects with schizophrenia showed less coherence in their produced semantic clusters, in both high- and low-frequency exemplars produced during a semantic verbal fluency task, as compared to healthy controls. The authors interpret these results as suggesting that a deficit in automatic activation of semantic information is a key feature of schizophrenia (Sung et al., 2012). The research thus not only finds a significant difference in semantic measures between patients and healthy controls, but also aims to explain this difference by referring to a specific cognitive mechanism.
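
Clustering and switching on a verbal fluency task can be sketched as follows. The example assumes that every response has already been assigned a subcategory; the subcategory table, the response list and the function name are hypothetical, and a cluster is simply counted here as a run of consecutive responses from the same subcategory. Automated approaches such as those reviewed derive clusters from semantic similarity rather than from a hand-made table.

```python
from itertools import groupby

def cluster_and_switch(responses, subcategory):
    """Summarize a fluency response list as cluster sizes and switch count."""
    # Consecutive responses sharing a subcategory form one cluster (run).
    runs = [len(list(group)) for _, group in groupby(responses, key=subcategory.get)]
    switches = len(runs) - 1 if runs else 0
    mean_cluster_size = sum(runs) / len(runs) if runs else 0.0
    return {"clusters": runs, "switches": switches, "mean_cluster_size": mean_cluster_size}

# Hypothetical subcategory table for an 'animals' fluency task.
subcategory = {
    "cat": "pet", "dog": "pet", "hamster": "pet",
    "lion": "savanna", "zebra": "savanna", "giraffe": "savanna",
    "cow": "farm", "pig": "farm",
}

responses = ["cat", "dog", "lion", "zebra", "giraffe", "cow", "hamster"]
summary = cluster_and_switch(responses, subcategory)
```

Note that responses missing from the table all map to the same unknown subcategory in this sketch, so a real pipeline would need to handle them explicitly.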

Findings relating to mild cognitive impairment (MCI) as a precursor of Alzheimer’s disease (AD) show promise for the use of semantic information in a prognostic role. MCI is a stage of cognitive decline intermediate between the decline associated with typical, healthy aging and that associated with the more serious dementia or Alzheimer’s disease (Petersen, 2011). AD is a neurodegenerative disorder with a variety of symptoms, of which dementia is the most prevalent and impactful. The widespread prevalence of the disorder and its societal impact lend it importance as a target for research. Pakhomov and Hemmy (2014) show that clustering measures derived from semantic verbal fluency tasks serve as a marker for prognosis of the progression from MCI to dementia. In this case, semantic analysis helps by predicting progression from subtle symptoms, creating opportunities for earlier intervention.

We have presented a variety of applications of automated semantic measures as applied to various clinical psychiatric groups. However, this is not an exhaustive list of possible uses. Research on semantic characteristics has been applied to other clinically relevant groups such as those diagnosed with post-traumatic stress disorder (He et al., 2017) and Parkinson’s disease (García et al., 2016), and to examine the effect of psychoactive drugs (Bedi et al., 2014).

The possible benefit of these techniques can hardly be overstated, as recent research in predicting suicide attempts illustrates (Cook et al., 2016). The capacity to assess with high accuracy whether a subject will attempt suicide, using semantic information derived from an open-ended question (“how do you feel today?”) in combination with machine learning has potentially enormous benefits, as the authors note: “..these models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible” (Cook et al., 2016).

Based on the numerous recent findings and developing methods, it seems likely that future applications in clinical contexts will be found, providing advances both in applications and in explaining underlying cognitive mechanisms. Through the use of automated semantic analysis, complex underlying cognitive processes can be quantified and assessed, in turn enabling future applications in life-saving contexts.

2.3.2 Insight in underlying mechanisms of cognition

The noted research does not only play directly utilitarian roles in clinical contexts; cognitive mechanisms have been researched using these methods as well. Cognitive tests and aspects of clinical interviews are used to diagnose, prognose, or assess the severity of psychiatric disorders by examining semantic measures of language; in turn, these semantic capacities require a range of cognitive functions. Impairment in these cognitive functions will be reflected in semantic production. By assessing semantic impairments or changes in language, information regarding the cognitive functions and mechanisms involved with language can be obtained.

Some findings have been primarily discussed as correlates of future disorder severity, giving a measure of prognosis. However, often a specific semantic deficit - or in the case of innovative thinking in ADHD (White and Shah, 2016) a surplus - is associated with a specific cognitive function, which can be examined or detected through its effect on the production of language. In the ADHD example, the researchers postulate that the word-cue association task can serve as a measure of this underlying aspect of cognition. Through the use of semantic measures, the semantic distance between cue and associate can be quantified, allowing comparisons of this abstract cognitive mechanism.

Various cognitive processes have been measured using semantic analysis, such as word retrieval and access problems in bipolar disorder. Impairments in this case differ from general semantic productivity deficits, narrowing down the cognitive processes affected as well as future treatment options for this disorder (Sung et al., 2013). Future association of these impaired functions with brain activity or neurotransmitters might also help achieve greater understanding of cognitive functioning.

An informative way in which language - and especially the vocal modality of producing speech - has been used in the past in clinical contexts, and continues to be used there, is through semi-structured interviews. In the case of schizophrenia, the interview is used in diagnosing the disorder and assessing its specific symptoms (Kay et al., 1987). In this disorder, coherence in language has been used as an indicator of coherence of thought, allowing measurements of disorder severity but also predicting disorder course (Bedi et al., 2015). In schizophrenia, the specific reflection of the incoherence in language use has been used in the past to postulate specific turns of thought, but only by using subjective assessment in individual cases. The advent of automated clustering gives us specific insight, with reduced effort, into where and how cognitive mechanisms go astray compared to the healthy population (Sung et al., 2012; Holshausen et al., 2014; Tagamets et al., 2014).

2.3.3 Hurdles for semantic analysis in clinical contexts

Various authors point to the desired objectivity of measures of language. While produced language has served as a marker of clinical factors for quite some time, judgments based on longer semantic productions were often subjective, vague, or dependent on large amounts of labor, through individual interpretation of subtle effects (Pakhomov and Hemmy, 2014; Losh and Gordon, 2014; Luo et al., 2016). This complexity is caused by multiple factors inherent to language, and as such various techniques and methods are required to partially overcome these hurdles.

Firstly, language is inherently complex in a variety of ways. The existence of synonyms - different words with very similar semantic content - as well as homonyms - words which change meaning depending on context - makes semantics inherently ambiguous. For example, the word “crane” can refer to a particular species of bird, but can also refer to a machine used to lift heavy objects (Hoffman et al., 2013). Use of the word-token “crane” might be deemed coherent in a context of bird-talk by an automated algorithm, but can also be deemed incoherent if the different semantic association of lifting mechanism has been learned by the algorithm. Any way of measuring semantic aspects of word use in an objective manner has to account for these possible ambiguities.

Secondly, before speech can be used for analysis, work is required to shape, tag, and classify the words. Effort is required both to make sure procedures are objective and repeatable, and in the form of the labor needed to preprocess any produced language for semantic analysis. First, a procedure has to be developed that classifies or tags every word based on predetermined categories, for example as ‘jungle animal’. Although established procedures exist (Tombaugh et al., 1999; Voutilainen, 2003), the tagging of rare words can differ between researchers, introducing possible bias. While tagging with labels such as noun or verb is relatively easy, tagging semantic content is harder, more so in ecologically plausible language production tasks in which produced language is relatively unconstrained. Here, procedures for transcription of produced speech, or transfer thereof to a format suited for statistical analysis, have to be performed, often in a blinded way in order to avoid possible bias. Inter-rater reliability assessment during transcription procedures can overcome these hurdles, but introduces additional labor.
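
One common way to quantify inter-rater reliability for categorical tags is Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The sketch below is an illustration under assumptions: the choice of kappa, the tag lists and the function name are not taken from the reviewed studies.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters tagging the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters tagged independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Undefined (division by zero) if both raters use a single identical tag.
    return (observed - expected) / (1 - expected)

# Invented tags from two raters for the same six fluency responses.
rater_a = ["pet", "pet", "farm", "jungle", "farm", "pet"]
rater_b = ["pet", "pet", "farm", "farm", "farm", "pet"]

kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on five of six items, but kappa is lower than the raw agreement because part of that agreement would be expected by chance alone.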

The previously reviewed advances in automated transcription of speech hold promise here as well; however, care has to be taken that solutions developed and fine-tuned for a healthy population are adequately adapted for a clinical population. Reliance on automatic transcription trained on a normal manner of speech risks removal of exactly the abnormal aspects in which researchers are interested. Generalization serves a purpose in this manner by aggregating large amounts of data to find the underlying pattern, but care needs to be taken that unique characteristics of smaller populations such as those diagnosed with disorders are not lost. A speculative solution might be the creation of a semantic corpus per disorder, but this is contingent on the amount of data available.

Other issues exist in regard to generalization of semantic findings from clinical contexts, especially those involving clinical (semi-)structured interviews and tasks. The role of the interviewer is a factor which can lead to the introduction of bias. How much rapport there is between the interviewer and the responding subject, based on connection, age and gender, but also the absence or presence of an accent, can play a role, for example in affecting tempo of speech. It is almost impossible for an interviewer to achieve the exact same level of rapport with different subjects due to the aforementioned issues. The interaction between interviewer and respondent, and thus the produced speech, can change based on roles taken during the interview, for example in adopting a more or less assertive role in conversation. Thus, any effort to quantify measures of speech risks introducing bias due to differing interaction between the responding subject and the interviewer (Dykema et al., 1997).

Similar issues exist regarding differences between interviewers, at one site or across multiple sites. The subject of the conversation is another important possible confounder; neutral use of language such as naming animals can be affected in a different manner compared to aspects such as emotional speech in the case of depressed patients, or use of specific words which might trigger associations with particular events in subjects diagnosed with post-traumatic stress disorder. Topics of conversation during the interview can play a role in the semantic aspects of language produced, and thus in quantified measures derived from it. Additionally, longitudinal changes in language are an unknown factor. How reflective are subtle measures of quantified language in a context of prognosis of schizophrenia if persons can have mood or sleep-related changes within a day? This question becomes especially pertinent if such measures are used to discriminate between diagnoses that can present similar symptoms, such as bipolar disorder or depression.

These issues can potentially be alleviated by collecting large and varied amounts of data from a large number of subjects, interviewers, sites of treatment and other relevant possible confounding factors, combined with careful filtering and analysis of relevant factors. While automated semantic analysis in clinical contexts has shown a number of interesting findings, such a comprehensive data collection and the associated knowledge of analysis do not yet exist; a number of the aforementioned possible issues and questions are as yet unanswered.

2.4 Promising developments in semantic analysis

A number of findings and applications made possible by the usage of automated semantic analysis of various modalities of language have been reviewed, but new developments are visible on the horizon. Through the quantification of semantic similarity, avenues of analysis have opened up through the reduction of bias and introduction of a stable, accepted methodology. Surface measures such as exemplar count in verbal fluency tasks, word count or type-token ratio are already informative, showing differentiation between various subgroups, whether within a clinical subject group for prognosis or between patients and healthy controls. Measures using semantic space, however, show finer-grained differences within at-risk groups, a valuable capacity in a clinical context.
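
The surface measures mentioned above are simple to compute, as the following sketch of word count and type-token ratio shows. The tokenization rule (lower-cased runs of letters and apostrophes) and the function names are simplifying assumptions; real transcripts would need a more careful tokenizer.

```python
import re

def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_count(text):
    """Number of tokens (running words) in the text."""
    return len(tokenize(text))

def type_token_ratio(text):
    """Number of distinct words divided by the total number of words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "The dog saw the cat."
sample_word_count = word_count(sample)
sample_ttr = type_token_ratio(sample)
```

Because the type-token ratio depends on text length, comparisons between subjects are only meaningful for samples of similar size.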

The use of semantic space by techniques such as LSA or semantic clustering provides a way to elucidate the latent similarities between words in an n-dimensional space. The automated way in which semantic space can be derived from a large enough corpus, and the objectivity of comparisons of either corpora or phrases as assessed within a corpus, enable the assessment of symptomatology of specific disorders, including but not limited to depression, schizophrenia, mania, Parkinson’s disease and Alzheimer’s disease.

The desire to be objective in measurements of semantic content can be realized using quantitative analysis instead of previously used qualitative or subjective scoring of manually identified categories. As such, some significant hurdles have already been overcome. Complex sources of information can be analyzed to reveal underlying patterns, making it possible to reduce the dimensionality of language and allowing for quantitative assessment. Machine learning opens up further avenues to explore, classify and interpret large, complex sets of data. The ability to detect subtle, underlying patterns shows particular promise as a future avenue to diagnose heterogeneous psychiatric disorders through measures of language.

The more data available for machine learning from a wider range of cases, the better machine learning algorithms become at classification and at generalizing to new cases, both because of improved validation and because the data ideally come from a wider range of (sub)types of cases. Improvements in the availability of data - through a decrease in workload by automated transcription as well as the accumulation of larger numbers of clinical cases - will improve machine learning capabilities and thus the usefulness of semantic analysis in the future.

The use of computational assessment of semantic content is developing rapidly, with a variety of use-cases such as search engine optimization and extracting semantic content for use in advertisement. Various new algorithms and software tools that examine semantic content are thus becoming available, such as word2vec, gensim and GloVe (Goldberg and Levy, 2014; Rehurek and Sojka, 2011; Pennington et al., 2014).

Techniques like word2vec hold specific promise, as finer-grained techniques of examining word embedding within text enable the use of word meanings not only through overall contexts as in LSA, but also through the specific words surrounding a given word. Thus, techniques like word2vec allow the detection of differences in syntax and word order, which carry meaning in language (Goldberg and Levy, 2014). As such, the word2vec word vectors give finer-grained measures of word embeddings, opening up the possibility of measuring further, more detailed semantic characteristics.
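
The role of neighbouring words can be illustrated with the (target, context) pairs on which the skip-gram variant of word2vec is trained. The sketch below only generates these pairs from a sliding window; it does not train any vectors, and the function name and window size are assumptions made for the example.

```python
def context_pairs(tokens, window=2):
    """List (target, context) pairs for every word within the window of each token."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        # Pair the target with every neighbour in the window except itself.
        pairs.extend((target, tokens[j]) for j in range(start, stop) if j != i)
    return pairs

bird_sentence = ["the", "crane", "flew", "south"]
machine_sentence = ["the", "crane", "lifted", "steel"]

bird_contexts = {c for t, c in context_pairs(bird_sentence, window=1) if t == "crane"}
machine_contexts = {c for t, c in context_pairs(machine_sentence, window=1) if t == "crane"}
```

In this toy case the same word-token “crane” is paired with different neighbours in the two sentences, which is the local information a window-based method draws on.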

Together with automated transcription, surface semantic measures such as word count, statistical measures such as word occurrence, and quantification of word context using latent semantic analysis, the application of an objective, quantitative analysis of the complex phenomenon of language in various psychiatric contexts is made possible.

3 Synthesis and recommendations

The application of semantic analysis has alleviated some issues present in clinical contexts - notably, the problem of quantifying the complex and ambiguous nature of language. A variety of promising research has been reviewed, both in applied and in more research-oriented fields. In this section we provide a synthesis of issues in the use of semantic space analysis in clinical settings.

The reviewed research is wide-ranging with regard to disorders studied, techniques used, findings, and association with specific cognitive functions. Word-cue association tasks in a study of innovative thinking stand next to the prognosis of psychotic episodes in subjects at risk for schizophrenia (White and Shah, 2016; Bedi et al., 2015). Both make use of automated semantic analysis in a clinical context, illustrating the range of applications. While we reviewed the main techniques employed in the field - LSA and semantic clustering - as well as promising new techniques, the field is relatively new and as such has not yet settled. Techniques and methods examined here could in this sense be regarded as initial efforts, rather than standardized applications, especially considering ongoing developments in language acquisition, processing, and analysis.

The wide range of applications can be regarded as an indication of how strongly language reflects cognitive processes - the aforementioned window into the world of the mind (Tausczik and Pennebaker, 2010). Since cognition can be impaired in different ways by different pathologies, a variety of semantic measures can be expected to be impaired; the range of reviewed research is an indication of this variety.

As we have seen, use of language provides a valuable window into the mind and the cognitive processes as well as their impairment in clinical settings. To distill the information for practical applications requires overcoming obstacles, especially the quantitative objective assessment of a complex phenomenon such as meaning in language. Automated semantic analysis provides the tools required to at least partially overcome these obstacles. Through the use of mathematical techniques to grasp underlying structures of meaning in language, the complexity of homonyms and synonyms can be partly overcome, allowing applications in clinical contexts.

3.1 Prognosis, diagnosis, and assessment

Since semantic measures are one way of collecting data on cognitive functions and impairments thereof, they can provide information regarding different aspects of a variety of clinical disorders which impact cognitive functions. Here, we distinguish three related aspects of use in clinical contexts: prognosis, assessment, and diagnosis.

In the case of prognosis, semantic measures can help predict how or whether a disorder will develop. Predicting psychosis in high-risk individuals is an example: here, the question is whether prodromal symptoms that may already be present will develop into a clinically diagnosed psychotic episode (Bedi et al., 2015). If an individual has been determined to have a high chance of developing a psychosis, preventive treatment of symptoms and avoidance of high-risk factors might be undertaken. Currently, psychiatrists cannot reliably determine who will and who will not develop a psychotic episode, a highly disruptive and impactful event. A similar prospective use has previously been discussed in predicting suicidal ideation using machine learning (Cook et al., 2016). Transforming these findings into clinically useful tools would thus be of immense value.

The assessment of patients is another promising avenue. Since semantic aspects of language are thought to be indicators of even relatively subtle cognitive mechanisms, changes over time in these cognitive functions should be reflected in language use. Using this framework of underlying cognitive mechanisms being reflected in language, the effect of any intervention aimed at improving cognitive function can be measured by assessing language. Used in conjunction with a clinical intervention, semantic analysis can shed light on the efficacy of treatment. As a speculative application, we can imagine changes in doses of antipsychotic medication in schizophrenia being correlated with measures of semantic coherence. Analysis of these semantic measures could then be used to assess whether doses of medication can be safely increased or decreased to prevent the occurrence of psychotic episodes, similar to the demonstrated ability to detect future psychotic episodes in at-risk patients (Bedi et al., 2015).

Regarding diagnosis of clinical disorders through semantic analysis, we note that a majority of current research distinguishes or compares a clinical group with healthy controls. The ability to distinguish between different clinical groups is a hurdle that still needs to be overcome, since the ability to distinguish between a healthy control and a patient is not particularly useful in a clinical context. After all, if a patient is admitted to an interview with a health-care professional, it is certain the patient is not a healthy control. In clinical contexts, differentiation between heterogeneous disorders is of vital importance.

Seen from this point of view, many of the previous findings are merely proofs of concept which are not directly applicable to a practical context. Expressed differently, it could be the case that much of the reviewed research only shows some semantic aspect of language being ‘abnormal’, rather than being able to detect a disorder-specific impairment in cognitive mechanisms. Discovering an abnormality is easier than defining the mechanism behind it and its relative uniqueness with regard to other disorders. Thus, research which focuses on differentiation between diagnoses is a major area of importance in the quest to apply semantic measures in a clinical context. The reviewed studies that predicted the development of Alzheimer’s disease from mild cognitive impairment using clustering (Pakhomov and Hemmy, 2014) and classified future episodes of psychosis in ultra-high risk groups using LSA (Bedi et al., 2015) are encouraging examples: because their comparisons are ecologically valid, they could become valuable additions to the clinical setting once sufficiently validated.

For research using these semantic measures to find its way to clinical application, larger datasets for validation will be required, as will fine-tuned measurements of language that separate diagnoses which are hard for the clinical professional to distinguish. The previously mentioned hurdles that stand in the way of true application in the clinical setting can be overcome, but doing so requires significant work, especially due to the heterogeneous nature of various psychiatric disorders and the overlap between disorders, even ones as different as schizophrenia and autism (Konstantareas and Hewitt, 2001). The issue of cross-diagnosis is in our view one of the biggest obstacles not yet sufficiently researched.

3.2 Improvements in collecting and processing language

Most of the reviewed research analyzes semantic aspects of the data automatically, using corpora that capture the contexts in which words occur or the clusters of associated exemplars with which words often co-occur. Analysis of these aspects by automated techniques is an improvement because it reveals underlying patterns in a fast, objective and quantitative way (Landauer et al., 1998; Landauer, 2006).

However, the preparation of these data is still a labor-intensive procedure, especially if speech has to be transcribed. This is less of a problem for verbal fluency tasks, where the timespan is limited and the objects named are all of one sort (e.g. animals). With tasks such as story retelling or semi-structured interviews - which are already used for other measures and thus present a valuable source of information - transcribing audio recordings is a long process, often requiring specialized companies, as well as blinding and coding of participants to prevent the introduction of bias. To alleviate the dual problems of effort required and the possibility of bias, automated methods for language processing can play a larger role in the future.

Advances in these methods have been previously discussed in 2.1.1; natural language processing and the greater availability of digitally-produced text show promise in alleviating the scarcity of available data for semantic analysis. The processing of human speech and automated transcription of the words spoken promises to be a reliable, methodologically repeatable improvement in the analysis of semantic measures, both in application and in exploring underlying cognitive mechanisms, primarily by making available a wealth of information which would otherwise be inaccessible simply because of the amount of labor required. These techniques are developed for commercial purposes, such as Apple Inc.’s Siri personal assistant or Google’s Speech algorithms (Assefi et al., 2015; Jaitly et al., 2012), but as new features are added they are likely to open up more possibilities for the clinical applications of language analysis (Aleksic et al., 2015).

3.3 New algorithms for semantic analysis

As touched upon in chapter 2.4, the recently described word2vec technique makes use of word contexts by transforming a word to a vector, similarly to LSA, thus producing word embeddings based on context (Goldberg and Levy, 2014). Whereas LSA derives its vectors from co-occurrence across whole documents or passages, word2vec learns from a small window of neighboring words and therefore captures more local usage. When the word vectors of a phrase are simply summed or averaged, however, the order of the words is not taken into account; thus, the phrase “dog bites cat” is a very good match for “cat bites dog” because both contain the same words. Techniques that build on word embeddings and model the sequence of words can differentiate between these phrases, whose word order in these types of cases carries semantic information. As such, they provide a valuable addition to the assortment of techniques usable for semantic analysis of language produced by subjects in clinical contexts.
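The role of word order in the “dog bites cat” example can be illustrated with a minimal sketch. It uses neither LSA nor word2vec: as a simplifying assumption, phrases are represented by plain word counts (order-insensitive) and by counts of adjacent word pairs (order-sensitive), and the function names are chosen here for illustration only.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bag_of_words(phrase):
    """Order-insensitive representation: how often each word occurs."""
    return Counter(phrase.lower().split())


def bag_of_bigrams(phrase):
    """Order-sensitive representation: how often each adjacent word pair occurs."""
    tokens = phrase.lower().split()
    return Counter(zip(tokens, tokens[1:]))


first, second = "dog bites cat", "cat bites dog"

# Same words, so the unordered representations are identical.
unordered_similarity = cosine(bag_of_words(first), bag_of_words(second))

# No adjacent word pair is shared, so the ordered representations do not overlap.
ordered_similarity = cosine(bag_of_bigrams(first), bag_of_bigrams(second))

print(unordered_similarity, ordered_similarity)
```

Under these assumptions the unordered comparison treats the two phrases as a perfect match, while the pair-based comparison finds no overlap at all. Real sequence-aware models are far richer, but the contrast is of the same kind.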

Similarly, automated analysis makes it relatively easy to select specific phrases or words that are deemed important. It has been postulated that self-referencing words are particularly relevant in schizophrenia (Gallagher, 2000; Fineberg et al., 2016). Other clinically defined groups can be analyzed in a comparable way to find aspects of language, or even the use of specific terms, which differ from the healthy population, in effect completing the circle by measuring differences in specific word use associated with a disorder.
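Counting a predefined set of words is the simplest form of such an analysis. The sketch below computes the proportion of self-referencing words in a transcript; the word list and the tokenization rule are assumptions made for this illustration and are not taken from the cited studies.

```python
import re

# Assumed, non-exhaustive list of English first-person singular forms.
SELF_REFERENCE_WORDS = {"i", "me", "my", "mine", "myself"}


def self_reference_rate(text):
    """Return the fraction of word tokens that are self-referencing words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in SELF_REFERENCE_WORDS)
    return hits / len(tokens)


sample = "I told my sister that the doctor called me yesterday"
rate = self_reference_rate(sample)
print(rate)
```

Contractions such as “I'm” are kept as single tokens by this rule and are therefore not counted; a practical implementation would need a more careful tokenizer and a validated word list.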

Through the combination of other semantic measures, new algorithms, and an increase in data availability due to the reduced workload of automated transcription and the greater availability of digitally produced language, various problems become more tractable. The problems discussed in chapter 2.3.3 regarding generalizability of results across different interviewers, in a more ecologically plausible variety of environments, can be alleviated by larger datasets and the use of new techniques from machine learning, such as random forest classifiers (Breiman, 2001). As available data grows, fortuitous discoveries can be made. A large sample of analyzable semantic productions from a clinical group might be explored for significant differences in unexpected dimensions compared to healthy controls or another clinically diagnosed group of subjects.

Other developments in machine learning enable the combining of measures such as LSA with other semantic measures derived from tasks such as SVF. Using these developments, a more holistic, comprehensive assessment of language for use in clinical contexts can be performed. Through the advent of advanced techniques like ensemble learning, predictors from a variety of fields can be combined to classify aspects of disorders (Dietterich, 2002). Such a combination of various measures with machine learning techniques allows the pruning and selection of the most informative measures out of all possible combinations of semantic as well as non-semantic measures collected in clinical contexts. In our view, these developments make it likely that in the future, automated semantics will play a greater role in clinical contexts.
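The idea of combining predictors can be sketched with the simplest ensemble scheme, a majority vote. Everything specific in this example is hypothetical: the three measures, their names and their thresholds are invented for illustration and carry no clinical meaning; in practice the base predictors would be trained and validated on data.

```python
def majority_vote(predictions):
    """Return 1 if more than half of the binary predictions are 1, else 0."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Hypothetical base predictors; each flags a transcript (1) or not (0).
# The thresholds are arbitrary placeholders, not empirical cut-offs.
def low_coherence(measures):
    return 1 if measures["coherence"] < 0.4 else 0


def small_clusters(measures):
    return 1 if measures["mean_cluster_size"] < 2.0 else 0


def few_words(measures):
    return 1 if measures["fluency_count"] < 15 else 0


BASE_PREDICTORS = [low_coherence, small_clusters, few_words]


def ensemble_predict(measures):
    """Combine the base predictors by majority vote."""
    return majority_vote([predict(measures) for predict in BASE_PREDICTORS])


# Invented example values for one participant.
example = {"coherence": 0.35, "mean_cluster_size": 2.5, "fluency_count": 12}
flag = ensemble_predict(example)
print(flag)
```

Here two of the three base predictors flag the example, so the ensemble does as well. Methods such as random forests follow the same voting principle but learn many decision trees from data instead of relying on hand-set rules.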

This greater capacity to analyze semantic aspects of language promotes research into derived measures, which in turn might enable better classification and, with it, better diagnosis, prognosis and assessment, as well as more information regarding cognitive mechanisms.

3.4 A window into the mind

By using prior findings from the literature regarding correlates of impairments in cognitive mechanisms with language produced in speech and writing, research has connected automated semantic analysis to underlying cognitive mechanisms. Through the application of model-based neuroscience, fine-tuned effects of cognition have been implicated in specific disorders (e.g. Rosenstein et al., 2014). Speech disconnectedness in schizophrenia measured through LSA enabled researchers to indirectly measure the cohesiveness of the underlying thoughts (Holshausen et al., 2014). By using quantitative measures associated with semantic productions, or with the cognitive processes involved in producing them, knowledge of the underlying cognitive processes can be derived. Various authors of the reviewed research remark on the fact that language and speech provide a window into underlying, usually inaccessible phenomena - cognitive processes and content.
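A coherence measure of this kind can be sketched as the mean similarity between consecutive sentences. The sketch below is an assumption-laden simplification: sentences are represented by raw word counts rather than LSA vectors, and the example sentences are invented, so it only illustrates the principle and not the method of the cited studies.

```python
import re
from collections import Counter
from math import sqrt


def sentence_vector(sentence):
    """Represent a sentence by its word counts (a stand-in for an LSA vector)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_adjacent_coherence(sentences):
    """Average cosine similarity between each sentence and the next one."""
    vectors = [sentence_vector(s) for s in sentences]
    pairs = list(zip(vectors, vectors[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Invented example utterances.
connected = ["the dog chased the cat", "the cat climbed the tree"]
disconnected = ["the dog chased the cat", "my radio sings numbers"]

print(mean_adjacent_coherence(connected))
print(mean_adjacent_coherence(disconnected))
```

With word counts, two sentences are similar only if they literally share words. An LSA-based version would replace `sentence_vector` with vectors from a trained semantic space, so that related but non-identical words also contribute to coherence.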

The amount of information that language can carry makes it both a complex and a rich target for analysis. As the underlying cognitive functions are equally if not more complex than the language that they help produce, the connection between the two allows assessing aspects of one from the other. However, the phrase “window into the mind” as used by Tausczik and Pennebaker (2010) should be interpreted in a limited way. Cognition is involved in more activities than just the production of language, and although a connection exists between the realm of cognitive processes and language, this connection is not exhaustive or unfiltered.

While we have focused here on automated analysis of semantic aspects, other uses of language, such as in clinical interviews and undirected speech produced by clinical subjects, should not be forgotten. One of the goals of semantic analysis in clinical settings is to provide more information that can ultimately be used to help improve treatment. Here, automated semantic analysis is one of the tools available for research and use, although a very promising one; previous efforts to assess cognitive mechanisms using language have suffered from the absence of quantitative methods for analyzing subtle effects present in speech or writing. Research and assessment of these measures have been enabled by automated semantic analysis, opening the window slightly more.

4 Summary and Conclusion

Automated analysis of semantic content in clinical settings shows promise, both as an application and as an explanatory tool for underlying cognitive processes. Techniques such as LSA, semantic clustering and related methods are able to quantify a rich source of information which can be used for the diagnosis, prognosis and assessment of various clinical groups using semantic measures. Future research using new techniques, on larger datasets, is a prerequisite for widespread application, with differentiation between clinically diagnosed groups of subjects being a main issue. Moving beyond research that mainly compares healthy controls with a single clinically diagnosed group is necessary for clinical contexts, where differentiation between similar, heterogeneous disorders is a requirement. Some studies have shown promising results in this more ecologically plausible task.

Advancements in automatic transcription, natural language processing and digitally available language can alleviate some of the identified issues. Together with the use of new, more precise algorithms for semantic analysis, research of semantic aspects of language in clinical contexts has a promising future. The combination of automated, quantitative analysis with the complex, informative phenomenon of language makes it a rich field for further research.
