COMPUTER-ASSISTED ANALYSIS OF MODERN GREEK POETRY


by Maria Charitou Student number: 11103701

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Arts

in Media Studies: New Media and Digital Culture at

The University of Amsterdam June 2016

Supervisor: Prof. Dr. K.H. (Karina) van Dalen-Oskam

Second Reader: Prof. Dr. R.A. Rogers

©2016 Maria Charitou All Rights Reserved E-mail: charitou.ma@gmail.com

To Vasiliki,

Acknowledgements

I would like to express my gratitude to my supervisor Karina van Dalen-Oskam, Professor of Computational Literary Studies at the University of Amsterdam, for her useful comments, remarks and engagement throughout the learning process of this master's thesis. I would also like to acknowledge Richard Rogers, Professor of New Media and Digital Culture at the University of Amsterdam, as the second reader of this thesis; I am gratefully indebted to him for his very valuable comments.

Furthermore, I would like to thank George K. Mikros, Professor of Computational and Quantitative Linguistics at the National and Kapodistrian University of Athens, for his advice and suggestions.

Above all, I would like to thank my sister Vasiliki, who has supported me throughout the entire process, both by keeping me harmonious and by helping me put the pieces together.

Table of Contents

1. Introduction
1.1 Digital world and the Digital Humanities
1.2 Digital Literary Studies
1.3 Literary Text Analysis: Towards a Definition of ‘Style’
1.4 Computational Analysis of Poetry
2. Research Question
3. Methodology
3.1 Corpus of texts
3.1.1 Corpus preparation
3.1.2 Modern Greek Language
3.1.3 Poetic Generation: Definition and Periodization
3.2 Software tools
3.3 Stylistic Analysis: Style Markers and Text Collection
4. Findings
4.1 Most Frequent Words (MFW)
4.1.1 Results on the whole corpus of texts
4.1.2 Word-Frequency Results by Generations
4.1.2.1 Conclusion
4.1.3 Results by Gender
4.2 Conclusion
5. Literary Names
5.1 Introduction
5.2 Names in Modern Greek Poetry
5.2.1 Methodology and Findings
5.3 Discussion: Function of Names
6. Conclusion
6.1 Aim
6.2 Suggestion for Future Research

Abstract

The (over)abundance of information and the ability to access it, the digital revolution, and the digital tools we now use in daily life for academic, research or other purposes seem to mark a new era for Science, and especially for the Humanities. From ‘Humanities Computing’ to ‘e-Humanities’ and then to ‘Digital Humanities’, this new interdisciplinary field of studies promises new possibilities for the production of knowledge. In particular, knowledge is guided by a new form of science, the so-called ‘data-driven’ science, as machines and algorithms are increasingly used to study (literary) texts. This thesis examines the possibilities offered by software tools and investigates how quantitative analysis can contribute significantly to the study of poetic texts, radically altering the way in which the Humanities are understood and practiced. To that end, it discusses a number of significant debates in the field that show its vigorous effect, and makes particular reference to literary studies and the challenge they face from the implementation of algorithms. Focusing on Stylometry and Onomastic studies, this study aims to analyze Modern Greek poetry with digital stylistic tools and to investigate how proper names operate in poetry. Overall, it aims to explore the role of computers in the analysis of Modern Greek poetry.

Keywords

Digital Humanities, Digital Literary Studies, Modern Greek Poetry, Textual Analysis, Computational Stylistics, Word-Frequency, Names.

1. Introduction

1.1 Digital world and the Digital Humanities

Digital technology and the current deluge of data are shifting, amongst other things (e.g. access to information), the way research is conducted (Kitchin 2014, 128-148). This is not only because research is increasingly conducted through digital technology, but also because, with the rise of digital tools, many traditional methodologies have changed or tend to do so. Thus, digital technology spreads to a variety of epistemological fields and challenges traditional disciplines from the (Social) Sciences to the Humanities, creating what is called ‘Digital Humanities’.

Digital Humanities is an “emerging” interdisciplinary field of studies that intersects Computer Science and the Humanities and examines the way the Humanities deal with (digital) technology, (new) media and digital methods (Svensson 2010; Arthur and Bode 2014). The change of the term, from ‘Humanities computing’ to ‘Digital Humanities’, signifies the progressive development of the field as it “emerged from the low-prestige status of a support service into a genuinely intellectual endeavor with its own professional practices, rigorous standards, and exciting theoretical explorations” (Hayles 2012b, 43). In particular, digital techniques such as text mining, machine learning, data mining, data visualization and information retrieval merge with traditional Humanities practices such as hermeneutics, close reading, philology, archiving, and critical textual or visual analysis. The availability of tools, algorithms, databases, interfaces and software suggests a different analysis from the traditional one, “focused on the finding of patterns, dynamics, and relationships in data” (Rieder and Röhle 2012, 70), and demonstrates the shift from micro to macro analysis (Jockers 2013).

Consequently, this ‘digital diffusion’ places the production of knowledge in a new frame and has both epistemological and ontological repercussions: not only because digital tools, platforms, (computational) techniques, and (new) media have significant consequences for the production and dissemination of knowledge in the Humanities, but also because they “represent new possibilities to study human interaction and imagination on a very large scale” (Rieder and Röhle 2012, 68). More precisely, it appears that digital technology affects the way we think and how we think about thinking, given that nowadays we tend to think “through, with, and alongside media” (Hayles 2012a, 1). Moreover, it has implications for how we conceive of universities and of learning processes, through the application of new pedagogical practices in classrooms and of new research methods.

But whereas the rise of Digital Humanities is said to have brought a “new lustre to a tired field” (Kitchin 2014, 141), the field has been criticized for leaving behind old research methods (such as hermeneutics, semiotics and searching library catalogues and archive collections), thus giving rise to significant debates. Stanley Fish and Stephen Ramsay, amongst others, disputed strongly whether literary criticism can be assisted by computer algorithms that provide data ready to analyze (data-driven analysis), or whether the criticism that derives from close reading of a text is unique and therefore irreplaceable (Fish 2012; Ramsay 2012). Gardiner and Musto, while trying to define Digital Humanities, pose a number of questions about the scope and features that differentiate Digital Humanities from the Humanities. Amongst these, the question about the significance of the digital seems crucial: “has the arrival of the digital forever changed the way humanists work, in the way they gather data and evidence or even in the very questions that humanists and the humanistic disciplines are now capable of posing?” (Gardiner and Musto 2015, 2).

More precisely, the growing use of digital tools in the Humanities signifies what Berry calls a “computational turn” (2011, 11) in the Humanities and promotes new ways of gathering information, sharing and/or analyzing texts and data, teaching, and/or publishing. This “computational turn”, though, brings opportunities as well as challenges and tends to divide scholars into opponents and advocates, causing debates in the field (Gold 2012). In particular, there are those who believe that the digital world is radically changing the way research is conducted, and those who think that the digital simply helps the work of the scholar or even that it might undermine its core: “many humanists tend to view the digital humanities as a methodology that brings the tools and power of computing to bear on the traditional work of the humanities.” (Gardiner and Musto 2015, 3).

To some, it seems that digital tools give a scholar the opportunity to conduct his/her research with less time and effort: (s)he now has more sources at his/her disposal (digital libraries; repositories; digitized manuscripts; Google Books; Project Gutenberg etc.) that offer easy and quick access regardless of the scholar’s geographical position and, last but not least, are less costly (Gardiner and Musto 2015). In that regard, it is worth mentioning that Digital Humanities comes at a time when the Humanities are dealing with budget cuts and scepticism about their value (Liu and Thomas 2015, 35). Specifically, the humanities today seem to be “the victim of a perfect storm” and “budget cuts stemming from a persistent recession [...] have threatened humanities programs everywhere” (Jay 2014, 1). At the same time, Digital Humanities “experienced a banner year that saw cluster hires at multiple universities, the establishment of new digital humanities centers and initiatives across the globe, and multimillion-dollar grants distributed by federal agencies and charitable foundations.” (Gold 2012, ix).

Beyond its financial aspect and general boost, Digital Humanities suggests a new approach to Humanities methodology. Computers and software tools in general can help to give different perspectives and aspects, “alternative visions” as Ramsay argues, when it comes, for example, to the analysis or interpretation of a text (Ramsay 2011, 16). In other words, software, as Berry declares, “allows for new ways of reading and writing” (2012, 13-14). Referring to Gertrude Stein’s book The Making of Americans, Berry points out that the analysis of such impenetrable texts can be facilitated through text mining (Berry 2012, 13). More specifically, the reading and interpretation of The Making of Americans, a text “almost impossible to read it in a traditional, linear manner” (Clement 2008, 362) because of its structure, was facilitated through various computational techniques, such as the visualization of certain patterns. As Clement argues: “by visualizing certain patterns and looking at the text ‘from a distance’ through textual analytics and visualizations, we are enabled to make readings that were formerly inhibited. […] Using text mining to retrieve repetitive patterns and treating each as a single object makes it possible to visualize and compare the three dimensions upon which these repetitions co-occur—by length, frequency, and location—in a single view.” (2008, 361). This shows that in some cases algorithms may lead to a different perspective on a text and/or yield different kinds of knowledge about it, such as patterns, trends, correlations, quantities or frequencies of particular words and phrases.

In addition, visualization and data mining (graphs, maps etc.) can be used in order to do ‘distant readings’ of literary books: “where ‘distance’ is not an obstacle, but a specific form of knowledge” (Moretti 2005, 1). Where traditional humanities base their research method on close reading, digital humanities introduce distant readings of texts. As Moretti declares, traditional humanities focus on a “minimal fraction of the literary field [...] canon of two hundred novels, for instance, sounds very large for nineteenth-century Britain (and is much larger than the current one), but is still less than one per cent of the novels that were actually published: twenty thousand, thirty, more, no one really knows – and close reading won’t help here, a novel a day every day of the year would take a century or so... And it’s not even a matter of time, but of method: a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system that should be grasped as such, as a whole.” (2007, 3-4). Moretti speaks about a transition from texts to models, where computers can operate as what Ramsay calls “reading machines” (2011).

But what does “reading” as a process of interpretation stand for, and what kind of readings do we have when texts are treated or studied through algorithms/machines? To refer briefly to the concept, ‘Hermeneutics’ perceives ‘reading’ as a process in which the meaning of a text is recovered by an attentive reader via the act of ‘interpretation’ (New Princeton Encyclopedia 1993, 516-520). Reading as a knowledge-producing process aims to unfold the meanings of a text, those that lie behind or within it. Kittler pointed to the transition from the ‘writing world’ to automation and machines, and particularly to how new media changed knowledge-producing processes such as writing and reading (Kittler 1999). Thus, ‘reading’ is a process that requires reciprocal action between the text and the reader, where the reader interacts with the text (Kittler), whilst in the context of Digital Humanities ‘reading’ is a process of “becoming integrated with the text” (Evans and Rees 2012, 27).

A distinction between types of reading, namely close reading (the one that “correlates with deep attention”), machine reading (“an analysis through machine algorithms”) and hyper reading (“often associated with reading on the web”), is pointed out by Hayles, who asserts that within the Humanities “reading connotes sophisticated interpretations achieved through long years of scholarly study and immersion in primary texts” (2012a, 11-12). On the other hand, Hayles continues, “‘reading’ implies a model that eschews human interpretation for algorithms employing a minimum of assumptions about what results will prove interesting or important” (2012a, 8). It is true that with digital technology the reading process seems to have changed, especially because it is now done through and with search engines and via the opportunities that the Internet provides, thus producing a type of reading that ‘scans’ (reading as an act of scanning) due to the fast and vast diffusion of information.

Along with Hayles, Evans and Rees argue that what has changed is the methodology and the results, along with the subjects and fields of research; the desire for the acquisition of knowledge remains, whether the Humanities are called digital or traditional (2012, 31). That the use of computers, and particularly of digital tools, brings a new methodology to the Humanities has been shown by demonstrating how significantly they facilitate the procedure of interpretation. The close reader (i.e. the scholar) and the computer each contribute differently to the interpretation of a text. As Burrows declares, “the close reader sees things in a text […] to which computer programs give no easy access. The computer, on the other hand, reveals hidden patterns and enables us to marshal hosts of instances too numerous for our unassisted powers” (Burrows 2002, 696).

Ramsay’s proposal of an ‘algorithmic criticism’ acts as a new way to incorporate digital tools in Humanities research, as they allow “critical engagement, interpretation, conversation, and contemplation”, and via those tools “we channel the heightened objectivity made possible by the machine into the cultivation of those heightened subjectivities necessary for critical work” (Ramsay 2011, x). In other words, the close reading of traditional Humanities is related to the subjective gaze of the scholar, while Digital Humanities can be objective because of the tools they bring (for example, coding).

A related point to consider though is that Ramsay does not reduce the value of subjectivity or subjective judgement when it comes to literary criticism: “Literary-critical interpretation is not just a qualitative matter; it is also an insistently subjective manner of engagement” (2011, 8). He therefore asserts that the patterns (data) a critic gets through a machine (computer) can be used for “grander rhetorical formations that constitute critical reading” (2011, 17) and he uses the term ‘algorithmic criticism’ to determine a literary criticism assisted by computers (2011, 32).

Furthermore, macro-analysis focuses on the analysis of concepts and features at the macro level. This new approach points out the significance and potential of computational analysis for the interpretation of literary texts on a large scale. Jockers (2013) uses the word ‘analysis’ instead of ‘reading’, thus emphasizing the examination of data and the importance of computers, which can do large-scale analysis faster than humans. He tries to overcome the limits of close reading’s literary interpretations by outlining how this method can help to understand individual works within large digital collections: for example, by using topic models, relative word frequencies, stylometrics and statistical methods.
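As an illustration of one of the measures named above, the following minimal sketch computes relative word frequencies, i.e. each word's share of all tokens in a text. It is not the procedure used by Jockers or in this thesis; the function name, the simple regular-expression tokenizer and the toy sentence are assumptions made purely for the example.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of all tokens in `text` (hypothetical helper)."""
    # \w+ matches runs of Unicode letters and digits, so Greek text works too.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


# Toy sentence invented for the example, not taken from any corpus.
sample = "the sea and the sky and the sea"
freqs = relative_frequencies(sample)

# List the words from most to least frequent.
for word, share in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{word}\t{share:.3f}")
```

Dividing by the total number of tokens is what makes texts of different lengths comparable, which matters when individual works are set against a large collection.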

Digital tools might promote literary research that does not even involve reading a text, based on the difference between close (human-based) and ‘distant’ (computer-based) reading (Culler 2010). “Literature cannot meaningfully be treated as data”, Marche states (2012), because literary texts are more than just words or lyrics: as they express feelings and/or reveal the thoughts and opinions of the writer, algorithms are incapable of indicating the meaning behind the words. Marche further argues (2012) that “the process of turning literature into data removes distinction itself. It removes taste”. More concretely, ‘taste’ constitutes a fundamental element of culture, enabling one to “sense or intuit what is likely (or unlikely) to befall […] an individual occupying a given position in social space. It [i.e. taste] functions as a sort of social orientation, a ‘sense of one’s place’” (Bourdieu 1984, 484-85). In these terms, ‘taste’ seems to include a more or less subjective gaze, which tends to be lost, according to Marche, when algorithms are used. But how objective can a machine be? And do we really lose ‘taste’?

Before examining the extent to which objectivity is possible or real when it comes to algorithms, allow me to refer again to Marche, for his polemic is worth mentioning. He refers to algorithms as “inherently fascistic, because they give the comforting illusion of an alterity to human affairs. ‘You don’t like this music? The algorithms have worked it out’ is not so far from ‘You don’t like this law? It works objectively.’ Algorithms have replaced laws of human nature, the vital distinction being that nobody can read them. They describe human meanings but are meaningless” (2012).

An algorithm can decide for us, or even play an auxiliary role in our decisions, only if we allow it to do so. In other words, machines and computers are used and, in a sense, manipulated by people who decide what to exclude and what to include when coding, programming and/or analyzing (textual) data. In that sense, there is less room for objectivity than we might wish when it comes to algorithms and, further, to Digital Humanities.

In particular, objectivity is one of the five challenges that Rieder and Röhle (2012) present while questioning the growing use of digital tools in the Humanities: objectivity, visualisation, ‘black-boxing’, institutions and universalism (i.e. the desire for universality). While the main aim of Humanities research when using digital tools is objectivity, this cannot be entirely achieved. Machines might easily be used for a variety of tasks, enabling research by overcoming obstacles, but “they should not be taken to guarantee a higher epistemological status of the results” (Rieder and Röhle 2012, 73), as they might cause more complexities than they unravel. Although machines seem to diminish or even eliminate human error and/or subjective judgement, whenever an analysis takes place there are elements, such as selection and modelling, that are based upon subjective criteria, and therefore this cannot be an objective procedure (Evans and Rees 2012, 27).

Moreover, the advent of visualisation tools has enabled the Humanities to present their research results in a different way, via images, graphs etc. Although visualisation as a procedure has advantages, it also has drawbacks regarding the cogency of images: “their imagined link to an external reality, and the obscurity of their production process. Both problems can be addressed to some extent by drawing on the tradition of critical enquiry into the use of images that the humanities have fostered over the years” (Rieder and Röhle 2012, 75). As for the desire of the Humanities to be universal, Rieder and Röhle point out that despite the “practical need to formalize contents and practices into data structures, algorithms, modes of representation, and possibilities for interaction”, that practicality does not make “the methodological procedures more transparent” (2012, 75).

In these terms, the interpretation of a text can, in a way, be co-produced by humans and machines. Digital technology assists today’s Humanities scholar by providing him/her with useful tools and allowing him/her open (and sometimes free) access to the literary works, books and scientific articles that huge databases include. By retrieving key words, phrases, titles and linguistic patterns located in thousands of literary texts in repositories and digital libraries, a researcher can find correlations between authors and genres, highlight any linkages between cultures, and investigate literary tendencies and movements of the past across different places and time periods.

An algorithm can provide us with observations and/or interpretations that might otherwise not be possible (Moretti 2007), yet there are a number of limitations on what a computer or algorithm can accomplish. Barocas and Selbst demonstrate that “an algorithm is only as good as the data it works with” (2015, 1), meaning that algorithms and their results depend on the type of data to which the scholar applies them each time. Thus, coding and computers might broaden the act of criticism, but they cannot substitute for it. They can only play a supporting role by assisting the scholar in his/her research: “The computer’s role is only to ask how our engagements might be facilitated, but it does so with a staggering range of provisos and conditions” (Ramsay 2011, 66).

1.2 Digital Literary Studies

In this Digital Humanities era, where the presence of computers and tools is so concrete, hegemonic and inevitable that it leads towards a ‘Post Digital Humanities’ era, as Berry argues (2014), Literary studies face a great challenge and are experiencing a significant gradual change. Hyper reading, e-Philology, digital literature, digital archives, computer-assisted analysis, digital editing, etc. are some of the consequences that arise when Literature encounters new media and digital tools. More precisely, Digital Literary studies constitute a branch of Digital Humanities that examines the possibilities and application of digital methods, practices and (software) tools in the field of literary studies (computational literary analysis: macro analysis, data mining, distant reading, topic modelling, visualisation etc.). Digital humanities and digital literary studies “are concerned with reading practices, the creation of scholarship, the representation of knowledge, and the refinement and expansion of methodologies of interpretation—all undertaken [...] in a computer-assisted environment.” (Price and Siemens 2013). The scholar nowadays can be assisted by various visualisation tools that aim to depict data derived from large literary corpora. As Price and Siemens argue, “digital scholarship is remaking literary studies, creating new lines of inquiry, reshaping our fundamental practices, and, in doing so, laying the foundation for a future in our field where computation (use of a computer) is an assumed, rather than a notably innovative or particularly remarkable, method of literary pursuit.” (2013).

In other words, Digital Literary studies introduce a variety of methodologies that contribute to the analysis of texts (stylistic, thematic analysis etc.) and to interpretation supported by digital technology, by computers and algorithms. They signify, amongst other things, a transition from close reading to machine and hyper reading, i.e. from reading papers and/or books to texts readable by computers. Digital Literary studies also concentrate on “style, diction, characterization, and interpretation” of either one literary text or several groups of texts, and apply “statistical techniques used in the narrower confines of authorship attribution to broader stylistic questions” (Hoover et al. 2014, 2). This quantitative approach to literary texts focuses not only on style but also on issues such as genre, themes, characters etc.: text analysis programs have been developed for the study of various parts of speech (noun, adverb, preposition etc.), discrete themes, named entities (personal names, toponyms, etc.), sentiment, and meter (Jockers 2013, 16). In other words, computational text analysis covers stylistic analysis, stylometry and authorship attribution (i.e. the attempt to identify the author of a text of unknown or disputed authorship) and intersects with natural language processing, artificial intelligence, data mining etc.

Stylistic studies, in particular, focus on examining and describing an author’s (usually distinctive) style and/or comparing and contrasting it with the style of one or more other authors. Such a comparison is made between texts that share the same language, genre, historical period etc. (Hoover et al. 2014, 90). In authorship attribution, for example, software tools have been employed to mark the choice of words and to measure letter and punctuation frequencies, function words and sentence structure, as these also constitute part of a writer’s style, in order to identify the author of a text. More recently, a visualization of eight classic fiction books was based on punctuation use, marking the differences between the authors (Calhoun 2016).
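To make the idea of punctuation frequencies concrete, here is a minimal sketch that reports how an author's punctuation is distributed over a chosen set of marks. It is an illustration under stated assumptions, not the method of Calhoun (2016) or of any attribution tool: the function name, the default set of marks and the toy sentence are all hypothetical. For Modern Greek one would probably adapt the set, since ";" serves as the question mark and "·" as the raised point.

```python
from collections import Counter


def punctuation_profile(text, marks=".,;:!?"):
    """Share of each mark among all counted punctuation (hypothetical helper)."""
    counts = Counter(ch for ch in text if ch in marks)
    total = sum(counts.values())
    # A text without any of the marks yields an all-zero profile.
    return {m: (counts[m] / total if total else 0.0) for m in marks}


# Toy sentence invented for the example.
sample = "Wait, what? No; never. Yes, yes!"
profile = punctuation_profile(sample)

for mark, share in profile.items():
    print(repr(mark), round(share, 3))
```

Two such profiles, one per author, can then be compared mark by mark; a real study would of course need far more text than a single sentence.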

Moreover, in the context of text analysis, word frequencies and syntactic phenomena are counted, such as word classes (nouns, verbs, prepositions, adjectives etc.) and phrasal categories, as well as grammatical and semantic categories (tense, number, gender; themes derived from the words, e.g. emotion words). N-grams (sequences of words or letters), collocations (words that appear close to other words), sentences and phrases can also be counted, as well as the length of various elements, such as word length (characters/letters), sentence length (the number of words per line, syllables or characters per sentence or clause) and of course the length of a text (the total number of words, sentences, punctuation marks, paragraphs, stanzas, chapters etc.). Apart from these linguistic features, text analysis can also measure literary aspects, like motifs, characters, themes (death, war, fashion etc.) and names (Hoover 2008). On this basis, comparisons can be made between different authors (ancestors, predecessors, contemporaries), texts from different time periods and by authors of different genders, different works or passages of the same author, translations of the same work etc.
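The counts described above can be sketched in a few lines. The example below is only an illustration: the helper names, the regular-expression tokenizer, the window size and the toy sentence are assumptions, and a collocation is simplified here to "any word within a fixed window around a node word", without the statistical association measures a full study would normally add.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; \\w+ also matches Greek letters."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each `node` occurrence."""
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            found.update(left + right)
    return found


def mean_word_length(tokens):
    """Average number of characters per token (0.0 for an empty text)."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


# Toy sentence invented for the example.
tokens = tokenize("the sea and the sky and the sea")
bigram_counts = Counter(ngrams(tokens, 2))
sea_neighbours = collocates(tokens, "sea", window=1)

print(bigram_counts.most_common(2))
print(sea_neighbours)
print(mean_word_length(tokens))
```

Letter n-grams can be obtained with the same `ngrams` function by passing a list of characters instead of a list of words.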

1.3 Literary Text Analysis: Towards a Definition of ‘Style’

But what does ‘style’ stand for in a literary context? Various scholars have analysed its significance for literary studies and suggested a number of definitions throughout the years. This plurality of definitions comes from the different perspectives of ‘style’, either seen linguistically or from a Digital Humanities perspective.

Firstly, The New Princeton Encyclopedia of Poetry and Poetics (1993, 1225) connects the word ‘style’ to an author's individuality, his/her choice of words and collocations that construct the meaning of a text. Its feature of distinctiveness is also pointed out by Wales, who defines ‘style’ as “the perceived distinctive manner of expression” (2001, 371). From a linguistic perspective, ‘style’ refers to a specific use of language in a specific context by a specific writer/speaker, in a particular time period. Accordingly, the term ‘style’ often signifies the particular linguistic choices of an author (syntactic, lexical, morphological or other preferences), which characterize/formulate his/her unique writing style (for example, scholars have analysed the style of T.S. Eliot, of Shakespeare etc.). Writers such as Leech and Short consider ‘style’ as “the way language is used in a particular genre, period, school of writing or some combination of these: ‘epistolary style’, ‘early eighteenth-century style’, ‘euphuistic style’¹, ‘the style of Victorian novels’, etc.” (Leech & Short 2007, 10). Leech and Short speak about a “linguistic ‘thumbprint’”, a stylistic fingerprint that declares the identity of the writer based upon this “individual combination of linguistic habits which somehow betrays him in all that he writes” (2007, 10).

That dominance of the linguistic habits which identify the ‘style’ of an author is also present in its definition from a Digital Humanities perspective. In that context, “style is seen as anything that can be measured in the linguistic form of a text, such as vocabulary, punctuation marks, sentence length, word length, the use of character strings. [...] Every word and every feature contributes to the general outlook of the text; any other ratio in frequencies, any difference in mean sentence length, every individual punctuation use results in a different outlook of the text. In short: everything is important” (Herrmann, Van Dalen-Oskam, and Schöch 2015, 38). Herrmann, Van Dalen-Oskam, and Schöch propose a new definition of style, according to which “Style is a property of texts constituted by an ensemble of formal features which can be observed quantitatively or qualitatively” (2015, 44). Such a definition emphasizes the constitution of style through various text characteristics (levels such as syntactic, semantic, lexical etc. and sentences), and style is viewed as a composite system: “By ‘ensemble’ we mean that style is constituted by the combination of many possible features and should be seen as a complex system, with features situated at different linguistic levels. By ‘formal features’, we mean linguistic features at the level of characters, lexicon, syntax, semantics, but also features going beyond the sentence, such as narrative perspective or textual macro-structure;” (Herrmann, Van Dalen-Oskam, and Schöch 2015, 44). By highlighting two approaches to the study of style, the quantitative and the qualitative, Herrmann, Van Dalen-Oskam, and Schöch refer to a specific style that can be studied with computational methods “based on computing frequencies, relations, and distributions of features and relevant statistics, as well as methods based on precise observation and description of individual occurrences.” (2015, 45).

The study of ‘style’ and of the particular use of language is called Stylistics; more precisely, it is “a method of textual interpretation in which primacy of place is assigned to language.” Language is significant because its structure is defined by various levels (syntactic, phonetic etc.) and forms that declare its function, and “the text’s functional significance as discourse acts in turn as a gateway to its interpretation” (Simpson 2004, 2). However, prose differs from poetry in its linguistic choices and in the structure of the text. Anyone who examines the style of a poetic text has to take several poetic levels into consideration. At the phonemic level, a poet’s style is formed by rhyme and rhythm, and by sound devices such as assonance, alliteration, consonance, resonance, repetition and meter. The syntactic level of a poem reveals stylistic options related to word order, the use of phrases, punctuation marks, the number of stanzas and lines, the number of words per line etc. The morphological level relates to word length and type of words (function and content words; conjunctions, nouns, adjectives, adverbs, verbs, participles etc.), and covers the variety of verb tenses and persons, as reflected in pronouns, possessive determiners, and verb forms.

Alongside the various definitions of style proposed over time, the advent of digital technology has brought new methods of analysing texts and studying style into use. Computational Stylistics, or Stylometry, refers to the study of the literary style of an author or group of texts by means of computational techniques, and it includes studies of authorship attribution and authorial style. Apart from the study of style in fiction, Computational Stylistics has also been applied to poetry. In poetry, it focuses in particular on analyzing the frequency of the above-mentioned aspects of a poetic text that make up a poet’s ‘fingerprint’.

1.4 Computational Analysis of Poetry

Poems are complex structures that consist of three major levels: form, the phonetic level and the semantic level. Each of them has a range of features that determine the interpretation of a poem: form consists of stanzas, lines and lyrics; the phonetic level refers to meter, intonation, prosody, rhythm etc.; and the semantic level refers to genre, words, sentences etc. Each of these features helps to shape ‘style’.

Regarding computational analysis, previous studies in the field have shown that computational methods can contribute significantly to a better understanding of the poetic text. Computational analysis of poetry has been the subject of a number of papers and studies from the very beginning of linguistic and literary computing (Kenny 1982; Beatie 1967) up to the present. Previous research has investigated influences between poets regarding style, theme etc. and similarities based on words and syntax (Coffee et al. 2013), and has analysed patterns in poetry such as rhythm and rhyme (Plamondon 2006). Related studies have also focused on other aspects of poetic texts, such as stylistic and phonetic features and authorship attribution.

In particular, Kao and Jurafsky (2015) analyzed the stylistic features of English poems of the 19th and 20th centuries and compared them by looking at sound devices, diction, words of sentiment/emotional language etc. The analysis of those poems, written by different groups of poets (Imagist, professional, contemporary professional, and contemporary amateur poets), indicated the impact of Imagism on modern poetry and unveiled differences between “‘high’ and ‘low’ art” (1-2). Kaplan and Blei (2007) examined the differences in style between American poets (using poems by Robert Frost, Marianne Moore and Frank O’Hara) on the basis of a variety of features (orthographic, syntactic and phonemic) and visualized the resulting differences between the poets.

Can et al. (2014) applied Automatic Text Categorization (ATC) methods to Ottoman poetic texts in order to classify a text whose poet or time period is unknown. Moreover, Brooke et al. (2012; 2015) studied stylistic inconsistency and heterogeneity in The Waste Land, a poem by T.S. Eliot known, among other things, for the different voices it contains. They showed that computational stylistic analysis can help to identify distinct voices within a poetic text. Arefin et al. (2014) examined word frequencies in texts from the Shakespearean era in order to find relations between them. Their results pointed to the importance of authors’ style in “explaining the variation of word use”, and “the differences between tragedy and comedy, early and late works, and plays and poems”.

2. Research Question

As shown above, the methods of computational stylistics have been widely employed to show stylistic differences or to identify the authors of unattributed texts in several languages. Considering the possibilities that software tools offer for the study of literary texts, my purpose is to demonstrate their importance for the study of Modern Greek Poetry. More concretely, this study suggests a way to uncover relations between Modern Greek poets through computational stylistics, by analyzing word frequencies and named entities (name types and functions of names). I expect that my results will allow me to make concrete comparisons across different poets and time periods.

Accordingly, my main research question is: what do the frequencies of semantic categories/words reveal about Modern Greek Poetry of the 19th and 20th centuries and about Modern Greek Poetic Language in general? To answer it, I will focus on the analysis of nouns and try to find out what their frequencies reveal about Modern Greek poetry in general and about each Modern Greek poetic generation. I will then examine whether these words are affected by literary period and whether, and to what extent, they reflect gender. The second part of this study will focus on names: their types, frequency and function. For my sub-question (to what extent does the use of named entities reveal relations between poets?), I will study names both quantitatively and qualitatively.

To the best of my knowledge, Modern Greek literary texts, and Modern Greek poetry in particular, have not previously been studied with computational techniques and statistical analysis. In computational stylistics, though, Pantopoulos (2012) examined the prominent stylistic features and word frequencies of three translators (Rae Dalven, Edmund Keeley and Philip Sherrard) of C.P. Cavafy’s canon poems and pointed out the differences in stylistic choices between the translators. Several Modern Greek studies have examined the relations between Greek poetic generations in an attempt to reveal influences and/or intertexts between predecessors and successors (Pylarinos 2009; Garantoudis 1999). These influences and intertexts occur at various levels of the poems, such as themes, style, use of myth, form of the poem etc. (i.e. rewriting of forms, myths, themes etc.), and may also occur between Modern Greek and non-Greek poets. In particular, Malli (2002) examined the extent to which the ‘Generation of the 70s’ was influenced by American Beat poetry and Imagism. Although studies have examined the style of specific poets and/or generations (Malli 2002), no study has so far been published that analyses and compares the style of several poetic generations simultaneously, or that focuses on the nouns used by poets spanning a long period (from 1821 onwards).

Regarding names and their appearance in Modern Greek poetry of the 19th and 20th centuries, relevant studies have not yet been published. Scholars have studied the influence of specific myths and mythical characters, such as Ikaros, Elpenor and Odysseus, on various Modern Greek poets (Savvides 1981; Oikonomou 2004; Katsigianni 1999) and how these are re-presented, but a study that examines named entities in Modern Greek poetry, their function and their differences between generations is still missing. By comparing the frequency and function of names in different poetic generations, I hope to show the range of name types used per generation, their function and their differences. The following chapters therefore aim to fill this gap and contribute to the study of Modern Greek Poetic Language.

3. Methodology

3.1 Corpus of texts

The first step of my study was to determine the corpus of texts to be investigated. Only a limited percentage of Modern Greek poetic texts is freely accessible online in digital form, mainly through Anemi, the Digital Library of Modern Greek Studies. Digitizing a number of poetic collections myself, or applying OCR to already scanned books, was rejected due to lack of time (digitization is a laborious and time-consuming process). Therefore, I chose to work with poetic texts that already existed in electronic form and could be found online.

I began with a Web search in order to find and determine my set of electronic texts. The search located specific poetic anthologies that host texts from the 18th to the 19th century. Finding electronic Modern Greek poetic texts proved difficult: Project Gutenberg does not contain enough Modern Greek poems; the Internet Archive includes poems or anthologies not published during the 19th century and, moreover, the number of books is limited (I got 28 results by searching for the term ‘Modern Greek Poetry’ in Google); and the majority of Google Books titles related to Modern Greek Poetry are available only for preview, due to copyright issues. There is also a significant number of other online anthologies (such as ‘Portal for the Greek Language’ and ‘Myriobiblos’), but either their content was similar to that of the two online anthologies I finally chose, or their collection of poems was not large enough, or it contained poets who did not belong to the poetic generations I was looking for (mainly the newest generations).

Moreover, the number of poetic texts in electronic form varies (per poet, poetic generation, and gender), and the sites or databases that publish those poems are usually not official (blogs, online journals etc.) or contain no information about the print source of the poetic collections. A further concrete factor determining where to look for my set of electronic texts was their copyright license. Thus, before selecting the texts, I had to be sure that I could use them for my research and that I had information about their print source.

The next step was to create a corpus of poetic texts by retrieving them from specific official websites or databases, in order to have a representative sample of Modern Greek poems. Snhell.gr and Ekebi.gr were therefore chosen as my main resources, not only because both are official databases/archives that contain anthologies of Modern Greek poems, but also because they mention the print source of each poem (although Ekebi does not do so for every single poem). Subsequently, I contacted the creators of Snhell.gr and Ekebi.gr to obtain their permission before using their online anthologies.

The website ‘Spoudasterio Neou Hellenismou’, hence the acronym ‘Snhell’, was conceived and created by the Center for Neo-Hellenic Studies in Athens, Greece. It is edited in Modern Greek and contains 856 Modern Greek poetic and prose texts available in electronic form. In total, the database contains 101 Greek writers, and the texts derive from various historical periods: before World War I, between World War I and II, and after World War II. It is worth mentioning that the database is constantly being enriched and that, besides the above-mentioned poems, it also includes prose texts, anthologies of testimonies, of audio readings and of children’s tradition, and information on the work of the poet C. P. Cavafy (1863-1933), of the historian and critic Konstantinos Th. Dimaras (1904-1992), and of G.P. Savvides (1929-1995), an important scholar of Modern Greek Literature.

Figure 1. Snhell: The first of the two online anthologies I chose for my corpus of texts. Its aim is to “promote Modern Greek Literature and Culture”.

Of great importance for my research is that the site declares the print source of each text. Although Snhell contains a significant number of poetic texts, it does not include enough poets from the latest generations or enough female poets. In order to have a greater variety of poems for the stylistic comparison between poets, I needed to add some female poets and more poets from the latest poetic generations. I therefore enlarged the corpus by adding poems from a second online anthology, the one published on ekebi.gr.

The website ekebi.gr (an acronym that stands for the ‘National Book Centre of Greece’) contains an anthology of Modern Greek poets, mostly from the Post-War Generation and the Generations of the 70s and 80s. Specifically, the collection, compiled by a Greek poet, Dimitris Kosmopoulos, includes 510 poems by 137 modern poets of the 19th and 20th centuries, both Greek and Cypriot, from the time of the Greek national poet Dionysios Solomos (1798-1857) to the latest poetic generations.

Figure 2: Ekebi.gr promotes reading and books and contains several digital archives.

By unifying the content of the above mentioned online anthologies, ekebi.gr and Snhell.gr, I managed to compile a corpus of poetic texts from a broad range of poets, both female and male, derived from different time periods. The final anthology provides a representative sample of Modern Greek Poetry.

Subsequently, I decided to limit my corpus to poems written in the 19th and 20th centuries, and I therefore excluded from the corpus: a) 14 poets of the 18th century and earlier, mainly from snhell.gr; b) any prose texts in the database, specifically the 14 writers listed on snhell.gr; c) any poems translated by C.P. Cavafy (10 poems in total); and d) any Cypriot poets (who appear only in ekebi’s database). Thereafter, I combined the poems from the two databases and deleted any poems I encountered twice. For the corpus used in the stylistic analysis, I also had to exclude poets who have only 1 or 2 poems published in the anthologies (22 poems by 15 poets in total), because so small a number of poems cannot be representative of their poetry. However, these 15 poets were taken into consideration in the second part of this study, i.e. the study of named entities.
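The merging and filtering steps described above can be illustrated with a minimal sketch. It is not the procedure actually followed for this corpus, which was assembled by hand; the function name, the tuple layout and the rule for recognizing a duplicate (same poet, same text after whitespace normalization) are assumptions made for illustration only.

```python
from collections import defaultdict

def build_stylistic_subset(poems, min_poems=3):
    """poems: iterable of (poet, title, text) tuples taken from both anthologies.

    Returns two dicts: poets kept for stylistic analysis, and poets set aside
    (still usable for the named-entity part of the study).
    """
    seen = set()
    by_poet = defaultdict(list)
    for poet, title, text in poems:
        # Assumption: two entries are the same poem if the poet matches and
        # the text is identical once runs of whitespace are collapsed.
        key = (poet, " ".join(text.split()))
        if key in seen:
            continue
        seen.add(key)
        by_poet[poet].append((title, text))

    kept = {p: ps for p, ps in by_poet.items() if len(ps) >= min_poems}
    set_aside = {p: ps for p, ps in by_poet.items() if len(ps) < min_poems}
    return kept, set_aside
```

Keeping the excluded poets in a separate dictionary, rather than discarding them, mirrors the decision to reuse them in the study of named entities.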

Overall, I excluded 34 poets from the two online anthologies: 28 poets listed on snhell and 6 Cypriot poets whose poems are published on ekebi. A total of 153 poets and 1167 poems thus remained for my analysis. The graph in figure 3 combines the two databases and shows the exact number of poets I used for my study:

Figure 3: The total amount of male and female poets on both online anthologies.

Of the whole set of 1167 poems, 77 were written by female poets and 1090 by male poets. As can be seen from the pie chart in fig. 3, female poets are greatly underrepresented in the Greek online databases compared to male poets. One reason is that there were fewer published female than male poets until the 19th century, a phenomenon probably related to the education and general rights of women in Greece; published writers of that period were mostly men. Secondly, a main characteristic of anthologies in general is that they reflect the subjective gaze of the anthologist, who decides on the basis of his/her own criteria which poets to include and which poems best represent them. This is not, of course, to accuse anthologists of excluding female poets from online collections, but rather to point out a great gap in the online presentation of female poetry in Greece.

3.1.1 Corpus preparation

In order to prepare the texts for my analysis, I saved them as .txt files and converted them to UTF-8 encoding so that they could be read by the tools I was going to use. From the main corpus I left out the titles, any mottos and any comments written by the poet or copyeditor, as those elements are paratexts (Genette 1997). The files were organized by author (one file per author, containing all of his/her poems), by gender (female and male), and by poetic generation. In addition, one file containing all poems by all poets was created for an overall analysis. For the poems taken from ekebi.gr, I had to convert the accent system from polytonic to monotonic in order to obtain a homogeneous corpus that the software tools could read. For the conversion I used an online tool available on translatum.gr (see fig. 4 a,b).
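For readers who prefer a scripted alternative to an online converter, the polytonic-to-monotonic step can be approximated with Unicode normalization. The following is only a sketch under stated assumptions, not the tool used in this study: it removes breathings and the iota subscript and collapses the grave and circumflex accents into the single tonos, but it does not apply the further monotonic rule that most monosyllables carry no accent at all. The function name is hypothetical.

```python
import unicodedata

# Combining marks used in polytonic Greek that monotonic Greek drops.
DROP = {"\u0313", "\u0314", "\u0345"}  # psili, dasia, ypogegrammeni (iota subscript)
# Accents that monotonic Greek collapses into the tonos (combining acute, U+0301).
TO_TONOS = {"\u0300", "\u0342"}        # varia (grave), perispomeni (circumflex)

def to_monotonic(text: str) -> str:
    """Approximate polytonic -> monotonic conversion (accents only)."""
    # NFD splits every letter into its base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if ch in DROP:
            continue
        out.append("\u0301" if ch in TO_TONOS else ch)
    # NFC recomposes base + tonos into the precomposed monotonic letters.
    return unicodedata.normalize("NFC", "".join(out))
```

A file read with `open(path, encoding="utf-8")` can be passed through `to_monotonic` and written back with the same encoding, which keeps the whole corpus in UTF-8.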

Figure 4 (a, b): An example of conversion from polytonic to monotonic Greek.

3.1.2 Modern Greek Language

It is useful at this point to briefly mention some features of the Modern Greek language, as we are going to refer to them in the following chapters. The Modern Greek language marks its official beginning with the fall of Constantinople, the capital city of Byzantium, in 1453, although some features of the modern language had existed centuries earlier. It evolved from Ancient and Byzantine Greek, a process that also marks the division of Greek Literature into Ancient, Byzantine and Modern Greek.

One significant characteristic of Modern Greek was diglossia, which prevailed during the 19th and 20th centuries. The term signifies a language controversy and the simultaneous use of two forms of Greek: Katharevousa and Demotic Greek. Katharevousa is a combination of Ancient and Modern Greek and was used for official purposes and in literature. It was the written and spoken language of the elite or well-educated people of the period. Demotic, on the other hand, was the everyday form of Greek, mostly spoken by the masses. Demotic Greek became the official language of Greece in 1976, and in 1982 the polytonic system was abolished. Katharevousa and Demotic Greek differ not only in the accent system but also in morphology and syntax (Christidis et al. 2003).

Having selected poetic texts from the 19th and 20th centuries, I encountered a significant number of word variants, because Greek has declensions, i.e. variations in the form of a noun, pronoun and/or adjective. These variations mark the grammatical case, the number and the gender of the word. Moreover, I came across different forms of the Greek language, varieties of Demotic Greek as well as Katharevousa, especially when comparing texts of the 19th century with those of the 20th. These variants were combined so that they refer to one word type, by ignoring/deleting the suffixes and keeping only the root of each word (for nouns, adjectives, verbs, pronouns etc.).
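The idea of grouping inflected variants under a shared root can be sketched as follows. This is an illustration only, not the procedure applied in this study (where roots were retrieved through AntConc): the suffix list is a hypothetical and deliberately short sample of nominal endings, and such crude stripping will over- or under-stem many real words.

```python
import unicodedata
from collections import Counter

# Hypothetical, deliberately short list of inflectional endings, longest first.
# Final sigma is folded to a plain sigma before matching.
SUFFIXES = sorted(
    ["ουσ", "ων", "εσ", "ασ", "ησ", "οσ", "ου", "α", "η", "ο", "ι", "ε"],
    key=len,
    reverse=True,
)

def crude_root(word, min_stem=3):
    """Lower-case, strip accents, fold final sigma, then cut one known ending."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    plain = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    plain = plain.replace("ς", "σ")
    for suffix in SUFFIXES:
        if plain.endswith(suffix) and len(plain) - len(suffix) >= min_stem:
            return plain[: -len(suffix)]
    return plain

def root_frequencies(tokens):
    """Count occurrences per crude root instead of per surface form."""
    return Counter(crude_root(t) for t in tokens)
```

With this sketch, forms such as θάλασσα, θάλασσας and θάλασσες are counted under one root, which is the effect the manual combination of variants aims at.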

3.1.3 Poetic Generation: Definition and Periodization

The term ‘generation’ is used as a way of classifying poets, taking as the basic criterion their age, i.e. date of birth, and/or the year of publication of their first poem(s) (Argyriou 1979). This factor determines the social, historical and ideological environment in which they grew up and by which they were most influenced. Poets of the same generation usually share experiences and beliefs, and they often have common literary influences and aesthetic orientations. This is reflected in their poetry, as they share ways of expression (vocabulary, language, style) and themes (Garantoudis 1998, 194-202). Although Vitti (1995) defined the term ‘literary generation’ with reference to the ‘Generation of the 30s’, the term was widely accepted by critics as representative (Garantoudis). According to Vitti, a ‘literary generation’ is a group of writers with innovative ambitions and a desire to differentiate themselves significantly from their predecessors and to present new themes and forms based on their common experiences (1995, 53). As Garantoudis declares, the terms “Generation of 1880”, “Generation of the 1920” and “Generation of the 1930”, because of their long-term use in Modern Greek Literature, formed the basis on which a trend towards the systematic genealogical classification of Modern Greek poetry developed.

Following this generally accepted classification and periodization of Modern Greek Poetry (Beaton 1996), I will present some of the main characteristics of each generation that appears in my corpus of texts: the ‘Ionian Islands’ School’, the ‘Athenian School’, the ‘Generation of the 1880s’ or ‘New Athenian School’, the ‘Generation of the 20s’, the ‘Generation of the 1930’, the ‘Post-War Generation’, the ‘Generation of the 70s’ and the 80s, and the ‘Generation of the 90s’.

Romanticism and Classicism were two defining features of both the ‘Ionian Islands’ School’ and the ‘Athenian School’. Both Schools developed patriotic themes in their poetry (their poems speak about freedom, duty and fighting spirit) and also focused on themes such as love and nature. They often appear to have an idealistic conception of art. The choice of language is a distinctive element of these two Schools: the poets of the ‘Athenian School’ wrote in Katharevousa (an archaic form of the Greek language), while the poets of the ‘Ionian Islands’ School’ wrote mostly in Demotic Greek and were inspired by Italian poetry (Beaton 1996, 59-81). The ‘Generation of 1880’ or ‘New Athenian School’ that came next was divided into three currents, Symbolism, Parnassianism and Romanticism, and the poets of that period were mostly interested in establishing Demotic Greek in poetry (Beaton 1996, 102-104, 120-129).

The ‘Generation of the 20s’ was greatly influenced by Symbolism and can be divided into two groups of poets with slightly different directions. The poets of the first group (for example, Miltiadis Malakasis) maintained bonds with tradition, grieved for a life that had lost its ideals, and used symbols to express their thoughts and states of mind. The poets of the second group, such as Kostas Karyotakis and Tellos Agras, also lamented their lost ideals and expressed a feeling of general fatigue and dissatisfaction. They wrote poems with a clear social message and stinging sarcasm, declaring their conflict with society. They renewed poetic expression by adding words close to the style of prose (Beaton 1996, 168-173).

The ‘Generation of the 1930’ brought innovation into poetry, as its poets aimed to renew Greek poetry verbally, thematically and metrically by introducing and establishing free verse. The ‘Generation of the Thirties’, the avant-garde of Modern Greek Literature, avoids lyricism and sentimentality (Vitti 1995, 49) and introduces a new poetic language characteristic of spoken Greek. Its themes derive from simple everyday facts of life and common human feelings (Vitti 1995, 87-184). The poets of the ‘Post-War Generation’, also known as the ‘Generation of Defeat’ and subdivided into the ‘First’ and the ‘Second Post-War Generation’, matured during World War II, the Nazi occupation and the Greek Civil War (1946-1949), and their poems are marked by a sense of defeat. In particular, the poems of the first period depict the experiences of those years and offer social commentary (a political dimension), while in the second period the social commentary and the lyrical element recede and give way to a more personal poetry, often with a tragic tone (Menti 1995).

The next generation, the ‘Generation of the 70s’ or ‘Generation of defiance/contestation’, is characterized by the breadth of its poetic voices. Despite the wide range of poets, some common themes run through their poetry, mostly related to the social, historical and political situation of their time in Greece and abroad: consumerism and intense urbanization, the technological explosion, the dictatorship of the colonels in Greece (1967-1974) and the post-civil-war climate of the ’50s. Their criticism of and negative attitude towards any establishment or ideology, their diffuseness and their nihilism often result in an ironic and sometimes satirical language. From their first poetic impressions, the poets of this generation incorporate a multiplicity of languages, textures and techniques. Their use of language is believed to have brought a significant renewal of the poetic vocabulary, not only because these poets tend to use simple words from everyday speech, but also because they incorporate foreign words and their style is usually sharp and ironic (Alexiou 2001). The ‘Generation of the 80s’ or ‘Generation of the private visions’ is known for its introversion and its avoidance of references to social events or issues of public concern. As Kefalas illustrates, the term ‘private vision’ “signifies the end of common myths and the absence of a collective vision and social collusion.” (1990, 136).

3.2 Software tools

Two software tools were used for the text analysis, both of which generate word-frequency lists, word lists and concordances along with visualizations: AntConc and Stylo. They were chosen because they are both free to download and available online, and they offer the researcher a multitude of options. As Hoover demonstrates, they can “test hunches about how specific words are used in a text [...], statistically compare word frequencies among several texts and can show which texts have unusual frequencies of words of interest [or] generate lists of collocations (words that occur repeatedly near each other).” (2013).

AntConc is a freely available software program for text analysis developed by Laurence Anthony, Professor in the Faculty of Science and Engineering at Waseda University in Japan. It is available in several versions and runs on Windows, Macintosh, and Linux. It has seven tools, each with a different function: Concordance, Concordance Plot, File View, Clusters/N-Grams, Collocates, Word List and Keyword List (Anthony 2015). I downloaded the Windows version 3.4.4.0 of AntConc. The Word List tool counts the words in the corpus and presents them in a list ordered by frequency, which helped me to retrieve and analyze the nouns and the names. AntConc was mostly used to obtain word frequency lists and word lists, as it has the option to retrieve words based on their root. In that way I was able to create lemmas, which was important for my analysis.

Stylo is a free, open-source package written in the R programming language and designed for a variety of stylometric analyses. It was developed by Maciej Eder, an associate professor at the Pedagogical University of Kraków, Poland, and at the Institute of Polish Language of the Polish Academy of Sciences, Jan Rybicki, Assistant Professor at the Institute of Modern Languages at the Pedagogical University of Kraków, and Mike Kestemont, Assistant Professor in the Department of Literature at the University of Antwerp. Stylo has a variety of functions (stylo(), classify(), oppose(), rolling.delta(), rolling.classify() etc.) that provide handy implementations of analyses for computational stylistics and can be used for a variety of purposes, such as authorship verification, stylistic analysis, genre and gender recognition etc. It comes with a graphical user interface that allows the user to apply certain settings and choose from a variety of options. Moreover, it generates diverse graphs/visualizations, such as Principal Components Analysis, Cluster Analysis, Multidimensional Scaling, and Bootstrap Consensus Trees (Eder et al. 2016).

In particular, the stylo() function contains a variety of methods, such as multidimensional scaling, principal component analysis, cluster analysis, bootstrap consensus trees etc. (Eder et al. 2015, 1). More precisely, the stylo() function “processes electronic texts to create a list of all the words used in all texts studied, with their frequencies in the individual texts; selects words from the desired frequency ranges; performs additional procedures that might improve attribution, […]; compares the results for individual texts; performs a variety of multivariate analyses; presents the similarities/distances obtained in tree diagrams; and produces a bootstrap consensus tree (a new graph that combines many tree diagrams for a variety of parameter values).” (Eder et al. 2015, 4). Moreover, it allows the user to “automatically load and process a corpus of electronic text files from a specified folder, and to perform a variety of stylometric analyses from multivariate statistics to assess and visualize stylistic similarities between input texts.” (Eder et al. 2015, 7). Stylo was useful for my research not only for the stylistic analysis (word frequencies and word lists) but also for producing visualizations of texts based on their similarities and differences. I first installed the 64-bit version of R, i.e. the software environment for statistical computing and graphics, and then version 0.6.3 of the R package stylo. Every time I started an R session to use stylo(), I loaded the ‘stylo’ library, which makes all functions of the package available.
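To make the pipeline quoted above more tangible (word list, selection of the most frequent words, comparison of texts by distance), a minimal sketch is given here. It is written in Python rather than R and is not Stylo's implementation. As the distance it uses Burrows's Delta, a measure commonly used in stylometry, which is chosen here purely as an assumption for illustration; the function names and the use of the population standard deviation are likewise assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one (non-empty) text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocabulary]

def delta_matrix(texts, n_mfw=100):
    """texts: dict mapping a label (e.g. a poet) to a non-empty list of tokens."""
    # 1. Most frequent words over the whole corpus.
    corpus_counts = Counter(tok for toks in texts.values() for tok in toks)
    vocabulary = [w for w, _ in corpus_counts.most_common(n_mfw)]
    # 2. Relative frequency of each of those words in each text.
    freqs = {label: relative_frequencies(toks, vocabulary)
             for label, toks in texts.items()}
    # 3. z-score each word across the texts; words with zero spread are skipped.
    columns = list(zip(*freqs.values()))
    stats = [(mean(col), pstdev(col)) for col in columns]
    z = {
        label: [(f - m) / s for f, (m, s) in zip(row, stats) if s > 0]
        for label, row in freqs.items()
    }
    # 4. Delta: mean absolute difference between the z-scores of two texts.
    labels = list(texts)
    return {
        (a, b): mean([abs(x - y) for x, y in zip(z[a], z[b])]) if z[a] else 0.0
        for a in labels
        for b in labels
    }
```

A matrix of this kind is the sort of input from which cluster analyses and tree diagrams are drawn; smaller values indicate texts whose most frequent words are used at more similar rates.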

3.3 Stylistic Analysis: Style Markers and Text Collection

‘Style markers’ or ‘indicators’ are those language features that may be part of the unique style of a writer and can be retrieved by measuring word frequencies, word bi-grams, word tri-grams, collocations, characters, letter bi-grams and tri-grams, syntactic preferences, the frequency of function words, punctuation etc. (Eder 2011). Specifically, the term ‘n-gram’ refers to a sequence of n co-occurring items (words, letters, phonemes etc.). Depending on the value of n, there are unigrams, i.e. individual items (words, phonemes, letters etc.), bigrams, i.e. pairs of words or letters, trigrams, i.e. sets of three words or letters, and so forth (Wikipedia).
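The notion of an n-gram can be illustrated with a few lines of code. The sketch below is a generic illustration, not part of the tools used in this study; the function name is hypothetical. The same function serves for word n-grams (when given a list of words) and for letter n-grams (when given a string).

```python
def ngrams(items, n):
    """Return all contiguous n-grams of a sequence, each as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Word bigrams from a list of words, letter bigrams from a string.
word_bigrams = ngrams("the sea and the sky".split(), 2)
letter_bigrams = ngrams("sea", 2)
```

Counting how often each tuple occurs, for instance with `collections.Counter`, turns these lists into the frequency profiles that serve as style markers.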

As ‘style markers’ I used word types and word tokens, in terms of most frequent words. Words can be counted as types or as tokens: types are word-forms and tokens are occurrences of word-forms. As Kenny declares, whenever we count the occurrences of a word in a particular text, we count the number of tokens of a particular type (1982, 66). Hoover also defines word types as unique sequences “of alphanumeric characters not broken by a space or by any punctuation except the hyphen or the apostrophe” (2010, 251). Thus, the term ‘word token’ refers to every occurrence of a word in a text, i.e. the total number of running words, repetitions included; and the term ‘word type’ refers to the distinct word-forms in a text, each counted once.
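The distinction between tokens and types can be made concrete with a short sketch. It approximates Hoover's definition with a regular expression (alphanumeric runs that may be joined by a hyphen or an apostrophe); the lower-casing of words and the function name are assumptions added for illustration, and the text is assumed to use precomposed (NFC) characters.

```python
import re

# Alphanumeric runs, optionally joined by a hyphen or an apostrophe.
WORD = re.compile(r"[^\W_]+(?:[-'’][^\W_]+)*")

def tokens_and_types(text):
    """Return the list of word tokens and the set of word types of a text."""
    tokens = [w.lower() for w in WORD.findall(text)]  # every occurrence
    types = set(tokens)                               # each distinct form once
    return tokens, types

tokens, types = tokens_and_types("The sea, the sea: well-known.")
# len(tokens) is the number of running words; len(types) the number of distinct forms.
```

In this example sentence there are five tokens but only three types, since ‘the’ and ‘sea’ each occur twice.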

For the stylistic analysis, I focus on poets with at least 3 poems in the dataset, excluding 15 poets in total (2 women and 13 men), because the number of word types and tokens of these poets is not representative. As mentioned earlier, the number of poems for each of these poets is too small to yield reliable results; stylistic analysis is effective only when there are enough words for each poet/writer. This was indeed a significant problem in this study, since the number of poems per poet is generally small. It arose because the corpus was based on specific online anthologies, which points to a significant drawback: Modern Greek poetry is not adequately represented on the Internet. The majority of poets in my corpus are represented by 3 to 10 poems; nevertheless, the selected set of poems constitutes a representative sample of Modern Greek Poetry, as the poems cover two consecutive centuries, the 19th and 20th, and belong to diverse poetic generations. My aim was to include as many poets as possible in order to compare the style of poets, their generations and the function of names. All poets, however, have been included in the analysis of named entities.

Table 1 presents all the poets of the corpus in alphabetical order (following the English alphabet). Next to the English name of each poet, the number of poems available for analysis is given in parentheses. I also give the name of each poet in Greek and the years of birth and death, if applicable. The remaining columns give the poetic generation to which each poet can be assigned, the century, and the number of word types and tokens obtained for each poet.
