British Quotative Use on Social Media Platform, Twitter



A thesis submitted to the Faculty of Humanities in partial fulfilment of the requirements of the degree Master of Arts in General Linguistics at University of Amsterdam

Sarah Maclean-Morris

10848452

Supervisor: Ingrid van Alphen (UvA)

30th June 2015


Acknowledgments

I would like to express my sincere appreciation to all who have supported me during the researching, writing and completion of my thesis. First of all, I would like to thank Ingrid van Alphen for agreeing to supervise my thesis after I took an interest in quotatives during her Sociolinguistics class. Without your constructive and candid feedback, as well as your support, this completed project would not have been possible. Also to Silke Hamann, not only as my second reader but for your support and guidance during my degree, especially in the first few weeks as a Master’s student here at the UvA. I owe thanks to my mum, Mary, for proofing my drafts as well as coding all 319 of my tweets as a judge, whilst writing a dissertation of her own. Without your continued encouragement and indispensable advice I might not have made it to this position. To Kyle, not only for coding my corpus, but also for being my rock during my time here in Amsterdam, pursuing the same academic passion, and for keeping me laughing during the hardest moments. Finally, to Michiel and to my close friends and family for their love and encouragement, and infinite belief in my abilities during my study, even when I doubted myself.


Abstract

Fifteen years have passed since Darcy DiNucci first coined the term Web 2.0, referring to the change in content creators, as well as interactivity between internet users. Today in 2015, the internet is awash with content sharing, content creation and social media platforms, no longer accessed simply from a computer, but through new technology such as smartphones and tablets. This thesis looks at the use of quotatives on the online social media platform Twitter, a self-proclaimed microblogging site, in order to identify differences between quotative use in speech and in a written Computer Mediated Communication (CMC) environment. It focuses on ways of reporting direct speech by UK Twitter users from four British cities, addressing issues that have not yet been examined on a CMC social media platform. Is there a relationship between quotative use and geographical location within the UK? Is there a relationship between the semantic topic of a Twitter message and quotative frequency and distribution? Finally, is there a relationship between Twitter’s 140 character restriction and the frequency of zero quotatives? Cross-correlation tables are calculated for both geographical location and topic, which show that whilst there seems to be no significant relationship between city and quotative variant within the UK, the topic of a Twitter message and the quotative used seem to be related. Zero quotatives are the most prevalent quotative variant in the Twitter corpus analysed, differing from previous findings in speech, which normally show a preference for a lexicalised variant. A short qualitative analysis of several variants, including the zero construction, helps frame the quotative in the environment of its tweet.


1. Introduction

1.1 Quotatives and Social Media

McCarthy wrote in his book Spoken language and applied linguistics that “hardly any stretch of casual conversational data is without reports of prior speech” (1998: 150). Indeed, direct quotation of speech, thought and activity is a dynamic domain within linguistics. However, as more of the world plugs itself into a global router and becomes connected to the internet, the phenomenon may no longer be limited to spoken conversation.

Quotatives, described as ‘dialogue introducers’ by Yule (1998), find their way into conversations, reporting the speech of everyone from friends and family to celebrities, notable television figures, as well as self.

(1) Mr Blazer says: "Beginning in or around 2004 and continuing through 2011, I and others on the Fifa executive committee agreed to accept bribes in conjunction with the selection of South Africa as the host nation for the 2010 World Cup." (BBC News, 2015)

(2) So I thought, Ah, look at him there, the big, smug, Range Rover driver, I'm not letting him out. […] then I thought, Ah, no, I'll let him out, it'll be – and mark me closely here – it'll be good karma. (Keyes 2014)

(3) I’m like “I thought you were telling me to shut” she goes “shut up, shut up, shut up that means I like what you’re saying keep talking to me.” (Barbieri 2005: 247)

(4) An’ I came home from work one day an’ there’s his collar open with his tag layin’ there an’ my dog is gone. An’ I’m goin’, “God! I hope they figure out he’s deaf because if you turn him loose you can’t call him.” (Cukor-Avila 2002: 7)

(5) This is Jane to me the other day, ‘your kitten off my floor and see if it lands on <laughing> its feet’. (B1327075, as cited by Fox 2012: 246 from COLT)

(6) A: He’s gonna storm out of there Ø ‘you bastard Peter! You bastard. What, You bastard Peter, you bastard’. (COB132503/179-181 as cited by Palacios Martínez 2013: 447)


since research began, with newer constructions such as ‘like’ continuing to assume numerous functions outside of the quotative structure. These newer constructions seem to have emerged geographically independent of one another, yet over time are now reported in areas far from their original source. One example is the use of ‘like’ as an English quotative, which is believed to have originated in North America (Buchstaller 2014: 90). Exposure to North American media, as well as communication, is believed to facilitate this quotative spread. However, the types of exposure are expanding, and old media such as television programmes and music are no longer the only platforms that a worldwide audience can access. The increasing accessibility of an internet connection, whether through a computer or a smartphone, has brought online communication to the masses, facilitating the continuing emergence and development of social media platforms. These platforms belong to a “group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, which enables the creation and exchange of user-generated content” (Kaplan and Haenlein 2010: 69). These platforms also allow for the projection and sharing of identities and societal norms, whether they are expressed lexically or pragmatically.

Kaplan and Haenlein (2010) classify social media by social presence and self-presentation/self-disclosure in the following table.

Figure 1: Classification of Social Media by social presence/media richness and self-presentation/self-disclosure

(Kaplan and Haenlein 2010: 62)

Despite the varieties of social media listed above, this paper will look at the appearance and usage of quotatives on the social networking site Twitter, a self-proclaimed microblogging platform, which according to the table requires a high level of self-presentation. In the table above, Twitter fits in between low and medium social presence, omitting many details that can be disclosed by Facebook users and allowing for a little more anonymity. However, its interactivity with an immediate audience, as well as the dynamic change in trending topics and subjects, displays a greater ‘social presence’ than a blog written on a single domain, which is not as easy to find as a tweet on a social media platform like Twitter.

Taking the results of quotative use in spoken British English as a starting point, this study seeks to determine the frequency of quotatives used by UK users on Twitter, and whether the frequency by quotative variant reflects the results found in spoken corpora.

1.2 Importance of this study

As of 2015, no work has been carried out regarding the evidence of direct speech markers in online communication. Whilst numerous sociolinguistic studies have highlighted the use of quotatives in different varieties of English, particularly new constructions in spoken conversation (Blyth et al. 1990, Romaine & Lange 1991, Buchstaller 2011), their appearance in online social media is yet to be analysed. Maclean-Morris (2015) carried out a social attitudes test on UK residents regarding their own use and perception of quotatives, both offline and online, which functioned as a precursor to this research. Respondents reported an awareness of their presence online, either used by themselves or others. This thesis therefore focuses on the use of old and new quotatives in posts from four chosen UK cities on the social media website Twitter. Unlike previous studies, age and gender will not be variables considered within the analysis. As Twitter does not disclose its users’ ages or gender, this study will focus on the relevance of the geographical location of users as well as the topic of the tweet, that is, its semantic environment. Taken as a starting point, this thesis hopes to highlight the usage of quotatives on social media in relation to previous oral findings, and open a discussion surrounding their use, paving the way for future research.

1.3 Outline of thesis

Chapter two provides a comprehensive review of the theoretical background supporting this study, explaining the notion of quotatives and presenting previous research. It will also introduce the social media platform that provided the study with its final corpus, microblogging site Twitter. Finally, the chapter will introduce the paper’s research questions as well as predicted hypotheses.

Chapter three outlines the methodology used to collect the data, introduces the chosen geographical locations, and demographics, as well as outlining important limitations that should be taken into account.

Chapter four presents and discusses the results of the study both quantitatively, by frequency and distribution with regard to location and topic, and qualitatively, by the context surrounding a token.

Finally, chapter five provides the conclusions of the study, a short discussion, limitations, as well as a basis of ideas for future research.


2.0 Theoretical Background

This chapter will firstly introduce previous literature and concerns regarding reported speech and thought to quotation. This will lead onto the introduction of quotatives, both old and new constructions, with reference to previous literature and will also draw upon the data source of my study, Twitter. Research questions and preliminary hypotheses will then be proposed and discussed.

A plethora of work has been carried out on the use of quotatives in different areas of the United Kingdom (UK hereafter), and this is only increasing. However, with the growth of social media in computer-mediated communication (CMC), there has yet to be work carried out into how quotatives are used on this platform.

2.1 Reported Speech and Thought

2.1.1 Speech

Previous traditional accounts of reported speech prefer to make a distinction between direct and indirect speech, arguing that the relationship between the reported content and the current ‘report’ differs. Consider the following:

(7) a. Stephen said ‘I can’t make the party’.

b. Stephen said that he couldn’t make the party.

According to Cameron’s (1998: 51) approach, direct speech ‘re-enacts’ the exact words of the original speaker, whereas indirect speech describes and expresses the content of the utterance, although not always in its original form. Therefore considering these definitions, (7)a would be an example of direct speech and (7)b one of indirect speech. Although this seems rather straightforward, the boundary is not always as clear between the two categories, especially within spontaneous conversation.

Buchstaller (2014: 37) states that “quoted voice is embedded with a chunk of speech that is produced by the voice of the narrator”. Whereas in the case of indirect speech, the utterance is that of the reporter, who uses themself as the ‘spatiotemporal point of reference’, direct speech is still the speech of another and therefore the reporter adopts the ‘deictic orientation’ of the person they are quoting. This will be further elaborated on in subsection 2.1.1.1.


Tannen (1986) prefers to refer to reported speech as constructed dialogue, in light of the idea that although the general recollection and therefore meaning of the utterance is accurate, how it is worded is normally more of an approximation. In some instances even, utterances reported in direct speech have not occurred and may never occur. Consider the following example.

(8) She goes, “Mum wants to talk to you.” It’s LIKE, “Hah, hah. You’re about to get in trouble.” (Romaine & Lange 1991: 230)

Like in this example implies that the speaker might not be quoting word-for-word the original utterance and interestingly the third-person neuter ‘it’ has not only a referential but an existential function in the construction. This will be discussed further in 2.4.3. There are several different schools of thought in making the direct and indirect distinction and these will be presented over the following paragraphs.

2.1.1.1 Deictic Perspective

Buchstaller (2014: 56) states that some authors make the distinction between indirect and direct speech based on the deictic orientation of the quotation. On this view, a direct quotation is oriented personally, spatially and temporally towards the experiencer of the reported utterance. The perspective of the utterance is therefore that of the experiencer and not that of the reporter, as it is in indirect speech. To illustrate, (7)a takes the perspective of the experiencer Stephen, which is shown personally through the use of first-person pronoun I, spatially ‘here’ and temporally through the contrast of the present tense of the utterance with the reported past verbum dicendi. (7)b on the other hand takes the perspective of the reporter, using the third-person pronoun he, the ‘back-shift’ in the verb tense from ‘can’t’ to ‘couldn’t’ and therefore a spatial deixis of ‘there’. Again, as previously mentioned, there remains fluidity between the two speech types in this argument, caused by ambiguity of temporal and personal orientation (Buchstaller 2010: 56). In terms of tense, a present verb within a quotation will not trigger a ‘temporal backshift’ in indirect speech, and when this is combined with first person, speaker perspective alone cannot reliably disambiguate a direct and an indirect form.


(9) does seem to lean towards an indirect interpretation (I tell her that I can’t make the party), with a viable and productive that insertion just before the quotation. However, the utterance could just as easily fit into a direct speech structure, resulting in I tell her, “I can’t make the party”.

Other cases, such as the conversational historical present (CHP), pose further problems for deictic distinction and therefore for the distinction between indirect and direct speech.

She goes “so just wait til I get back.” And we’re like “well maybe we will.” And she’s all “whatever.”

(Norrick 2007: 133)

CHP, popular in narrative environments, is indicated by past activities that are encoded with non-past tense morphology.

2.1.1.2 Discourse-Pragmatic Perspective

Coulmas (1985: 43) takes a discourse-pragmatic approach to the distinction, observing that while “direct speech is expressive, indirect speech is descriptive”. He believes that there are certain elements of direct speech which are not evident in indirect speech, including interjections, swear words, differing intonation, the presence of imperatives and interrogatives, as well as false starts, self-correction and repetition (1985: 48). Indeed, even Plato notes a fundamental difference between mimesis, the direct representation and diegesis, the summarized output. Clark and Gerrig (1990) also appear to agree with the expressive and descriptive distinction of direct and indirect speech. The former displays to an audience not only the original words, but facial expressions, prosody and actions. As a result, the reporter ‘does not say what the content of the quote is (…), instead he does something that enables the hearer to SEE for himself what it is’ (Clark and Gerrig 1990: 802) (as cited from Buchstaller 2014). Indirect quotation, on the other hand, according to these arguments, merely states what the content of the quote is, devoid of mimetic effects such as those expressed by Coulmas.


Again however, the distinction between direct and indirect speech is not concrete. Not all direct speech has to contain mimetic elements, and likewise, indirect speech is not simply limited to just diegesis. Performative elements for example can also be found in the latter.

In conclusion, according to Buchstaller (2014: 64), “it is difficult if not impossible to come up with hard and fast criteria for distinguishing direct and indirect quotes”. This study focuses only on direct speech and thought, excluding the ambiguity of defining indirect speech. However, the next issue is that of reporting thought.

2.1.2 Reported Thought

Direct and indirect speech cannot be precisely defined, but they do allow a rough distinction. Thought is a little harder to classify.

Quotatives can be used to mark evidentiality, which is ‘the way in which the information was acquired’ (Aikhenvald 2004: 3), via visual, non-visual and other sources for example (Buchstaller 2011: 63). However, an issue arises as reported speech can also mark ‘stance’, which linguistically expresses attitude, personal feelings and judgements of the proposition in question (Biber et al. 1999: 966). Indirect speech, stance and modality in reported thought evaluate and describe the content of a message through potentially subjective means, whereas evidential meaning of reported thought informs the listener of the reported ‘thinker’ and reported thought. Consider the following example from the Glasgow section of the corpus:

(10) (147)1 Soph I was like thinking see if you died I'd get Beyoncé to your funeral then have an after party at mine how class would that event be.

Without having access to a speaker’s mind, how can it be possible to tell if a speaker is reporting prior thoughts to an interlocutor (in this instance direct quotation), or if the thought has occurred at the present moment, alongside the spoken utterance? If the latter is the case, then ‘think’ is a stance marker, as it doesn’t ‘frame the re-production of previously occurring material’ (Buchstaller 2014: 70).

Therefore in the case of reported thought, each occurrence has to be considered separately, as to whether a verb of thought is framing previous material, and therefore acting as a quotative, or if it is marking stance and modality.


All this previous literature regarding reported speech and thought was taken into consideration when collecting and analysing the data of this study, and helped to clear some ambiguity within my corpus. A final issue to consider is that quotation is not limited to vocal demonstrations, as has been discussed in this section. Blackwell et al. (2015) looked at quotatives in multimodality, notably vocal versus bodily demonstrations. They found that there was a strong correlation in using quotations with both vocal and bodily demonstration simultaneously, with new quotatives be like and go more associated with performance and gesture than say (Blackwell et al. 2015: 6). Performance and gesture are not as easy to convey on text-biased social media platforms such as Twitter. However, it is possible to attach pictures and media links within the text, which provides a multimodal output to convey a speaker’s intentions, as well as providing further context.

Semiological characters such as emojis and emoticons are also popular in online social media (Maclean-Morris B 2015: 13), functioning as a pragmatic device, signalling paralinguistic cues normally present in face-to-face spoken discourse.

(11) Quand tu revises et que tu penses avoir une bonne note, et la tu vois que ta 4/20 tes comme sa 2

(French Twitter Corpus (5), Maclean-Morris B 2015)

Further examples are given within the analysis of a potential multimodal quotative construction. The next section introduces the notion of quotatives.

2.2 Introduction to Quotatives

It was only just over thirty years ago (Butters 1980, Romaine and Lange 1991), that work on quotatives started to make interesting and exciting developments. The standard verba dicendi say, think and told for example, have been pushed out of the limelight, allowing for new ways of introducing constructed direct speech and thought.

2 When you revise and you think you’ve written a great test, and then you see your 4/20, you’re like that


In this paper it is the verba dicendi that I define as old, canonical quotatives, with newer innovations such as be like, go, this is + NP as new non-canonical quotatives.

New quotatives like and go are often acknowledged as the first recognized innovative quotative constructions, with the work of Butters (1980, 1982) the first to bring these innovations to the fore. However, in 1970s Belfast, Milroy and Milroy (1977) discovered previously unreported constructions here was I and here’s me when looking at language in an urban setting. This suggests that although innovative quotative forms have only been identified over the past 30 years, there is every possibility that these constructions have been around and used long before the date they’re attributed to. Standard quotatives have gradually been pushed aside due to a global emergence of new quotatives (Buchstaller and van Alphen 2012), a new phenomenon evident in numerous languages. Although studies tend to be English-centric, innovative quotatives have not been restricted simply to the English language. There seem to be parallels between constructions in English and in both related and non-related languages. For example, consider the English comparative like, a new quotative that has taken on the role of reporting direct speech and thought. The Dutch preposition van follows a similar pattern, although it also allows for indirect speech reporting.

(12) Toen had ik zoiets van “ja daar wil ik ook aan meedoen”.
Then had I something van “yes there want I also to participate”.
[Then I felt like “yes I also want to participate in that”] (Foolen et al. 2006: 139)

In this Finnish example, both a comparative and a deictic pronoun (in bold) are combined to produce an innovative construction.

(13) Matti oli niinku et "en voi uskoa et opiskelen perjantai-iltana"
[Matt was like that “I can’t believe I’m studying on a Friday night”]
(Example and translation kindly provided by Jenni Karvonen 2015)

Therefore, English and the development of innovative forms such as be like should not be considered purely an autonomous and isolated occurrence, but rather compared as part of wider cross-linguistic trends that are shared by many languages, which seem to utilise similar lexical source material for their new quotatives. Academics have identified a list of four semantic sources that new quotatives emerge from. These are as follows, and each contains examples from several languages.3

Comparative: Dutch van ‘like’, English be like, French comme ‘like’, Hebrew kaze ‘like + this’, Polish typu ‘type’, Swedish typ ‘type’

Demonstrative deictic: Dutch zo ‘so’, London English this/here is NP, German so ‘so’, Croatian ono ‘that’, Russian takoij ‘such + like + this/that’, Spanish asi ‘so’

Quantifiers: Danish bare ‘just, only’, Dutch helemaal ‘all’, English all, Finnish vaa(n)

Generic verbs of motion and action: English go, Dutch komen ‘to come’, Greek kano ‘do’

(Buchstaller and Van Alphen 2012)

An important issue to address is how the gap between speaker awareness of linguistic innovations such as be like and their adoption into a person’s quotative vocabulary is bridged. Innovative quotative this is + NP was first acknowledged in 2007 (Cheshire and Fox), used by younger inner London speakers. It was likely in use long before this date, and might be familiar already to many people. However, how long will it take, if indeed it does, to become part of a listener’s repertoire? Although media is often cited as a factor, it could also have to do with the community a speaker identifies with. The adoption of a newer quotative within a community could therefore be dependent on its “adaptability” and its ease of assimilating into the societal norms and culture of that community.

However, in the first instance, this paper will investigate the preference for and popularity of quotative variants within the UK with respect to previous literature.

2.3 Current Frequency by Spoken Quotative Type in the United Kingdom

The results of a social attitudes questionnaire (Maclean-Morris 2015) from 51 UK participants regarding new quotatives found that 94 per cent (48) of the participants were familiar with new quotatives. 72 per cent then confirmed that they used new quotatives themselves, although it is important to note that the majority of respondents were aged 35 and under. However, age is not always a determining factor when using new constructions. Despite pejorative attitudes, usually within the media, to the youth language of today and its adoption of new linguistic expressions and phenomena, many of the accusers are probably not without guilt. Remember that older speakers in 2015 were young too at one point and would have likely picked up, or at least experienced, the slang of their day during their own adolescence. Once a speaker leaves adolescence ‘linguistic habits tend to remain relatively stable’ (Buchstaller 2014: 205). Lexical innovations and non-standard language of a person’s youth will be at least remembered and may become part of the speaker’s standard, unmarked lexicon. Cheshire & Fox (2007) found 17 (4.6 per cent) instances of the new quotative go used by elderly residents in inner London within their corpus of speech (see table 2).

3 See Buchstaller and van Alphen, 2012, p.XIV for more typological examples within these four categories.

In an ideal world, it would be useful to have access to a table that displayed the proportion of quotative types and their frequency used within the UK as a whole in spoken discourse, allowing identification of regional variation in preference and usage. Although this is not possible, several studies that have focused on certain areas and certain groups of people have produced smaller scale and more restrictive results. Although beneficial, there are still parts of the UK that have been excluded in quotative research such as Wales and Northern Ireland. It is however much easier, quicker and cost-effective to collect online data filtered by the location of the content writer as in this study, and allows inclusion of participant data which in traditional field methods would not otherwise be accessible. The Newcastle Electronic Corpus of Tyneside English 2 (NECTE2) corpus holds data of younger (17-34) and older (35+) speakers and was consulted by Buchstaller to look at the quotative choices of both these age groups in the local area. From this data collected between 2007 and 2009, she found that 58 per cent of older speakers used say in comparison with 17 per cent of younger speakers (Buchstaller 2014: 166). Be like was again the most used quotative of younger speakers, attributed to 40 per cent of total occurrences in this age group, with only three per cent in older speakers.

Consider the table below (as taken from Buchstaller 2014: 119) which refers to quotative forms in the speech of a group of ten young people aged between 19 and 21 who are middle class, university students and have never lived outside of Newcastle, UK.


Table 1: Distribution of “Overall occurrence of quotative forms amongst young British speakers”

         Frequency       %
Like           171    43.3
Say             75    19.0
Zero            74    18.7
Other           29     7.3
Go              26     6.6
Think           20     5.1
Total          395   100.0

From the results, the participants used be like the most frequently, with 171 occurrences, almost half of the total corpus, followed by canonical, old quotative say in second place with 75 occurrences. Even so, there is a considerable difference between these two frequencies.
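As a quick arithmetic check on Table 1, the percentages can be recomputed from the raw frequencies. The snippet below is only an illustrative sketch: the variable names `counts`, `total` and `shares` are hypothetical and not part of the original study, and the counts are copied from the table as reported by Buchstaller (2014: 119).

```python
# Frequencies copied from Table 1 (Buchstaller 2014: 119).
# The dictionary and variable names are illustrative assumptions.
counts = {"like": 171, "say": 75, "zero": 74, "other": 29, "go": 26, "think": 20}

# Total number of quotative tokens in the table.
total = sum(counts.values())

# Share of each variant in per cent, rounded to one decimal place as in the table.
shares = {variant: round(100 * n / total, 1) for variant, n in counts.items()}

for variant, n in counts.items():
    print(f"{variant:<6}{n:>5}{shares[variant]:>7.1f}")
print(f"{'total':<6}{total:>5}")
```

Recomputing the shares this way makes it easy to see that be like, at 171 of 395 tokens, accounts for well under half of the total, with say and zero almost level behind it.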

In the south of England, Cheshire and Fox (2007) identified quotative distribution and frequency of speakers from inner and outer London. The following table (table 2), as taken from their presentation at the 4th International Conference on Language Variation in Europe, illustrates the frequency of certain quotatives within these speakers (Cheshire and Fox 2007).

Table 2: Geographical “Distribution of Quotatives”

                 Inner London   Outer London   Inner London   Outer London
                 Elderly        Elderly        Adolescents    Adolescents
                 % (n)          % (n)          % (n)          % (n)
Say              70.8 (261)     73.5 (200)     27.4 (351)     31.2 (328)
Think             4.1 (15)      10.3 (28)      12.8 (164)      6.1 (64)
Go                4.6 (17)       0.4 (1)       11.7 (150)     26.4 (279)
Zero             18.9 (70)      12.9 (35)      15.1 (193)     12.3 (129)
Be like           -              -             24.4 (313)     20.8 (219)
This is + (S)     -              -              4.8 (61)       -
Tell              -              -              1.9 (24)       -
Others            1.6 (6)        2.9 (8)        2.0 (26)       3.2 (33)
Total N           100 (370)      100 (272)      100 (1282)     100 (1052)

Interestingly, the use of quotative type is not merely limited to age or gender. In Cheshire and Fox’s (2007) results, the use of quotative tell is limited to just under two per cent of total occurrences in Inner London Adolescents. However, this quotative appears much more frequently in a past-tense variant form ‘telt’ (told) further north in the country. The Scottish non-canonical form telt can be used in front of both direct and indirect speech with more flexibility, although to my knowledge this variant has not yet been largely investigated as a quotative. Its usage is shown below in an extract from a tweet, not part of my corpus.

(14) A lecturer at ma uni telt me tae speak in English. His wee face when I telt him “When in Rome, when in Rome. YOU learn tae talk right" Eejit

[A lecturer at my uni told me to speak in English. His face when I told him “When in Rome, when in Rome. YOU learn to talk right” Idiot.] (Twitter 2015)

Overall, from tables 1 and 2, say and be like in spoken contexts battle for first and second place depending on age and location. Say dominates the quotative system of older speakers, something that is echoed in other work (Buchstaller and D’Arcy 2009), whereas there is more flexibility and variation in the quotative system of the younger speaker. In the results for the older speakers, unframed zero quotatives take second place. However, they are pushed further down in the speech of adolescents, taking third place in Buchstaller’s results (table 1) collected between 2007 and 2008, and third and fourth place in the speech of inner and outer London adolescents respectively (table 2) collected between 2005 and 2006.
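The rank positions of the zero quotative described above can be read directly off the token counts in Table 2. The following is a minimal sketch for checking them, assuming the counts as printed in the table. The names `table2`, `rank_of` and `zero_ranks` are hypothetical helpers introduced here for illustration, and the ‘Others’ category is ranked alongside the named variants.

```python
# Token counts (n) copied from Table 2 (Cheshire and Fox 2007).
# Variants with no reported tokens for a group are simply omitted.
# All names below are illustrative assumptions, not part of the original study.
table2 = {
    "inner_london_elderly": {
        "say": 261, "think": 15, "go": 17, "zero": 70, "others": 6,
    },
    "outer_london_elderly": {
        "say": 200, "think": 28, "go": 1, "zero": 35, "others": 8,
    },
    "inner_london_adolescents": {
        "say": 351, "think": 164, "go": 150, "zero": 193,
        "be like": 313, "this is + (S)": 61, "tell": 24, "others": 26,
    },
    "outer_london_adolescents": {
        "say": 328, "think": 64, "go": 279, "zero": 129,
        "be like": 219, "others": 33,
    },
}


def rank_of(variant, counts):
    """Return the 1-based rank of `variant` when variants are sorted by count, highest first."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return ordered.index(variant) + 1


# Rank of the zero quotative within each speaker group.
zero_ranks = {group: rank_of("zero", counts) for group, counts in table2.items()}

for group, rank in zero_ranks.items():
    print(f"{group}: zero is ranked {rank}")
```

On these counts the zero quotative ranks second for both elderly groups, third for inner London adolescents and fourth for outer London adolescents.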

In summary, from previous work on quotatives in spoken British English, the most frequent are say and zero constructions as well as like in the repertoire of younger speakers. With these results in mind, this study questions if these preferences are also reflected in written quotative forms in online social media platform Twitter. The next section will introduce a more in-depth description of quotatives and their usage.

2.4 Types of Quotatives

This section provides further description, social attitudes and history of the quotatives investigated in this study as well as a ‘linguistic profile’ of a speaker based on previous results.

2.4.1 Say and Think

As traditional canonical quotatives, both say and think have nowhere near the wealth of literature that newer innovative quotatives have, providing the most neutral way of reporting speech and thought. Therefore this section will provide a brief overview of main observations and preferred contexts for their use.

As an original verbum dicendi to introduce both direct and indirect speech, say can occur before (I said) and after (said I) the quoted utterance, providing a neutral way of reporting utterances. As mentioned in 2.2, the use of both say and think in direct and indirect speech can cause confusion in the perspective of the predicate.

(15) a. John said, “I was responsible for Lauren’s failure”.
     b. John said I was responsible for Lauren’s failure.
(from Blyth, Recktenwald and Wang 1990: 216)

This pair of examples is ambiguous, especially with the omission of the indirect speech marker that. However, other issues with a direct and indirect comparison will be highlighted in this section, after an overview of previous literature.

From the combined results of several studies it appears that say is the most popular lexicalised quotative used by older people (Cheshire and Fox 2007; Buchstaller 2014: 166), losing out to be like and other innovative constructions in the younger generation. In Buchstaller’s (2014) results from Tyneside data, say is roughly favoured in both first person and third person contexts by young and old alike in the 1990s and 2000s.

Think, on the other hand, introduces only the reported speaker’s internal mental state or dialogue. It occurs most frequently in the present tense (Buchstaller 2014: 163) and, as expected, is preferred in first person contexts. It is sometimes hard to distinguish between think as a marker of evidentiality, and therefore taking on the function of quotative, and stance, which marks attitude and opinion, as the listener does not have access to the speaker’s mind. It must be mentioned, however, that think is not the only quotative used to report thoughts.

Buchstaller (2014) analysed data collected from young US and UK speakers in order to determine the effect of reported speech versus thought on quotative choice. Whereas in both geographical areas go and be all are preferred in reporting speech, be like seems to statistically favour reported thought in UK young speakers. The following table shows the correlation between the most frequent quotative verbs and the content of a quote amongst younger UK speakers, on the basis of the following question:

“Out of all quotative choices, what are the odds that be like encodes reported speech and what are the odds that it encodes reported thought?” (Buchstaller 2014: 127)

Table 3: Correlation between most frequent verbs and content of quote amongst younger UK speakers, calculated as a fraction out of all quotative variants.

           Like          Zero          Go            Say           Other         SUM
           N     %       N     %       N     %       N     %       N     %       N     %
Speech     105   39      53    19.6    23    8.5     64    23.7    25    9.3     270   100
Thought    66    52.8    21    16.8    3     2.4     11    8.8     24    19.2    125   100

Each correlation is calculated using the sum of all quotative variants as the denominator. Although think is not defined in this table, we can presume it is included in the Other category, which makes up about 19 per cent of total verbs used in reporting thought. However, it is the frequency of be like, more than half in reported thought contexts, which is the most striking finding of this table.
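The percentages in Table 3 can be recomputed from the raw counts. The following is a minimal sketch in Python, assuming the N values as they appear in the table; the names counts and row_percentages are illustrative and do not come from Buchstaller (2014).

```python
# Raw counts (N) per quotative variant, copied from Table 3.
counts = {
    "speech":  {"like": 105, "zero": 53, "go": 23, "say": 64, "other": 25},
    "thought": {"like": 66,  "zero": 21, "go": 3,  "say": 11, "other": 24},
}


def row_percentages(row):
    """Share of each variant out of all variants in the row, to one decimal place."""
    total = sum(row.values())
    return {variant: round(100 * n / total, 1) for variant, n in row.items()}


percentages = {content: row_percentages(row) for content, row in counts.items()}

for content, row in percentages.items():
    # Print the row total (the denominator) followed by the per-variant shares.
    print(content, sum(counts[content].values()), row)
```

Computed this way, be like in reported speech comes out at 38.9 per cent, which the table appears to report rounded to 39.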

2.4.2 Go

Butters (1980) first noted the role of go in introducing direct speech when looking at the language of male American adolescents in 1973 (Butters 1980: 305). As go is a quotative verb of motion and action, the closest usage in the Oxford English Dictionary (OED) at that time was in conjunction with “imitative interjections or verb-stems used adverbially, e.g. to go bang, clatter, cluck, crack” (as cited in Butters 1980). This association with sound is reiterated in Blackwell et al. (2015: 2), who describe quotatives as being used for non-speech sounds, such as ‘the pre-linguistic babbling of an infant’.

(16) The babies were going, “Ga ga ga da da.”

In a study comparing the degree of hypotheticality between be like and go, Buchstaller (2002) found that go was favoured over be like when expressing direct speech (realis), at 45 per cent compared to 22 per cent respectively. However, it was hardly used when introducing internal dialogue and thought.

In the social attitudes study conducted by Blyth et al. (1991: 224), US respondents described users of go as ‘jocks’ or ‘blue-collar’. Seventeen years later, Buchstaller (2008) found no effect of socio-economic status on the usage of go in the US, suggesting an effect of chronological time. However, there was an association in the UK between go and middle-class users (Buchstaller 2008). Furthermore, in British English it is predominantly favoured in first person contexts (Tagliamonte and Hudson 1999), whereas in US English (Blyth et al. 1990, Singler 2001, Barbieri 2005) as well as Australian and Canadian English (Winter 2002, and Tagliamonte and Hudson 1999 respectively) it is favoured in third person use. Interestingly, in the UK, it seems that its frequency could be threatened by the popularity of other new quotatives. In Maclean-Morris’ (2015) social attitudes questionnaire, one question allowed participants aged 18-73 (34 female, 15 male) to choose one of four ways of expressing an utterance using a quotative in the past tense, including be like, go, and two variants of the canonical quotative say (said, was saying). From 51 participants, not a single one selected the option to use go, with six choosing be like and the other 45 selecting one of the two variants of say. This raises the question of how go is perceived as a quotative. Perhaps it has become standardised in British English and as a result is no longer overtly recognised as an innovative quotative. Maybe the emergence of newer quotative constructions has pushed go out of the picture when a speaker chooses how to report direct speech.

2.4.3 Be like

(17) When I see girls with the same attitude as me, I’m like “let’s just be bffs already” lol.

(Maclean-Morris 2015a)

The use of like was first acknowledged by Butters (1982) in an editor’s note to Schourup, with Blyth et al. (1990: 215) later stating that, unlike the previous new quotative go, like was not simply restricted to direct speech, but could also be used to express inner monologue, something that was attributed to the canonical quotative think. Romaine and Lange noted in their 1991 paper that, in their corpus of teenage conversations (1985 and 1988), there was no apparent change in the referential meaning of a sentence when be like was used in place of say (1991: 227). They suggested then that there had been grammaticalization of like as a quotative complementizer.

Unlike Romaine and Lange’s unidirectional approach, Buchstaller (2002) proposes that a more flexible model is required, one that accounts for the ambiguity and overlap within the uses of grammaticalized like. Like has the primary meaning of similarity, which underlies the notions of comparison and approximation (Buchstaller 2002: 3). Even in taking on a new syntactic function, as a discourse marker or focuser for example, the underlying semantic meaning is still that of similarity. Therefore, when used before reported speech or thought, the speaker uses like to index their relationship to the quote and the degree of its hypotheticality: whether it is word-for-word related to the original utterance (realis), whether the utterance occurs only for the first time in the current communicative situation (situational), or whether it is just used to signify the mental state of the speaker (hypothetical) (Buchstaller 2002: 5). This ties in with Romaine and Lange’s (1991) proposition that be like in reported speech is a way to reduce the speaker’s commitment to the reported utterance: the hearer should then be aware that the quoted utterance may be an approximation or may have never occurred at all.

Speaker Profile

Although it can be found in different varieties of English, the profile of the type of speaker using this quotative varies cross-culturally. US respondents in a study by Blyth et al. (1990: 22) attributed the use of be like to middle-class teenage girls, and held relatively strong attitudes towards it, ‘vacuous’ and ‘airheaded’ being just two of the responses. Whilst much research has attributed the use of be like to female speakers, there are conflicting ideas regarding the effect of sex on this quotative. Ferrara and Bell (1995) found that over a period of three years, there was a neutralization of the sex effect. At the beginning of the study in 1992, women used be like twice as frequently; however, at the end, both men and women used this quotative at ‘roughly equal rates’ (Buchstaller 2014: 318).

However, Buchstaller and D’Arcy (2009) found a significant effect of gender in their ‘English English’ dataset, in favour of male usage of be like, with the same, albeit a non-significant pattern in the ‘New Zealand English’ dataset. It was only the ‘American English’ set that showed females favouring be like, and even then this result was


Recorded as having a presence in the UK in the early 1990s (Tagliamonte and Hudson 1999), be like was noted in 1993 in the Corpus of London Teenage Language (COLT) by Andersen (1996). Several years later, in 1997 and further north in the country, Macaulay (2001) reported a 14 per cent usage of be like among the quotatives used by Glaswegian adolescents.

Durham et al. (2012) compared the change in ‘social and linguistic effects on be like usage and acceptability’, looking at samples from university students in York in both 1996 (Tagliamonte and Hudson 1999) and 2006. They found an increase in the use of quotative be like from 18 per cent in 1996 to 68 per cent in 2006, making it the most frequently used quotative among the undergraduate students studied (2012: 324).

Through the use of a social attitudes test, Dailey-O’Cain (2002: 75) found that in the United States (US), the use of quotative be like was mostly associated with a young speaker. Furthermore, the use of be like triggered only positive associations with solidarity traits.

Similarly, Buchstaller (2001, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2011 and 2012) has carried out substantial work on new quotatives, in particular their usage in the United Kingdom. Through the use of a social attitudes survey (2006), she examined the perceptions and opinions of British respondents regarding the quotatives be like and go, in comparison to the results of similar studies in North America. Unlike Dailey-O’Cain, she found that in the UK, be like triggers both positive and negative associations in social attractiveness judgements (2006: 372). Buchstaller (2006) also found that British informants of various ages all seem to have a similar attitude to be like, and that the age of the informant did not have an effect on how they would perceive the use of a new quotative. Furthermore, she discovered that British speakers did not overwhelmingly associate be like with the US.

Many studies (Singler 2001, Barbieri 2005, Buchstaller and D’Arcy 2009, Haddican et al. 2012) conclude that be like is preferred by first-person subjects. However, this seems to be dependent on the different person subjects included in the study. Unlike other quotatives, be like is favoured by the third-person neuter it, which, when included in a person comparison, can skew results due to its high preference for like as opposed to other constructions. Some studies therefore omit it completely from analysis (Barbieri 2005, Haddican et al. 2012), some omit it partially (Buchstaller and D’Arcy 2009), and others include it (Buchstaller 2008). Furthermore, when it is included there are differences in cross-cultural frequency; in the US be like occurred more frequently with the third-person neuter pronoun it than in the UK, where the first-third-person subject was favoured (Buchstaller 2004). This existential quotative form is argued to function differently to forms with animate subjects (Schourup 1985), referring to thought within the speaker’s mind rather than spoken direct speech.

Due to the ambiguity and lack of consistency in previous research, this study will also omit the third person neuter variant from analysis, to avoid a skew towards quotative be like in comparison to the other studied quotatives.

2.4.4 This is + NP

(18) This is me, “I can’t believe you told her!”

The discovery of this new quotative is attributed to Cheshire and Fox (2007, Presentation) who presented their findings of its usage in the speech of young adults in inner London. They took a sub sample of 53 speakers from the London English Corpus who were either elderly (70+ years-old) or adolescents (16-19-years-old), either male or female and who lived in either inner or outer London.

This is + NP made up nearly five per cent of quotatives used among adolescents in inner London. The subject is non-restrictive, although Cheshire and Fox found that users tended to be female, and the quotative was overall strongly favoured in first person contexts. Unlike the other new quotatives in this study, this is + NP is favoured predominantly in the present tense, with no examples found of the historic past.

They state that amongst London adolescents, alongside go and be like, this is + NP is one of the three main competitors to the canonical quotative say. As it is such a recently reported quotative, a detailed speaker profile, social attitudes to the construction and perceptions of its use are yet to be established.

2.4.5 Zero Quotative

(19) My mum was so angry, Ø “why didn’t you shut the door?!”

Zero quotatives can be defined as direct speech constructions introduced with no verb or indication of speaker identity. They are heavily supported by prosodic cues such as a change in voice pitch or timbre to represent another’s voice (Buchstaller 2014: 35).

Zero quotatives have been recorded in numerous studies (Tagliamonte and Hudson 1999, Cheshire et al. 2011, Buchstaller and Van Alphen 2012, Buchstaller and D’Arcy 2009), and are a feature of many languages. However, they have not received as much attention as other quotatives in terms of their discourse functions. This is especially interesting as they are a popular quotative choice and rank high in research. In Cheshire et al.’s (2011) study of the language of London teenagers, zero quotatives made up around 15 per cent of the total quotatives recorded. Mathis and Yule (1994) carried out an extensive study on zero quotatives in the speech of American women, which highlighted their usage in ‘performing’ rather than simply ‘reporting’ the direct speech act (1994: 67), and showed that they could be used in combination with changes in prosody to signal turn-taking and different voices within an utterance. This importance of mimesis is echoed in Buchstaller’s diachronic quotative work on Tyneside speech (2011), which also showed an increasing preference for mimetic effects in zero constructions as opposed to none (56 per cent: 1960s/1970s, 76 per cent: 1990s, 81 per cent: 2000s). Zero constructions occur mainly in reported speech rather than reported thought (Buchstaller 2014: 172). However, work focusing on zero quotatives has been fairly limited. Palacios Martínez (2013) compared the use of zero quoting in the speech of British and Spanish adolescents, and discovered that this construction favours certain linguistic contexts such as ‘non-lexicalized and sound words’ (the noise produced by cars or animals, for example), as well as dynamic and dramatic reconstructions of previous events. He stated that, pragmatically, the use of zero quotatives allows for a more fluid narrative, involves the interlocutor more directly and finally ‘gives more self-assurance to the narrator’ (Palacios Martínez 2013: 458).

(20) Tengo los dos discos Ø ‘andy one really really really’.

‘I’ve got two records “andy one really really really”’ (Palacios Martínez 2013: 448)

Speaker Profile

Interestingly, Buchstaller’s (2011: 71) findings from the 1990s Phonological Variation and Change in Contemporary Spoken English (PVC) corpus showed that zero constructions were produced more often by working class (74 (17 per cent) 343). However, results from the 2000s NECTE2 corpus showed no difference in the frequency of unreported zero forms between working class and middle class speakers, with an occurrence of 18 per cent of quotative variants in both class sets. Zero quotatives do not seem to be stratified by gender, with no pattern of either sex using them more or less frequently.

Within the Twitter corpus of this study, the occurrence and usage of zero quotatives will pose many questions. Without the mimetic features found in speech, as well as the introducing lexical quotative (e.g. “I’m like...”, “Thomas said…”) found in other constructions, a user will have to rely on other means to indicate the reported utterance in a zero construction as well as attribute it to a reported speaker. See section 2.6 for research questions and hypotheses.

2.5 Twitter

This section introduces the source of this study’s corpus, the ‘microblogging’ platform Twitter, presenting its background and its defining features, as well as the motivation for choosing such a data source.

First launched in 2006, Twitter is a platform of computer-mediated communication (CMC) with a restricted character allowance and predominant asynchronicity, as well as features more associated with spoken language. Messages or ‘tweets’ are publicly viewable, although they appear in the newsfeeds of those who have chosen to follow the author of the message. Therefore, a user is writing not only to known friends but also to an anonymous audience of an unknown size. Each tweet can be viewed seconds or even months after it has been posted and is limited to 140 characters, 20 characters fewer than a standard SMS message. As a result, users are required to be brief and concise in order to express themselves, with abbreviations such as syllabograms and truncations, as well as omissions, being a frequent occurrence. Messages can also be forwarded or ‘retweeted’ by another user, and using the ‘@’ sign, a tweet can be addressed to a specific user. Hashtags (‘#’) enable users to search Twitter for messages regarding a certain topic, functioning as a categorisation tool. However, within the tweet they also provide contextual clues to a user’s audience (Scott 2015). The content within these hashtags can ‘trend’, a phenomenon where, sometimes due to an external event such as breaking news or celebrity activity, the hashtag is used by thousands of users in a short period of time. Considering that 500 million tweets are sent per day, with 35 languages supported on the network (Twitter 2015), a multitude of data is available for analysis.

According to Ofcom (2014), Facebook is the default social media service used by the British public, followed by Twitter, with three in ten social networkers using the latter. Twitter is most commonly used in the UK by users aged 25-34, at 25.4 per cent in 2014, followed closely by users aged 18-24 at 24.5 per cent (Emarketer 2014). In 2013, 80 per cent of active UK users also used Twitter on their mobile phone, confirming its status as a multiplatform network.

This platform was chosen for many reasons. Firstly, all tweets are publicly accessible by default, and although a user retains their rights to any self-created content on the website, at sign-up they grant Twitter a ‘worldwide, non-exclusive, royalty-free license […] to use, copy, reproduce, process, adapt, modify […] and distribute such content in any and all media or distribution method’ (Twitter 2015). This allowed for a large collection of tweets, and data that would not be available from a more private and closed-network medium such as Facebook. Secondly, as well as functioning as a written medium, Twitter fuses together features akin to oral culture, such as conversational practices and expressing topicality, albeit the latter through the use of hashtags. Twitter could therefore be thought of as facilitating different groups within an online community, establishing norms, identity and certain practices. Virtual communities such as Twitter “can support and maintain the use of a shared repertoire of practices and forms of expression” (Buchstaller 2014: 95). If this is indeed the case, quotatives will be used in accordance with an online community’s norms and expectations.

This study therefore asks whether the usage of quotatives on a written platform with a restricted character allowance shares similarities with their usage in a spoken context such as casual conversation.

2.6 Research Questions

This thesis is predominantly a quantitative analysis to identify the types and frequency of quotatives written in Twitter by users within the United Kingdom. The data returned from Twitter are rich in detail, including a user’s ‘time zone’, their geographical ‘location’, ‘username’, and even the number of ‘followers’ and ‘friends’.


From these results, ‘location’ is an interesting variable when analysing the use of quotatives quantitatively. It refers to the geographical location of a Twitter user, defined by a pair of longitudinal and latitudinal coordinates. As mentioned in 2.4.4, Cheshire and Fox (2007) identified the emergence of the new quotative this is + NP within inner city London. Eight years since the identification of this new quotative, this current study asks, firstly is this quotative present on social media, and if so, is it used only by those from the London area or has it spread? Furthermore, is there a relationship between the topic of a Tweet and the use of certain quotative variants? Topic can be defined as the semantic content within the tweet, and these categories are introduced later on in section 3. These ideas will be addressed by my research questions below.

1. Is quotative usage on written medium Twitter reflective of quotative use and distribution in spoken discourse?

2. Is there any preference in certain geographical cities of the United Kingdom to use a specific quotative construction in a user’s online social media communication?

3. Is there a relationship between quotative use and topic of a Twitter message?

4. Is there a relationship between Twitter’s character restriction and the non-lexical zero quotative?

In relation to these questions, the study presupposes several notions and hypotheses towards their answers. Firstly, previous research (Maclean-Morris 2015) suggests that quotatives are used on social media. Therefore these questions are based on the premise that quotatives are present online. As previously mentioned, age and gender are neither known nor viable independent variables and, unlike in previous work on quotation in speech, they cannot be drawn upon when comparing written and oral quotative use. Despite this, this study presumes that quotative use on Twitter will differ from previous spoken quotative findings, due to differences between spoken and CMC communication.

There has yet to be a cross-regional study on the distribution of spoken quotative use by geographical location in the United Kingdom. Previous research, however, has thrown up a few ideas: the preference of Macaulay’s (2001) Glaswegian adolescent participants for go over all other variants contrasts with the preference of York’s university students for be like, collected in 2006 (Durham et al. 2012). Although this may indicate cross-regional differences, chronological dates must also be taken into consideration, and the makeup of quotative use in a young Glaswegian’s repertoire, for example, may have changed considerably over the span of fourteen years. Therefore, I am unsure if geographical location alone will be a determining factor in quotative choice.

Although there is no direct work on conversation topic and quotative variant preference, Barbieri (2005) looked at the frequency of spoken quotatives in three different registers in a university environment: Conversation, Service Encounters and Office Hours. She found from these three different American English corpora that be like dominated quotative use in Service Encounters, with a respectable and stable occurrence throughout all three corpora. Say was the most prevalent quotative variant in both Conversation and Office Hours. These findings suggest that there does seem to be a relationship, at least between the register of spoken discourse and quotatives; each of these different registers might show a preference for certain topics not found in another. This thesis therefore hypothesises that within this current study, there will be a relationship between topic and the quotative variant used in a tweet.

Finally, the 140 character limit on Twitter forces users to be brief and concise in their tweets. This lends itself to the presupposition that if a writer has only a strictly limited number of characters, they will leave out anything considered unnecessary, or substitute a lengthier form with a more economical one. Therefore this study proposes that, in order to respect this character restriction, users will be more likely to use zero quotative constructions, which do not require a lexical verb of saying and therefore use fewer characters.
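The character economy behind this hypothesis can be illustrated with a small Python sketch. The quote is taken from example (19); the lexical frames in FRAMES are hypothetical paraphrases made up for illustration and are not drawn from the corpus, and the helper characters_left is likewise an assumption of this sketch.

```python
TWEET_LIMIT = 140  # Twitter's character limit at the time of the study

# Hypothetical quotative frames (illustration only, not corpus data).
FRAMES = {
    "say": "my mum said ",
    "be like": "my mum was like ",
    "go": "my mum went ",
    "zero": "",  # a zero quotative adds no introducing material
}
QUOTE = "“why didn’t you shut the door?!”"  # quoted material from example (19)


def characters_left(frame, quote=QUOTE, limit=TWEET_LIMIT):
    """Characters still available after the quotative frame and the quote itself."""
    return limit - len(frame) - len(quote)


for name, frame in FRAMES.items():
    print(f"{name:8s} frame uses {len(frame):2d} characters, leaving {characters_left(frame)}")
```

Under these assumptions, every lexical frame costs the writer at least a dozen characters that the zero construction leaves free for other content.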


3.0 Design and Methodology

This study aims to investigate the usage of quotatives on Twitter by UK users and whether the variation within quotatives is geographically and/or topically dependent. From a methodological point of view, Twitter is ideal to use due to the public display of statuses and messages. The goal of the data collection is to gather tweets using a quotative of some kind.

In order to build a geographical picture of quotative use within the United Kingdom, the study filters tweets with regard to four non-adjacent locations in the country: Glasgow, London, York and Cardiff. Three of these four cities were picked due to their involvement in previous quotative studies, all of which look at quotative use in young speakers. Macaulay (2001) looked at quotatives in adolescent speech in Glasgow in 1997. Out of a corpus of 242 quotative tokens, the most popular quotative was go, making up 26 per cent (63) of the quotatives used, followed by say at 24 per cent (58) and, in third place, be like at 14 per cent (33). Cheshire and Fox (2007), as mentioned in 2.4.4, looked at quotative distribution in both inner and outer London. Tagliamonte and Hudson (1999) collected the quotative use of university students in York in 1996, which showed that the most common quotative was say at 31 per cent, followed in joint second place by go, be like and think at 18 per cent (1999: 158). By 2006, Durham et al.’s (2012) results showed a significant preference among university students for be like, with an increase of 60 per cent making it the most frequent quotative within the corpus. Below is a table showing these results alongside one another. Unfortunately, this study was unable to obtain the 2006 data behind Durham et al.’s (2012) results. Therefore the York quotative distribution is based on Tagliamonte and Hudson’s (1999) 1996 results.


Table 4: Distribution of young speaker quotative use by city

Quotative / City, % (N)   Glasgow (1997)   London (2005-2006)   York (1996)
Say                       24 (58)          29.1 (679)           31 (209)
Go                        26 (63)          18.4 (429)           18 (120)
be like                   14 (33)          22.8 (532)           18 (120)
be like that              7 (17)           -                    -
go like that              5 (13)           -                    -
be                        3 (8)            -                    2 (10)
Think                     2 (6)            9.8 (228)            18 (120)
zero                      14 (33)          13.8 (322)           10 (66)
This is + (S)             -                2.6 (61)             -
Other                     5 (12)           3.6 (83)             2.6 (17)
Total                     100 (242)        100 (2334)           100 (665)

As can be seen from the table, the fourth city, Cardiff, is absent from research examining quotatives and as a result it is insightful to include it. Furthermore as the closest city to London out of the remaining three, there may be similarities in quotative usage to that of the capital, although this is purely speculative. It is also important to consider the social makeup of each city within this study. I have put together a table containing metadata of population by age, density and occupation from several British surveys for each of the four cities. The occupation percentages from the UK 2011 census table KS611UK seem to suggest that a person was able to select several options in regards to occupation such as having a part-time job whilst still a full-time student.

Table 5: Sociolinguistic Factors by City

                                           York (2012)   London (2012)   Glasgow (2011)   Cardiff (2013)
Population total                           198,051       8,308,400       593,245          351,700
Age (%)
  18-24 (Wales 15-24)                      14.0          9.6             5.9              18.5
  25-34                                    13.6          20.0            13.1             16.1
  35-44                                    12.8          15.5            14.2             12.5
  45-54                                    13.1          12.7            13.7             6.6
  55-64                                    10.9          8.7             10.0             9.8
  65+                                      17.4          11.3            13.5             13.6
Density (no. persons p/km^2)               727           2,548           3,298            2,470
Occupation: Employed
  (full-time/part-time)                    93.8          93.5            85               88.5
Occupation: Full time Student
  (% out of geographical constituency)     15.1          11.4            13.7             16.5
Occupation: Unemployed/Never worked        5.6           16.5            18.2             13.0

(National Records of Scotland 2012, Neighbourhood Statistics 2012, Cardiff Council 2014)

Table 5 shows that, out of the four cities, Cardiff has the highest percentage of young adults at 18.5 per cent, followed by York with 14 per cent. These two cities also have the highest percentages of full-time students (16.5 per cent and 15.1 per cent respectively), which corroborates these spikes in the younger population. In London, the 25-34 age group makes up 20 per cent of the population, the largest share of any age group. Interestingly, in Glasgow, about 14 per cent of residents are aged 35-44, making it the largest age group in the city. Despite having the lowest percentage of people aged 18-24, at just under six per cent, Glasgow still has more full-time students, at almost 14 per cent, than London’s 11.4 per cent. Furthermore, Glasgow has the highest population density, ahead of London, with 3298 people per square km to London’s 2548 per square km. It is in this statistic that York is exposed as having an extremely low population density, with only 727 people per square km, something that seems to be reflected in the frequency of quotatives collected from this city.

Twitter’s Application Programming Interface (API) delivers a sample from its overall stream of messages and posts. Using a programming language, in this instance Python, together with the library Tweepy, it is possible to access the API and filter tweets accordingly. A recent study by Bamman et al. (2014) utilised the Twitter API to collect data on gender and lexical variation on Twitter. This use of the API to collect data for linguistic analysis inspired me to replicate this part of the authors’ methodology, tweaking it to meet my requirements.

3.1 Data Collection Method

Accessing the Twitter API, I collected a large stream of tweets for three weeks, from 8th to 29th May 2015, using the Python library Tweepy. I developed, tested and improved my script in order to avoid errors. Although no data is available for the number of tweets posted daily from the United Kingdom, the 500 million tweets sent worldwide each day suggest a need to filter results accordingly. Firstly, only tweets in English were required: twitterStream.filter(languages=["en"]). Then the location needed to be filtered. Location is defined by stating the longitude and latitude of the area that should be covered, such as the coordinates of each city (see footnote 4). It was not possible to search all the data simultaneously. Therefore each city’s code ran in two-hour blocks, and this cycle was repeated throughout the day, with the order randomly shuffled each day to avoid an effect of the time of day.
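The collection cycle described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the bounding boxes are those given in footnote 4, read here as west longitude, south latitude, east longitude, north latitude, while the names CITY_BOXES, in_box and daily_schedule, the assumption of a 24-hour collection day and the fixed random seed are all hypothetical. The actual streaming call is left out, so the sketch runs without Twitter credentials.

```python
import random

# Bounding boxes from footnote 4, read as: west lon, south lat, east lon, north lat.
CITY_BOXES = {
    "Cardiff": [-3.197536, 51.457052, -3.158913, 51.493832],
    "Glasgow": [-4.325523, 55.827841, -4.196777, 55.896816],
    "London":  [-0.258179, 51.452028, -0.105228, 51.530499],
    "York":    [-1.096230, 53.95300, -1.078205, 53.967843],
}
BLOCK_HOURS = 2  # each city is collected in two-hour blocks


def in_box(lon, lat, box):
    """True if the point (lon, lat) falls inside the bounding box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def daily_schedule(rng):
    """One day's plan: a freshly shuffled city order, cycled over two-hour blocks."""
    order = list(CITY_BOXES)
    rng.shuffle(order)
    return [
        (start_hour, order[i % len(order)])
        for i, start_hour in enumerate(range(0, 24, BLOCK_HOURS))
    ]


rng = random.Random(2015)  # fixed seed only so that the sketch is reproducible
for start_hour, city in daily_schedule(rng):
    # A real run would open the stream here with the English-language filter
    # and CITY_BOXES[city] as the location filter; that call is omitted.
    print(f"{start_hour:02d}:00  {city}  {CITY_BOXES[city]}")
```

Reshuffling the order each day means that no city is always sampled in the same time slot, which is the point of the randomisation described above.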

Initially I had planned and carried out several code runs inserting filter keywords within Python itself, using the statement “twitterStream.filter(track=["QUOTATIVE 1, QUOTATIVE 2"])”. However, I found that this was detrimental to the collection of zero quotatives, which were omitted from the search, as well as other undefined quotatives. Therefore, the data was collected raw, using only a geographical and language filter. It was then opened in IntelliJ IDEA, an integrated development environment (IDE), which allowed the resulting data to be displayed and filtered more conveniently than in Python’s IDLE. I chose this rather than a traditional linguistic concordance program because the retrieval within Python returned not only the Twitter message, but also its accompanying code. Within this program I then mined the tweets for both old and new quotatives and their variants, as displayed below:

• Like

• Go/es, going, went

• This is me/you/him/her/us/them

• Say/s, said, saying

• Think/s/ing, thought

• “/” (Zero Quotative)

4 Bounding-box coordinates (west longitude, south latitude, east longitude, north latitude): Cardiff: -3.197536,51.457052,-3.158913,51.493832; Glasgow: -4.325523,55.827841,-4.196777,55.896816; London: -0.258179,51.452028,-0.105228,51.530499; York: -1.096230,53.95300,-1.078205,53.967843

Although these filtered quotatives are not all semantically equivalent, in that say predominantly reports speech and think mental states, for example, this should not have any significant effect on the results. The keyword “like” was searched for without the conjugated forms of be, so that the tweets returned were not limited to one variant of the quotative by pronoun or verb type. However, it is important to mention that even using these filter terms, tweets were returned containing undesired forms such as:

(21) It's like how Susan Boyle bugged out after she lost Britains got talent

(22) Shit be like, 1 step forward 2 steps back

(23) At least one idiot a night shares some shit like this! Seriously like grow the fuck up!

(24) yeah, like 2 hours ago

(25) Weak at the fact that there's like 15 girls arguing over a boy that's said fk all the past 3 hours

In (21), like functions as a conjunction of similarity, comparing the predicate to an undisclosed phenomenon, indicated by the third-person neuter pronoun it. As mentioned in 2.3.3, (22) is an ‘inanimate’ third-person neuter quotative construction, which lacks counterparts among the other quotatives considered in this paper and is therefore unsuitable for this analysis. Like in (23) functions as a semantically empty discourse marker, and in both (24) and (25) the elements following the approximative like are given importance, with like acting as a ‘focus’ marker.

None of the above examples introduces direct speech, and all were therefore excluded from the analysis. As expected, when a word such as like has several different meanings, automatic filtering alone is not reliable, and such confounds were therefore manually excluded from the analysis.
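The limits of keyword filtering can be illustrated with a short sketch. It assumes simple regular expressions for the variants listed earlier and is not the search procedure actually used in IntelliJ IDEA; the pattern set and the function name candidate_variants are hypothetical. Applied to corpus examples (21), (24) and (26), it flags like in all three tweets, although only (26) reports direct speech.

```python
import re

# Keyword patterns for the lexical variants. Quotation marks are flagged
# separately, because a zero quotative has no keyword to match.
VARIANT_PATTERNS = {
    "like": r"\blike\b",
    "go": r"\b(?:go|goes|going|went)\b",
    "this is": r"\bthis is (?:me|you|him|her|us|them)\b",
    "say": r"\b(?:say|says|said|saying)\b",
    "think": r"\b(?:think|thinks|thinking|thought)\b",
    # Double quotes anywhere; single quotes only when they do not follow a
    # letter, so that apostrophes as in "It's" are ignored.
    "quotation marks": r'["“”]|(?<!\w)' + r"['‘]",
}
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in VARIANT_PATTERNS.items()
}


def candidate_variants(tweet_text):
    """Return the names of all patterns found in a tweet (candidates only)."""
    return [name for name, rx in COMPILED.items() if rx.search(tweet_text)]


EXAMPLES = {
    21: "It's like how Susan Boyle bugged out after she lost Britains got talent",
    24: "yeah, like 2 hours ago",
    26: "Me to my mum 'maybe she doesn't like white boys' "
        "my mums reply to me 'She's not you helen'",
}

if __name__ == "__main__":
    for number, text in EXAMPLES.items():
        print(number, candidate_variants(text))
```

A match only marks a tweet as a candidate. Deciding whether like is a quotative, a conjunction, a discourse marker or a focus marker still requires reading the tweet, which is why the exclusion step was manual.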

Because keyword or coordinate filtering is mandatory under the standard terms and conditions of accessing the Twitter API, it is not possible to intentionally collect instances of undefined quotative use. However, in accordance with Labov’s (1972) Principle of Accountability, all tokens of direct speech, even those not specifically filtered for, were extracted from the corpus and included in the resulting dataset. Additional quotatives that did not fit any of the six specific quotative variants were categorised under other, as in the following example from the corpus.

(26) (302) Me to my mum 'maybe she doesn't like white boys' my mums reply to me 'She's not you helen'


In order to allow for an even analysis, the first 7,000 tweet messages per city were collected and excess data were filtered out. Overall, 28,000 tweets were examined for direct speech and evidential thought tokens within the corpus.
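The balancing step can be sketched as follows, assuming the collected data are available as (city, text) pairs in order of arrival. The function name and the data layout are illustrative assumptions, not part of the original script.

```python
from collections import defaultdict

TWEETS_PER_CITY = 7000  # four cities give 28,000 tweets in total


def cap_per_city(tweets, limit=TWEETS_PER_CITY):
    """Keep the first `limit` tweets for each city, in arrival order."""
    kept = defaultdict(list)
    for city, text in tweets:
        if len(kept[city]) < limit:
            kept[city].append(text)
    return dict(kept)


if __name__ == "__main__":
    # Invented toy data with a limit of two tweets per city.
    toy = [("York", "a"), ("York", "b"), ("London", "c"), ("York", "d")]
    print(cap_per_city(toy, limit=2))
```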

I chose six semantic topic areas by which to categorise the quotative corpus. Due to a lack of literature on semantic content and quotative use, these six categories attempt to account for as many of the semantic possibilities covered on Twitter as possible, although the list is by no means exhaustive. The categories are as follows:

Coding by letter and semantic topic of the quotative environment within a Twitter message:

(A) News and Current Affairs

(B) Personal and Social Experience (home, work, friends, family, socialising)

(C) Entertainment, Music and Sport (television, celebrities, football)

(D) Technology and Social Media (Twitter functions, blog links)

(E) Food, Drink, Lifestyle and Culture

(F) Ambiguous/Other

To illustrate further, here are some examples taken from the corpus that all three judges agreed belonged to each category. The number following each category letter indicates the example’s position within the corpus appendix.

(A) (69) @nameremoved @ nameremoved "Mugabe-like landgrabs". "Stalinist logic to education". What a bunch of drama queens the Spectator are.

(B) (84) "There really should be a Uni Course on Banter, we could be the lecturers providing top class bants" @ nameremoved

(C) (94) "You used your breath control when you-*breathes*-needed to..." Thanks, Amanda! #BGT

(D) (239) @ nameremoved @ nameremoved tweeted a pic of it n i was like woah...,but im seein the #real thing rn?!! that's how i knew it was her Room

(E) (273) Having th bartender wondering whether I am a bartender after ordering @Stoli orange coz "I know my shit" #MiniFistPump xxx

(F) (275) @____ Indeed they did, it is like the saying "I could have freed more slaves if they had only known they were"

As topic can be subjective, three independent judges coded each quotative and its semantic environment, and inter-judge reliability was calculated. See section 4.2.4 for the reliability results and the resulting findings.
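This section does not state which reliability statistic was used. As one possible approach, the sketch below computes the share of unanimously coded items and Fleiss' kappa, a common agreement measure for three or more raters. The choice of statistic, the function names and the toy codings are assumptions for illustration and do not reproduce the thesis data.

```python
from collections import Counter

TOPIC_CODES = "ABCDEF"  # the six topic categories (A) to (F)


def unanimous_share(codings):
    """Proportion of items on which every judge chose the same code."""
    return sum(len(set(item)) == 1 for item in codings) / len(codings)


def fleiss_kappa(codings, categories=TOPIC_CODES):
    """Fleiss' kappa for items each coded by the same number of judges."""
    n_items = len(codings)
    n_judges = len(codings[0])
    totals = Counter()
    agreement_sum = 0.0
    for item in codings:
        counts = Counter(item)
        totals.update(counts)
        # Share of agreeing judge pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_judges) / (
            n_judges * (n_judges - 1)
        )
    observed = agreement_sum / n_items
    expected = sum((totals[c] / (n_items * n_judges)) ** 2 for c in categories)
    if expected == 1.0:  # every judge used a single category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented codings for four tweets by three judges (not thesis data).
    toy = [("B", "B", "B"), ("C", "C", "B"), ("A", "A", "A"), ("F", "B", "F")]
    print(unanimous_share(toy), round(fleiss_kappa(toy), 3))
```

Kappa corrects the observed agreement for the agreement expected by chance, which matters here because some topic categories are likely to be far more frequent than others.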