
Placement Report

Temperature in language:

typology, evolution and extended uses

Towards the creation of the Collocation Database of Germanic Temperature Adjectives (freely accessible at http://tderoo.nl/gertemp)

Thomas de Roo (S2644355)

Supervised by Prof. Dr. Maria Koptjevskaja-Tamm (Stockholm University) Internal supervisor: Dr. S.A. Sprenger (University of Groningen)

July 1st, 2018 – May 1st, 2019

___________________________________________________________________________

Abstract

Temperature is universal and easily perceived by humans, but its conceptualization depends on a complex interaction between physical experience and external reality. This makes the ways in which language deals with temperature an intriguing subject. This project focuses on the salient temperature adjectives, such as hot, warm, tepid, cool and cold, which are compared according to their origin, function and especially their combinability with nouns, based on attributive and predicative collocations in six comparable corpora in English, Dutch, German, Danish, Swedish and Norwegian.

As such, this project makes an effort to chart the distribution of both the concrete uses in the tactile (hot stove), ambient (hot air) and personal (cold shiver) frames of temperature evaluation and the extended, metaphorical uses (cold heart) of temperature adjectives. The languages both resemble each other and differ in these patterns, especially in the semantics of the extended usages. While most temperature adjectives are cognates, they are not always used in the same way; English, for example, stands out with hot as the salient antonym of cold. The research is embedded in a bigger typological project which tries to answer the question of how semantic systems, such as the temperature system, emerge and develop.

Placement report of:

Thomas de Roo (S2644355) – t.r.c.de.roo@student.rug.nl
G. Borgesiuslaan 451
9722 VL Groningen
+31637351289

Introduction

Within linguistics, my interests lie mostly in the more theoretical disciplines, especially semantics, historical linguistics and typology, ideally in combination. I found this combination in the research internship that I carried out between July 2018 and May 2019 (extended from the original end date in January 2019), in which I assisted in a typological project on the semantic domain of temperature.

The third semester in the 2nd year of the Research Master Language and Cognition at the University of Groningen is devoted to a research internship. I spent this internship assisting Prof. Dr. Maria Koptjevskaja-Tamm, a professor at Stockholm University, whose sabbatical research project concerns the typology, evolution and extended (e.g. metaphorical) usages of the semantic domain of TEMPERATURE across the languages of the world, following her 2015 book The Linguistics of Temperature. My main task during this internship was to help with the data collection for a typological comparison of this semantic domain in the Germanic languages. This project was a very good fit, as it allowed me to gain hands-on experience in all three disciplines of interest mentioned above.

This report first provides some background information on the subject, then describes my tasks, the course of the internship and what has been accomplished so far. It concludes with an evaluation.

General Background

This section of the report provides some essential background information on the semantic domain of temperature, as well as on other important notions.

Temperature is universal and easily perceived by humans. However, the experiences and perceptions of temperature, and the distinctions made within it as to kind or degree, are far from invariant (cf. Plank, 2003). What is universal for human language is the ability to talk about this semantic domain and to express distinctions of temperature.

The conceptualization of the TEMPERATURE domain relies on a complex interplay between “external reality, bodily experience and evaluation of the relevant properties with regard to their functions in human life in a particular setting” (Koptjevskaja-Tamm, 2011:1).

Lexical typology in closely related languages

In theoretical semantics, temperature adjectives play a role in the discussion of linguistic scales and antonymy, where the question is how temperature is organized on one or multiple scales (cf. Lehrer, 1970; Sutrop, 1998). In cognitive linguistics, temperature is touched upon as an example of a locational domain (cf. Clausner & Croft, 1999). Most research focuses on single languages and gives hardly any idea of how such lexical systems compare across languages.

An important task of lexical typology is to study how languages categorize or divide different cognitive domains among their expressions (cf. Koch 2001, Koptjevskaja-Tamm 2011). It is a commonplace that closely related languages, for example the languages within the Germanic language family, share a great many properties and a common history. However, related languages also differ from one another in numerous ways: in the grammar, in the phonology and particularly in the lexicon. This is why cross-linguistic comparison between related languages is relevant, and it is what this research builds upon.

Important concepts

Temperature domains

TEMPERATURE VALUE covers the distinction between warming and cooling temperatures as well as the distinction between excessive heat and pleasant warmth etc. This dimension is described by the use of semantic labels such as ‘cold’ (unpleasantly cooling), ‘cool’ (pleasantly cooling), ‘warm’ (pleasantly warming) and ‘hot’ (dangerously warming) (cf. Plank 2003 as cited in Koptjevskaja-Tamm, 2008, 2011).

Temperature terms

Inspired by Berlin & Kay (1969), Sutrop (1998, 1999) and later Plank (2003, 2010) suggested a system of basic temperature terms similar to the one that exists for COLOR. Sutrop’s original definition, “the basic temperature term is a psychologically salient, in most cases morphologically simple and native word, which generally denotes the quality of temperature at a basic level, and which is applicable in animate, inanimate and weather domains” (1998:61), was criticized by Koptjevskaja-Tamm and Rakhilina (2006) regarding its main criteria for morphological simplicity, a criticism supported especially by evidence from Russian, where there are different basic temperature terms depending on the frame of TEMPERATURE EVALUATION (see Frames of evaluation below).

Plank’s (2003, 2010) definition of basic temperature terms is more elaborate. They should be (i) salient; (ii) generally known (rather than only among experts); (iii) with their meanings generally agreed on; (iv) morphologically simple, or at any rate non-compositional; (v) of regular grammar; (vi) native, or at any rate nativized; (vii) specialized for this domain or, if shared with other domains, primarily used for this domain; and (viii) within this domain none-too-restricted in their application [among frames of TEMPERATURE EVALUATION] (cf. Plank 2010).

Frames of evaluation

The FRAMES OF TEMPERATURE EVALUATION (Koptjevskaja-Tamm, 2015) are the semantic frames in which the meaning carried by a certain temperature adjective is evaluated, i.e. understood. In this project the term refers to the distinctions between three concrete frames and two frames of usage outside the semantic domain of temperature. The three concrete frames are:

TACTILE (The cold stones)

AMBIENT (The cold room)

PERSONAL-FEELING TEMPERATURE (The cold dog)

The two frames of usage outside the temperature domain are:

EXTENDED, metaphorical usages, mapping temperature onto metaphors

CROSS-MODAL, mapping temperature concepts to other perceptual modalities (vision, audition, olfaction etc.)

Evaluated entities

The evaluated entities, for instance water, may affect the usage of a particular subsystem of temperature values, with specific values for these entities. For water these include extreme values such as ‘ice-cold’ and ‘boiling hot’.

Koptjevskaja-Tamm (2015:19) hypothesizes a hierarchy in the number of temperature terms available to specific frames: personal-feeling temperature is often expressed using a reduced subset of temperature terms compared to the subset for tactile temperatures, which in turn is smaller than the subset for expressing ambient temperatures. This hierarchy is visualized below:

PERSONAL ≤ TACTILE ≤ AMBIENT

Figure 1: The hierarchy of the number of temperature terms possible in temperature evaluation (following Koptjevskaja-Tamm, 2015).

The Semantic approach and the Distribution Hypothesis

An important aspect in this research is the meaning of temperature adjectives. According to the tradition within the Moscow school of Semantics and Corpus Linguistics (cf. Firth, 1957; Wierzbicka, 1985; Sinclair, 1991; Apresjan, 2000; Stubbs, 2001; Kilgarriff & Tugwell, 2002) the meaning of words is revealed in specific contexts (Stathi, 2015: 360). The combinatory potential of adjectives with specific nouns is thus essential for grasping their meaning. In this I follow Stathi (2015), who researched the combinatory potential of Greek temperature adjectives with nouns which in extension denote ENTITIES. According to this approach the meaning of a word is “its intension rather than [its] denotation” (Koptjevskaja-Tamm & Rakhilina, 2006).

According to the Distribution Hypothesis (Zellig Harris, 1968, cited in Koptjevskaja-Tamm & Sahlgren, 2014) there is a correlation between similarity of distribution and similarity in meaning. The Distribution Hypothesis in particular will be tested by the corpus-driven part of the study: the more similar the distributions of two temperature terms, the more similar their meanings. This is particularly useful when comparing the meanings of cognates in closely related languages, as is the case for this research.
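One way to make "similarity of distribution" concrete is to compare the noun collocates of two temperature terms as frequency vectors. The sketch below is only an illustration under stated assumptions: cosine similarity is one common measure and is not named in this report as the measure used in the project, and all counts are invented. Comparing terms across languages would additionally require the nouns to be mapped to shared labels, which is not shown here.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse collocate-count vectors."""
    # Counter returns 0 for nouns that do not occur with the other term.
    dot = sum(count * b[noun] for noun, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented noun-collocate counts for three English adjectives (illustration only).
warm = Counter({"water": 40, "weather": 30, "welcome": 20})
hot = Counter({"water": 30, "weather": 10, "topic": 25})
cold = Counter({"water": 20, "war": 30, "shoulder": 15})

sim_warm_hot = cosine_similarity(warm, hot)
sim_warm_cold = cosine_similarity(warm, cold)
print(f"warm/hot:  {sim_warm_hot:.3f}")
print(f"warm/cold: {sim_warm_cold:.3f}")
```

Under the hypothesis, a higher score between two terms would be read as a sign that they are closer in meaning; with real data the vectors would be built from the absolute hits per collocation.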

The goals of the internship

The main tasks of this internship were to collect, systemize and visualize temperature data from the Germanic languages within the framework of Koptjevskaja-Tamm’s earlier research on temperature typology and according to certain guidelines (Koptjevskaja-Tamm, 2007), elaborated upon later in the report.

The data should contribute to writing a chapter on the temperature systems as they are found within the Germanic languages. Time permitting, the goal was to write at least part of said chapter. The following six tasks were defined in the original internship application back in early 2018:

1. Collect data on the systems of temperature expressions across the Germanic languages and their etymological sources, based on dictionaries and elicitation

2. Carry out corpus searches on the uses of the temperature expressions, both concrete and extended, in several Germanic languages

3. Assist in systematizing the data

4. Assist in visualizing the results – both the linguistic data (e.g., as semantic maps) and their distribution across languages (geographic maps)

5. If time permits: Participate in a joint talk on the results of the research

6. If time permits: Participate in a joint article on the results of the research

The actual internship consisted more of data collection than intended beforehand and during the course of the internship the goal shifted towards the completion of a database of temperature expressions in use in the six major Germanic languages: English, Dutch, German, Danish, Swedish and Norwegian.

Aside from that, etymological and general descriptive data on the temperature adjectives and the divisions between them were collected and partly analyzed and described, both for those six major languages and for other Germanic languages (Icelandic, Faroese, Yiddish, Afrikaans, West-Frisian and Dutch Low-Saxon).

The collected data mostly covers temperature expressions across the Germanic languages and their etymologies, as well as corpus-driven data on temperature expressions in their concrete and extended (e.g. metaphorical) usages. The collected data is described in more detail further on in this report.

For an additional 5 ECTS (on top of the 20 ECTS for the internship itself), a joint article about the methodology of the research will be written and a talk will be given at a conference. These two things remain to be done at the time of writing.

The course of the placement

First phase: familiarization

I officially started this internship at the start of July 2018, to be able to start working on it in the summer. Because of the logistics of the summer vacation on all three sides of the internship agreement, I effectively started working in late July 2018, after already having familiarized myself with some relevant literature in the first half of July.

During this first phase I took note of the most important concepts within the area of the typology of temperature expressions, read most of the relevant literature directly or indirectly related to Maria’s project, and discussed the plans for the next few months with Maria.

Second phase: initial data collection

During the course of the internship, two phases of data collection can be distinguished. The first phase consisted of collecting typological and etymological data on the temperature terms in the Germanic languages, trying to answer at least the following questions: (i) what are the basic temperature terms in the Germanic languages? (ii) to what extent are the basic and the more marginal temperature terms cognates? (iii) to what extent are cognate temperature terms treated differently across the Germanic languages, both in their concrete and extended usages?

I started working on the Germanic temperature system by collecting and describing data on the Dutch system of temperature evaluation, especially on the constructions used when expressing personal, tactile and ambient concrete temperature values. At this point, it was not yet clear that this project would mainly focus on temperature adjectives; that was decided later. The original plan was to take the Dutch data as a point of departure for comparison with the other Germanic languages. I started writing a text that might eventually turn into a chapter in the finalized typology project, or at the very least would be suitable for multiple articles on the subject.

Data on Swedish, Icelandic, German and Frisian were already partly or extensively available within the project, mostly collected by Maria herself, who consulted various experts on those languages where needed. At this point it was also unclear exactly which Germanic languages and dialects would be included. During phase three, we decided to have a core empirical part, supported by data from the six major Germanic languages and complemented by any descriptive and etymological data collected from the other Germanic languages.

I collected the Dutch and Low-Saxon data myself, based on the guidelines for the collection of temperature expressions (Koptjevskaja-Tamm, 2007) and on my own language intuition.

Following the guidelines mentioned above, most data in this phase was collected with the same typological survey, which explores various properties of the use of temperature expressions, including their exact meaning and use in context, their grammatical properties and their suitability for use in the different frames of temperature evaluation. In order to determine the scope of the various temperature terms, all possible frames were used in the original sentences, which were then translated from English into the native languages of the informants.

For Low-Saxon I performed interviews in line with the temperature guidelines with my grandfather, a speaker of Low-Saxon, which I transcribed and formatted in accordance with the guidelines. I also collected data on Faroese through intensive e-mail contact with a Faroese acquaintance of mine, as well as Danish data by having a Danish scholar fill out the survey for the collection of temperature expressions.

During this phase, I also tried to complete most of the incomplete data already collected. I e-mailed various experts, as well as native speakers among my friends and acquaintances, on Icelandic, Frisian and Afrikaans, to resolve any unclarities in earlier e-mail contact between Maria and experts on those languages. The Scandinavisch Vertaal- en Informatiebureau (Scandinavian Translation and Information Agency) in Groningen provided me with the contact information of various native speakers of Icelandic.

Most data collected in this phase relates to the concrete usages of temperature terms, and shows where distinctions among the temperature systems in the Germanic languages lie:

At this point it already seemed that most of the Germanic languages have a very limited use of "hot" and are thus quite similar in that respect. At the same time, people share the intuition that "hot" is still an important distinction to make and that "warm" is associated with something nice. The outliers, however, are Icelandic, Faroese, Afrikaans and English: the first three have completely lost the contrast between warm and hot, while English has made "hot", rather than "warm", the salient antonym of "cold". Icelandic and Faroese use "hot" rather than "warm" in most cases, whereas Afrikaans has lost "hot" and uses "warm" in most cases.

During a meeting on the 27th of October 2018, we decided to collect collocations in corpora in order to get a more complete picture of the temperature terms in context. This decision was based on and inspired by the work of Stathi (2015), who researched the combinatory potential of Greek temperature adjectives with nouns which in extension denote ENTITIES. According to this approach the meaning of a word is “its intension rather than [its] denotation” (Koptjevskaja-Tamm & Rakhilina, 2006).

Third phase: systemization of data

Corpus material

A corpus-driven study on how the temperature terms behave in context makes sense within the approach to meaning as explained above. For this corpus-driven analysis, it is desirable to have corpora that are as extensive and as comparable as possible. I chose to make use of the TenTen corpus family (Jakubíček et al., 2013), which is a set of comparable web text corpora, freely available for research through SketchEngine. The web texts are crawled from the internet and contain news articles, forum posts, openly accessible social media and blogs, as well as more formal texts found on the internet. As such, they are very representative of actual language usage.

The TenTen-corpora used

Corpus name   Corpus language       Corpus size (in tokens)
deTenTen      German                16,526,335,416
enTenTen      English               15,703,895,409
svTenTen      Swedish               3,401,035,817
nlTenTen      Dutch                 2,538,714,434
noTenTen      Norwegian (bokmål)    2,472,622,031
daTenTen      Danish                2,170,994,053

The CDGTA

By late October 2018 I started working on what we call the Collocation Database of Germanic Temperature Adjectives (CDGTA), based on Word Sketches (i.e. sketches of the collocations of a certain word) from the TenTen corpus family and on my own queries, to extract the most frequent combinations of temperature adjectives with nouns, in attributive and predicative constructions.

A typical entry in the CDGTA looks like (1) below (simplified) and contains information about the collocation: the type (temperature term and modified noun) and tags for the frame of temperature evaluation (ambient, tactile, personal, extended or cross-modal), a semantic category (water, surfaces, time period etc.) as well as the number of tokens (absolute hits in the corpus).

(1) Language   Value   Term   Modifies   Semantic category       Collocation type   Absolute hits
    English    warm    warm   winter     ambient / time period   Attribution        632

The CDGTA contains 8778 collocations with temperature adjectives, both as attributes and as predicates. The attributive part (5355 collocations) was semantically tagged according to carefully chosen semantic categories (see below). The predicative part was mainly tagged automatically, based on the earlier tags of the attributive part; the automated annotations were marked so that they can be checked manually.
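The automatic tagging of the predicative part can be pictured roughly as follows. This is a minimal sketch, not the procedure actually used: the report does not state how entries were matched, so matching on identical language, term and noun is an assumption, and the `Collocation` class, its field names and the predicative hit count are hypothetical. Only the attributive "warm winter" entry with 632 hits comes from example (1).

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Collocation:
    """Hypothetical, simplified stand-in for one CDGTA entry."""
    language: str
    term: str
    noun: str
    construction: str            # "attribution" or "predication"
    hits: int                    # absolute hits in the corpus
    tag: Optional[str] = None    # e.g. "ambient/tp" (frame / internal code)
    auto_tagged: bool = False    # True when the tag was copied automatically


def propagate_tags(entries: List[Collocation]) -> List[Collocation]:
    """Copy tags from attributive entries to untagged predicative ones."""
    # Assumption: entries are matched on (language, term, noun).
    manual = {
        (e.language, e.term, e.noun): e.tag
        for e in entries
        if e.construction == "attribution" and e.tag is not None
    }
    result = []
    for e in entries:
        key = (e.language, e.term, e.noun)
        if e.construction == "predication" and e.tag is None and key in manual:
            # Mark the copy so it can be reviewed by hand later.
            e = replace(e, tag=manual[key], auto_tagged=True)
        result.append(e)
    return result


entries = [
    Collocation("English", "warm", "winter", "attribution", 632, tag="ambient/tp"),
    Collocation("English", "warm", "winter", "predication", 12),  # invented hits
    Collocation("English", "warm", "soup", "predication", 7),     # no match: stays untagged
]
tagged = propagate_tags(entries)
for entry in tagged:
    print(entry)
```

Keeping a separate `auto_tagged` flag mirrors the marking of automated annotations described above, so that unverified tags can be filtered out or reviewed later.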

Each collocation is tagged according to its meaning, using a list of categories compiled especially for this task. The categories used are based on Koptjevskaja-Tamm (2007), Schönefeld (2007), Kövecses (1995), Geeraerts & Grondelaers (1995), Goossens (1998) and Lakoff & Johnson (1997:50), as well as on Koptjevskaja-Tamm’s notion of TEMPERATURE ENTITIES (Koptjevskaja-Tamm, 2007, 2011). The categories which we decided upon, per FRAME OF EVALUATION, are given in table 2 below. Each category has an internal code, unique within its frame, which was used in the tagging process.

Frame         Internal code   Category name
tactile       ne              Tactile: Natural environment
tactile       ns              Tactile: Natural surface
tactile       nw              Tactile: Natural water
tactile       hs              Tactile: Household/manmade surface
tactile       s               Tactile: Surface (general)
tactile       hw              Tactile: Household water
tactile       w               Tactile: Water (general)
tactile       c               Tactile: Consumption
tactile       bp              Tactile: Tactile body part
tactile       bl              Tactile: Bodily liquid
tactile       ol              Tactile: Other liquid
ambient       pg              Ambient: Geographical place
ambient       pi              Ambient: Place indoors
ambient       p               Ambient: Place (general)
ambient       w               Ambient: Weather and weather phenomena
ambient       se              Ambient: Environmental or external source
ambient       si              Ambient: Indoor source
ambient       s               Ambient: Source (general)
ambient       co              Ambient: Conductor
ambient       b               Ambient: Bed
ambient       cl              Ambient: Clothing
ambient       tp              Ambient: Time period
ambient       tps             Ambient: Seasonal time period
personal      p               Personal: Person / human
personal      a               Personal: Animal
personal      tm              Personal: Temperature manifestation
cross-modal   p               Cross-modal: Perception: undefined
cross-modal   pso             Cross-modal: Perception: sound
cross-modal   psm             Cross-modal: Perception: smell
cross-modal   pc              Cross-modal: Perception: colour
cross-modal   pv              Cross-modal: Perception: vision / light
cross-modal   pt              Cross-modal: Perception: temperature
cross-modal   pto             Cross-modal: Perception: touch
cross-modal   pte             Cross-modal: Perception: temperature
cross-modal   t               Cross-modal: Temper
cross-modal   m               Cross-modal: Movement
cross-modal   a               Cross-modal: Action or process
cross-modal   l               Cross-modal: Livelihood
extended      asf             Extended: Affection: states and feelings
extended      aahr            Extended: Affection: active human reaction
extended      amr             Extended: Affection: manifestation of responsiveness
extended      ah              Extended: Affection: human
extended      am              Extended: Affection: metonymy
extended      aab             Extended: Affection: abstraction
extended      h               Extended: Hostility: undefined
extended      hard            Extended: Hostility: Anger / Rage / Disgust
extended      hdje            Extended: Hostility: Disgust / Jealousy / Envy
extended      hdjeh           Extended: Hostility: Human (within Anger / Rage / Disgust)
extended      hdjem           Extended: Hostility: Metonymy (within Anger / Rage / Disgust)
extended      harh            Extended: Hostility: Active human reaction
extended      hm              Extended: Hostility: Metonymy (general)
extended      hdjemr          Extended: Hostility: Manifestation of responsiveness (within hdje)
extended      hdjeahr         Extended: Hostility: Active human reactions (within hdje)
extended      c               Extended: Calm
extended      irl             Extended: Indifference, rationality and logic: undefined
extended      irla            Extended: Indifference, rationality and logic: abstraction
extended      irlsf           Extended: Indifference, rationality and logic: states and feelings
extended      irlahr          Extended: Indifference, rationality and logic: Active human reactions
extended      irlmr           Extended: Indifference, rationality and logic: manifestation of responsiveness
extended      irlh            Extended: Indifference, rationality and logic: Human
extended      irlhi           Extended: Indifference, rationality and logic: Human Interaction
extended      irlm            Extended: Indifference, rationality and logic: Metonymy
extended      id              Extended: Intensity and danger: undefined
extended      m               Extended: Metonymy (general)
extended      idl             Extended: Intensity and danger: lust
extended      idp             Extended: Intensity and danger: passion
extended      idhi            Extended: Intensity and danger: human interaction
extended      idh             Extended: Intensity and danger: human
extended      idahr           Extended: Intensity and danger: Active human reactions
extended      idsf            Extended: Intensity and danger: states and feelings
extended      idm             Extended: Intensity and danger: metonymy
extended      rt              Extended: Relevance and topicality
extended      cool            Extended: Cool / Swift / Popular: undefined
extended      coolh           Extended: Cool / Swift / Popular: human
extended      coola           Extended: Cool / Swift / Popular: abstraction
extended      coolm           Extended: Cool / Swift / Popular: metonymy
extended      coolt           Extended: Cool / Swift / Popular: things
extended      coolmr          Extended: Cool / Swift / Popular: manifestation of responsiveness

The actual process of tagging the collocations took place between November 2018 and April 2019. It consisted of extensive corpus searches, careful inspection of what a term means in various samples in context, consultation with native speakers and my own language intuitions. Many of my annotations were marked as unsure and required extra attention and discussion in order to decide on the most fitting category. The process also involved discussion with Maria on the applicability of the various categories we had decided upon. This led to some reconsideration, in which some of the categories were merged with others, split into several, or simply omitted because they did not seem to serve any purpose in the end.

The temperature terms used in the database were chosen based either on earlier research (for Swedish, cf. Koptjevskaja-Tamm, 2006; for English, cf. Rasulic 2015 and Schönefeld 2007; for German, cf. Schönefeld 2007) or on data collected according to Koptjevskaja-Tamm’s guidelines for temperature data collection (for Dutch, Norwegian and Danish). They comply with Plank’s (2003, 2010) definition of basic temperature terms.

Creating the CDGTA

From November onwards most of the work in the internship went into the creation of the collocation database, in order to get a good overview of the usage of the temperature terms using the 150 most common collocation types.

By late November 2018, I had finished a first prototype of the collocation database, containing just warm and hot in the six languages mentioned earlier. I started experimenting with the data in R, and we came to the conclusion that, even though the first prototype was a step in the right direction, it would give us much more useful insights if we based the data on absolute hits and on a more extensive system of tagging and categorization than the basic distinction between frames of evaluation, which is the only annotation used in the first prototype.

The first prototype was nevertheless enough to establish that the warm/hot distinction is drawn very differently among the six biggest languages (especially in English), which made it worth continuing with the cold/cool distinction and the neutral domain as well.

During November 2018, I had several meetings with Maria in order to come up with the most elegant way to categorize and tag the collocations. We concluded that merely marking the collocations for their possible tactile, ambient, personal and extended usages was not enough, and decided to mark the collocations using more semantically meaningful tags (see above), based on earlier literature.

In earlier visualizations I made the mistake of mixing up the types, represented by the entries in the collocation database, with the actual tokens, represented by the absolute hits stored with each entry. This resulted in very odd preliminary results when comparing frequencies of collocations across languages.
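The difference between the two counts can be shown with a small sketch. The tuple layout and all hit counts except the 632 hits for "warm winter" from example (1) are invented for illustration; the point is only that counting database entries (types) and summing their absolute hits (tokens) give different pictures of the same data.

```python
from collections import Counter

# (language, term, noun, absolute hits); only the first row's hits come from example (1).
entries = [
    ("English", "warm", "winter", 632),
    ("English", "warm", "water", 5000),   # invented
    ("English", "hot", "water", 20000),   # invented
]

type_counts = Counter()   # number of distinct collocations per term
token_counts = Counter()  # summed absolute hits per term

for language, term, noun, hits in entries:
    type_counts[term] += 1
    token_counts[term] += hits

for term in sorted(type_counts):
    print(term, "types:", type_counts[term], "tokens:", token_counts[term])
```

In this toy data "warm" has more types than "hot" but far fewer tokens, so a frequency comparison comes out reversed depending on which count is used.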

Prolonging the internship

By January 2019, it was clear that I would not succeed in finishing the collocation database by the original final date of this internship, the 15th of January, mainly because the process of manual tagging had taken up too much time.

While the original goal of the internship, which basically was “assisting in data collection”, was met, I was not satisfied with finishing the internship without either a written chapter (as originally planned) or another ‘final product’. This made me decide to extend the internship until May 2019. As such, I had the time to work out the collocation database and to get started on some preliminary data analysis as well.

Simultaneously, from February 2019 onwards, I started working on my research thesis, which will be a continuation of this project and, in principle, the chapter that was planned to be written for this internship.

When the first full version of the collocation database was mostly ready around April 2019, I also decided upon creating a visualization tool for the collocation database, which will be elaborated in the next section.

The current version of the collocation database used in the visualization is mostly ready, but there are still some doubly or uncertainly tagged collocations which should be checked and discussed. The collocation database may therefore still be subject to change.

Fourth phase: preliminary analysis and visualization

In order to visualize the data as well as possible, to get a good overview of all aspects of the data and to avoid repetitive procedures, I decided to build a website that shows statistics and distributions for interactively chosen subsets (so-called filters) of the collocation database. In this way it becomes easy to compare a wide variety of subsets (by domain, language, temperature term, syntactic construction and evaluation frame). This website is freely accessible through http://tderoo.nl/gertemp and can be used to browse the collocation database built during this internship.

A screenshot of the home page of the website of the Collocations Database of Germanic Temperature Adjectives

This part of the internship required quite a bit of programming on my side and was in my opinion the most enjoyable, because it made it easy to get insights directly from the database, which was rewarding. I programmed the tool between late March and late April, in PHP in combination with various jQuery libraries for data visualization. The collocation database itself was migrated to SQLite in order to work with this tool and to make it easy to perform extensive queries on the database.
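To give an idea of the kind of query that the move to SQLite makes easy, the sketch below computes a frame distribution for one filter. It is an illustration only: the tool itself is written in PHP, and the table name `collocations`, its columns and every row are hypothetical, since the actual schema of the CDGTA is not described in this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real CDGTA schema is not documented here.
conn.execute(
    """CREATE TABLE collocations (
           language TEXT, term TEXT, domain TEXT,
           noun TEXT, frame TEXT, hits INTEGER)"""
)
rows = [  # invented rows, for illustration only
    ("Danish", "hed", "hot", "sommer", "ambient", 30),
    ("Norwegian", "het", "hot", "sommer", "ambient", 40),
    ("Swedish", "het", "hot", "potatis", "extended", 50),
    ("English", "hot", "hot", "water", "tactile", 999),
]
conn.executemany("INSERT INTO collocations VALUES (?, ?, ?, ?, ?, ?)", rows)


def frame_distribution(conn, domain, languages):
    """Sum absolute hits per frame for one domain in the given languages."""
    placeholders = ", ".join("?" for _ in languages)
    query = (
        "SELECT frame, SUM(hits) FROM collocations "
        f"WHERE domain = ? AND language IN ({placeholders}) "
        "GROUP BY frame ORDER BY frame"
    )
    return dict(conn.execute(query, [domain, *languages]).fetchall())


scandinavian_hot = frame_distribution(conn, "hot", ["Danish", "Swedish", "Norwegian"])
print(scandinavian_hot)
```

The values are passed as bound parameters rather than pasted into the SQL string, which keeps user-chosen filter values from being interpreted as SQL.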

The website uses the most recent version of the collocation database and can easily be updated once the database gets changed, i.e. when wrongly tagged collocations or other potential errors get fixed.

The visualization tool that I made has, at the time of writing this report, the following features:

• Quick search, to find specific collocations, temperature terms or modified entities, and to get a quick overview of all collocations linked to them.

• An overview of the distribution of semantic tags, frames of evaluation, domains and languages for each filter. Filters range from very wide subsets (all collocations, or all collocations within a certain domain, e.g. hot) to very narrow subsets (e.g. all collocations with the noun coffee). Filters for each domain, frame, tag and language are built into the tool. Any other combination of these parameters can be created and saved by the user.

o The following screenshot shows the collocation overview of a filter (all collocations in the hot domain in Danish, Swedish and Norwegian).

o The following screenshot shows the frame distribution within the same filter.

o The following screenshot shows the other distributions within the same filter. Note that hot is at 100% for this filter, because it only contains collocations within that domain.

• All collocations within a filter can easily be exported to CSV for further analysis. The exported CSV file contains the collocations exactly as they appear in the database and can be imported into R.

• Single collocation summaries show the syntactic construction (predication/attribution) of a single collocation, all the tags linked to that collocation, distributive statistics and an overview of the same collocation across other languages.
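The CSV export of a filter can be sketched in the same spirit. Again, this is only an illustration under stated assumptions, not the tool's own PHP code: the `collocations` table, its columns, the function names and the two sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical column names; the real database layout is not given in the report.
COLUMNS = ("language", "domain", "term", "noun", "construction", "frame")


def demo_connection():
    """Build a small in-memory database with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collocations (id INTEGER PRIMARY KEY, language TEXT, "
        "domain TEXT, term TEXT, noun TEXT, construction TEXT, frame TEXT)"
    )
    conn.executemany(
        "INSERT INTO collocations "
        "(language, domain, term, noun, construction, frame) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("English", "hot", "hot", "coffee", "attribution", "tactile"),
            ("English", "cold", "cold", "heart", "attribution", "extended"),
        ],
    )
    return conn


def export_filter_to_csv(conn, filters):
    """Return all rows matching `filters` (column name -> value) as CSV text."""
    for column in filters:
        # Only known column names may appear in the SQL text.
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
    where = " AND ".join(f"{column} = ?" for column in filters) or "1"
    cursor = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM collocations WHERE {where} ORDER BY id",
        list(filters.values()),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)   # header row
    writer.writerows(cursor)   # one CSV line per matching collocation
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_filter_to_csv(demo_connection(), {"noun": "coffee"}))
```

A file written from this text can be loaded in R with `read.csv`, since the first line holds the column names.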

Reflection on the placement

Place in the Research Master program

As mentioned in the introduction of this report, the research internship is a mandatory part of the second-year program of the research master’s program Linguistics: Language and Cognition.

Many of the courses that I took in the first year of the Research Master, including mandatory courses, courses I chose from other master's programs and courses at the LOT summer school, were, at least to some degree, good and relevant preparation for the tasks carried out in this internship. Especially helpful were the courses Design your research project, Corpus Linguistics and Basic Statistics, and the LOT summer courses Linguistic typology and language diversity and Mechanisms and methods in historical linguistics.

Supervision

During the application process, it was determined that Maria’s supervision would consist of the following five points:

1. Providing the student with relevant literature and instructions to introduce the student to the project

2. Initial discussions for working out the framework for data collection

3. Regular meetings for discussing the different tasks of the internship

4. If time permits: Coordinating the preparation of a talk

5. If time permits: Coordinating the work on a paper

The supervision I received was broadly in line with the plan. It consisted mostly of extensive e-mail contact with Maria about questions, unclear points, proposals, progress, potential issues, relevant literature and insights, and practical information, and also of several meetings with Maria in October and November.

Overall, the supervision was limited in quantity as far as meetings are concerned, but I am highly satisfied with its quality: every e-mail and every meeting was productive, helped me resolve the issues I encountered and moved the project forward.

The October and November meetings were decisive for the course of the internship and for the creation of the Collocation Database of Germanic Temperature Adjectives.

Later on in this project, which does not end with the internship, the amount of supervision will probably increase as we start to work on the planned article on the accomplishments of this internship.

Learning goals

Before the internship started, Maria and I set four overall learning goals to be achieved through this internship:

1. Independent research skills in different linguistic disciplines.

2. Experience with linguistic typology.

3. Experience and competence in corpus data collection, data systematization and visualization.

4. Experience in joint article writing / talks.

The first three goals have certainly been met at this point. For this particular project, the fourth goal is planned to be met through a talk on this project (and the larger framework it is part of) that will be given at the TABU student conference in June 2019. During the internship we also worked on the abstract for the conference, which was already an essential step in preparing the talk. As this was my first experience with the genre, I had a lot to learn.

Overall, this internship allowed me to improve the academic skills I gained during my Bachelor and during the first year of this master, but most of all it allowed me to gain experience with independent research, as most tasks of this internship were carried out largely independently. I wish to thank Maria for trusting me to carry out most of the data collection and analysis so far fairly independently, and for her critical feedback throughout the process. I feel that Maria left me plenty of space to work independently but also helped me sufficiently where needed.


New knowledge and skills obtained

This internship gave me a better understanding of the meaning and importance of typological research. It allowed me to gain practical experience with corpora, with extensive corpus searches and with using corpus data to create a comparative database. It also allowed me to familiarize myself with various methods of data collection in typology and, to a lesser degree, in historical linguistics (for example, which information to record for the descriptive part on the temperature cognates). It gave me hands-on experience with the systematic collection of comparative data and with its analysis and visualization.

Conclusion

In this internship report, I evaluated the course of my research internship, which took place from July 2018 until May 2019. Initially, the course of the placement was not set in stone.

The initial idea was mainly to help with the collection, analysis and visualization of data for the Germanic languages within Maria’s project on typology, evolution and extended uses of the TEMPERATURE domain in the languages of the world, and to write a joint chapter for Maria’s upcoming book. During the internship, the goal shifted towards the creation of a semantically tagged database of collocations with Germanic temperature adjectives. The original data collection goals were met by the original end date of the internship (15th of January 2019), but I chose to prolong the internship in order to deliver at least one adequate final product, which became the Collocation Database of Germanic Temperature Adjectives.

All in all, this internship was a valuable experience. While I created a resource which I hope will be of great value for Maria’s project, there is still much to do within the project, and I would love to stay involved, even after having continued this internship in the form of a thesis.


