Academic year: 2021

Internship Report

Samuel H.K. Ha (s2644754)

Name of organisation: Groningen University Translation and Correction Service (UVC) Internship period: 2 June 2020 – 14 August 2020

Report submission date: 19 August 2020

Placement supervisor: Emily Howard e.l.howard@rug.nl

Faculty supervisor: Dr Hans Jansen j.p.m.jansen@rug.nl


Contents

Introduction
Life during corona time
The UVC
Daily work
The internship project
Evaluation
Academic learning goals
Personal goals
Additional points
Conclusion
Appendices
Appendix A: record of work
Appendix B: the project


Introduction

A Bachelor’s in English Literature can lead to many things. Most of these things are careers in teaching. The MA in Writing, Editing, and Mediating (WEM), however, proves that this is not a foregone conclusion and that other careers are out there.

After finishing the BA in English Language and Culture in 2017, I returned to the UK to explore the murky worlds of minimum wage jobs and undergraduate philosophy. Arguably there is little difference between the two. Neither was quite as productive as I had hoped, so after two years I came back to the Netherlands with the burning desire to do something employable and hands-on. Accordingly, I took up the WEM Master's at Groningen because it seemed practical and enjoyable, and because it is a unique programme without any real counterparts in the Netherlands or the UK.

Having emerged from the other end, I see now that it has been all I had hoped, and this is especially true after this internship.

The WEM 2 module, taught by Tia Nutters, offered an excellent insight into the world of professional editing, and particular insight into how the University of Groningen's (UG) own translation and correction department (the UVC) goes about things. The description of the work and approach appealed hugely, so I applied for the internship placement and was lucky enough for that application to be successful. I am very grateful to Tia for taking me on board at the UVC and for the enthusiasm with which she extolled it and its work. I would like to extend this thanks to the rest of the UVC team as well, for their passion and the warm welcome I was met with while working there, and especially to my supervisor, Emi, for acting like a Virgil figure and guiding me through the various circles of text translation and editing.

I interned at the UVC four days a week, from Tuesday 2 June until Friday 14 August.

Life during corona time

Before detailing what happened during the internship, it is probably wise to point to its somewhat unusual circumstances. The internship was different from what was anticipated owing to the corona pandemic, which prompted a national lockdown. Under usual circumstances, I would have been in the office four days per week, 9:00–17:30. However, the office was shut. Consequently, all the UVC's work, and therefore the internship too, moved online. As of writing, the lockdown has not been fully lifted and the UVC is still working remotely. Throughout the internship, communication primarily consisted of WhatsApp messages, Google Meet meetings, and emails. All translation software had to be downloaded onto personal computers rather than onto university machines, and much of the work that I translated and edited was about the pandemic and how much trouble it has been causing.

As mentioned, I completed almost all of the internship remotely; there was one exception. I met with Emi for a three-hour stint in the office – cleared by the UG and the appropriate authorities – to get an impression of how the office is set up and how everything worked under normal circumstances. This was a useful experience, although not necessarily representative of usual office life, since the corridor where the UVC office is located was almost empty aside from us.

Luckily, the UVC had been working under quarantine conditions for a while, so everything was much smoother than it might have been. There were morning check-ins at 9:00 to make sure that everyone was present and accounted for, coffee breaks (when wanted), and there was regular contact with all the other members of the UVC over WhatsApp.


The UVC

The internship took place at the UG's translation and correction department: the UVC. It is part of the UG's Language Centre and offers translation, correction, subtitling, and copywriting services to customers inside and outside the UG. While I was working there, it consisted of a relatively small team of three permanent workers, joined by at least two other workers every day, who did a mixture of editing, translating, and project management. The UVC's workflow is structured around a two-stage system, wherein a task is either translated or edited in the first stage, often by a freelancer, and then checked in a second stage by an in-house translator or editor. The UVC also has a native-speaker guarantee, ensuring that a native speaker will look at a text during either the first or second round. The UVC works with five languages: Dutch, English, French, German, and Spanish.

The tasks for the internship can be split into two categories:

1. Participating in the daily work of the UVC.
2. Completing the internship project.

These are detailed more in the following sections.

Daily work

I joined the UVC effectively as a team member, working on editing and translation as the current employees do. The internal structure of the UVC lacks a great deal of formal hierarchy, so it is difficult to say exactly what position or role I filled; 'translator/editor' would probably be most accurate. In this role, I assisted in completing the daily tasks of the UVC.

The daily work mainly consisted of editing texts, translating texts, and performing second-round corrections on texts that had been edited or translated by freelancers. I was able to take on correction work from the outset of the internship and had work to correct on an almost daily basis, despite the lockdown and summer holidays affecting the usual influx of work. For an overview of all the assignments that I completed for the UVC and their nature, see Appendix A.

First-round edits involved using Microsoft Word and the ‘track changes’ and ‘comment’ functions in order to amend errors or offer suggestions and questions to clients where the texts were unclear.

The point of second-round edits is to catch any errors the previous editor may have missed. As a result, second-round edits took less time, with the recommended editing speed for these being 2,000 words an hour.
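As a purely illustrative aside (a hypothetical helper, not a UVC tool), the 2,000-words-per-hour guideline makes the expected time for a second-round edit a simple calculation:

```python
# Hypothetical helper illustrating the 2,000 words/hour guideline for
# second-round edits; rounds to the nearest quarter hour, as in the time log.
def estimate_edit_hours(word_count: int, words_per_hour: int = 2000) -> float:
    """Return the expected second-round editing time in hours."""
    hours = word_count / words_per_hour
    return round(hours * 4) / 4  # snap to 0.25-hour increments

print(estimate_edit_hours(5000))  # a 5,000-word document -> 2.5 hours
```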

One issue that I found repeatedly with primary – and often secondary – edits was that I would bump into logical or structural errors in the texts and would spend a large amount of time editing them, often more than was strictly necessary. The main challenge was limiting the amount of editing that I did in order to stay within the UVC's expected time frame for tasks. The issue as I saw it – and continue to see it – is that it is only possible to clarify language if the message it carries is coherent; it is difficult to make the language of a text clear while its central message remains obscure. This resulted in some edits taking a disproportionate amount of time. This improved by the end of the internship, and did not pose an issue in practice, on account of the UVC having relatively few jobs to do at the time.

On account of my lack of Dutch language skills, the translation work largely consisted of putting the source texts through an online translator (DeepL, if there is any confusion about the name in Appendix A), then editing the result in the UVC's go-to software programme, SDL Trados. Trados was an interesting programme to use because its 'Concordance search' function allowed me some insight into how words had been translated previously, which in turn helped with fixing machine translations and with the project.

Throughout the day, I also corresponded with freelancers and clients via email. This correspondence largely consisted of returning second edits of their work to freelancers and returning completed work to clients.

Owing to the lockdown, and not being able to get into the office, much of the project management side of things was impossible, since checking jobs in on Synergy (the UVC's administrative software) required careful observation and instruction. Despite this, I was able to organise a meeting with a colleague – Iris Nijmeijer – to discuss project management and walk through its basics at the UVC. We covered how a project manager prioritises tasks and processes them in Synergy.


The internship project

From the outset, the plan for the internship project was to investigate whether time could be saved by using post-edited DeepL translations. This way, using an online translator for translation work became an asset rather than an active hindrance. The project involved several things.

- Making machine translations from Dutch into English using DeepL.

- Post-editing these machine translations.

- Analysing the post-edits (PE) of the machine translations (MT) in terms of the main problems in vocabulary, grammar, punctuation, spelling, technical components, and untranslatable elements, to try to map what machine translation still has trouble doing.

- Reporting on these findings.

- Relating these findings to whether a premium subscription to DeepL would be a worthwhile investment for the UVC.

Given how much time there was for this project, I began by writing a theoretical background to machine translation. This proved to be substantially more time-consuming than I had anticipated. Not only was the world of translation a dense thicket of academic jargon; discovering how deep learning worked and what neural networks were also took a great deal of time. However, this project is to form part of a larger body of work that the UVC is performing; luckily, this made the project a precursory study rather than the final word, so many of the gaps that I may have left can still be addressed.

I ended up splitting the project into three parts: conceptualising translation, methods of machine translation, and machine translators in action. The first addressed how many stages one can think of translation as having and how this works in the context of machine translation. In effect, there are between two and five stages, depending on how detailed one wants the model to be.

The second addressed the main methods of machine translation and gave a basic overview of how they work. This part was particularly lengthy, primarily because of the basic breakdown of neural networks and deep learning. With little prior experience and next to no knowledge of such technical matters, this fairly short section of the project ended up taking a disproportionate amount of time. This section also offered some insights into the general issues faced by machine translators, so that if the UVC considers more machine translators in future, the known characteristics are already sketched out. The final part examined two neural network translators, DeepL and Microsoft Translator. Using a post-edit of one particular document, it examined vocabulary, grammar, punctuation, spelling, technical components, and untranslatable elements, and I gave a recommendation for the UVC at the end of both: do not invest in premium machine translators, because in their current state they are too unreliable to offer the quality that is the hallmark of the UVC.

On account of the time taken to cover neural networks and a non-ideal working environment – namely 35-degree weather and no air circulation in my room – there were some parts of the project that could have used a little more work than they received. Namely, I would have offered a more complete background to DeepL and Microsoft Translator, and would have integrated a section on the time taken for post-editing machine translations versus regular translating. During my time at the UVC, one of my colleagues, Iris Nijmeijer, performed a study on just this; however, owing to time constraints, I referenced this study rather than actively integrating it. Similarly, I would like to have included a bigger section on 'other considerations' for Microsoft Translator. For DeepL, I included a section that addressed legality, time taken for post-editing, and data privacy; however, Microsoft is much less forthcoming about legality and data privacy than DeepL, so this section proved substantially more challenging than it should have been, and I did not include it, owing to how insufficient the data I was able to find was. Despite its shortcomings, however, the project achieved its goals, and I was able to provide reasoned recommendations regarding machine translation for the UVC. For an edited version of the project, see Appendix B.


Evaluation

Academic learning goals

I learnt a lot during my internship at the UVC. Now that I have finished it, I feel substantially more prepared for employment. As a result of the internship being remote, I feel particularly well prepared for freelance work; however, I am confident that my experience at the UVC will be applicable and practical almost anywhere.

Formally, according to Ocasys,

the Master’s placement gives students practical experience in a social environment. The aim of the placement is three-pronged: to assess the competences acquired during the programme in the practical situation of an organization, by playing an active role in the organization over a prolonged period of time; to gain academic-level practical experience that corresponds with the competences acquired during the programme; to discover more about possible future professions/fields of work.

I will break this down into its “three-pronged” composite parts to examine why I think the internship was successful.

1. ‘To assess the competences acquired during the programme in the practical situation of an organisation by playing an active role in the organization over a prolonged period of time.’

a. I assessed the competences acquired during the rest of the programme and found them to be useful. The situation was as practical as it is possible to get in the world of text editing, from what I understand. I played an active role in the organisation by completing work for it to the necessary standard, corresponding with freelancers, and, generally speaking, becoming part of the running of things. Two and a half months would, in most circles, be considered a ‘prolonged period of time.’

2. ‘To gain academic-level practical experience that corresponds with the competences acquired during the programme.’

a. I am not too sure if the internship was ‘academic-level practical’, whatever that may mean, but it certainly was ‘practical’ and the skills it required corresponded with the competences acquired during the incontrovertibly academic WEM programme.

3. ‘To discover more about possible future professions/fields of work.’

a. I liaised with freelancers and discussed the world of text editing with colleagues during my time at the UVC, which meant I discovered more about the world of text and all of the professions it involves.


Personal goals

In addition to the academic outcomes, I had three personal goals to work towards. I have listed them below and described to what extent I think I have achieved them.

1. To gain editing software proficiency.

a. I used both Microsoft Word and Adobe Acrobat in order to post-edit documents. These two programmes – Word in particular – are the ones most used by the UVC, so I would consider this achieved.

2. To gain translation software proficiency.

a. I learnt how to use SDL Trados for translating documents, so this has been achieved.

3. To gain competency in project management.

a. This one is a little more questionable. As mentioned, on account of the corona pandemic, it was harder to cover the necessary components of project management. Despite this, however, I was able to learn how the UVC approaches project management, the software it uses, and its general approach towards prioritisation. Accordingly, I believe this personal goal was achieved, but less successfully than the other two, on account of how much more time I spent with the software than with project management.

Additional points

There are some other points worth mentioning that I had not formally considered when entering into the UVC internship, but that are still relatively important.

The UVC’s attitude towards work was a bit of an eye-opener. After minimum wage jobs and an excess of academic deadlines, my impression of the ‘right’ work ethic had developed into something along the lines of ‘work as hard as you can for as long as you can, or else.’ Luckily, the UVC modelled a slightly healthier working day that is substantially more conducive to good work. If I go into freelancing in the future, which is currently the plan, then the work-break balance that I learnt at the UVC will prove invaluable.

One part of the UVC’s work that I had only faintly considered when starting was corresponding with freelancers. As it turns out, this is quite a large part of the job. Prior to my time with the UVC, emailing people had always made me somewhat paranoid; a definite fear of how people would react to my faceless text was always at the forefront of this dislike of the medium. During my time with the UVC, I am happy to say that this unease was cured – by fire, rather than coercion. I was in daily contact with the freelancers by necessity. There were no incidents, disregarding two reactions to overzealous editing. The result is that the emailing process now seems much less daunting than it did before. In any line of employment, this is quite a useful phobia to recover from.

On the subject of the two non-ideal reactions, these were quite valuable experiences for seeing how the UVC dealt with potentially difficult situations. In the first case, I did a second edit and sent it back to the freelancer. They sent it back having undone many of the changes I made, with the explanation ‘this is wrong.’ I went through the document again, incorporating the changes that I thought valid. I returned this edit, and again received the response ‘this is wrong.’ After some discussion with Emi, we decided that the best course of action was to thank the freelancer for the feedback and not incorporate their changes into the version of the document we sent off to the client. In the second case, I did a second edit of a document where the freelancer had changed active sentences into passive ones, treated some words as plural that debatably were not, and changed a great deal of what seemed like technical terminology in the original.

I spent a great deal of time on this second edit, as I considered many of these errors to actively undermine the legibility of the text. Many of the changes I made were matters of personal preference, but ones that I thought resulted in a clearer piece of work. The freelancer did not feel this way. I received an email informing me, at length, about the purpose of a second edit and how errors had been introduced into the text. Being away from the office when this feedback came in, I was not able to respond to it myself. I discussed this feedback with Iris, since Emi was on holiday, and was able to take some useful lessons away from this encounter, even though the encounter itself was less than pleasant. These two experiences and the UVC’s responses to them were highly informative of what the ‘correct’ approach to these situations is and will presumably prove highly valuable in future for dealing with professional conflicts of interest.

Overall, I would describe my performance with the UVC as decent, and I hope that my colleagues would agree. As discussed in the project section, there were some parts that could have gone better – primarily working on the project in my room during a heatwave – but for the day-to-day running of the UVC, I think things went well. If I were to do the internship again, I would pay closer attention to the time taken on editing and translating jobs and try to keep closer to the UVC’s estimates.


Conclusion

Fundamentally, I enjoyed my time at the UVC and gained a huge amount of experience, not only in the editing side of things, but also in the interactions that I had with the freelancers and in the attitudes of those I worked with. The best part of working for the UVC was the atmosphere. The approach taken by the team, and the way that this seemed to osmose into the freelancers, created a warm and welcoming environment from almost all parties, despite the lockdown necessitating distance. I was able to achieve everything I set out to professionally and academically, so I am pleased with the outcomes of the internship.

Given my experiences with the UVC over the last few months, I hope to be able to freelance for them for the foreseeable future.


Appendices

Appendix A: record of work

The following table contains a list of all the documents I worked on during my time at the UVC. The file names can be used to look up the assignments in the corresponding assignment folders of the UVC on the university Y-disk.

Date         File Name      Time Taken (hrs)   Editing Type
02.06.2020   52643          1.25               DeepL PEMT
03.06.2020   52812          0.9                DeepL PEMT
             52524/52526    1.25               DeepL PEMT
04.06.2020   52609          0.8                Proof Read Translation
05.06.2020   53227          0.25               2nd Reader Correction
09.06.2020   52622          1.5                2nd Reader Correction
             53102          0.5                2nd Reader Correction
10.06.2020   52959 a & b    2.75               Proof Read Translation
11.06.2020   52892          8.75               2nd Reader Correction
12.06.2020   53954          5                  DeepL PEMT
15.06.2020   53901          1.25               Proof Read Translation
             53820          0.2                Proof Read Translation
             53821 a        0.2                Proof Read Translation
             53821 b        0.2                Proof Read Translation
             53896          0.2                Proof Read Translation
             54143          0.6                2nd Reader Correction
             54138          0.75               Proof Read Translation
             54148          0.25               Proof Read Translation
             54137          0.6                Proof Read Translation
16.06.2020   54159          0.25               Proof Read Translation
             53921          0.25               Proof Read Translation
             54247          0.75               Proof Read Translation
             54254          1                  Proof Read Translation
             53544          2                  Proof Read Translation
             54257          0.5                Proof Read Translation
17.06.2020   54424          0.75               Proof Read Translation
             53972          1.5                Proof Read Translation
             54487          0.6                DeepL PEMT
18.06.2020   54481          0.8                Proof Read Translation
             54534          1.5                2nd Reader Correction
             54361          0.5                Proof Read Translation
             54573          0.5                DeepL PEMT
23.06.2020   54486          0.75               Proof Read Translation
             54581          1.25               DeepL Translation
24.06.2020   54667          0.5                Proof Read Translation
             54786          0.25               Proof Read Translation
             54564          4                  Correction
25.06.2020   55080          2                  DeepL PEMT
             55081          1                  2nd Reader Correction
             55131          0.2                Proof Read Translation
             55140                             Proof Read Translation
26.06.2020   55129          1.25               Second Check
             55204          0.2                DeepL Translation
             54498          0.75               Correction
             54941          0.75               Correction
30.06.2020   52610          10                 Proof Read Translation
02.07.2020   55325          0.45               Proof Read Translation
             55328          0.25               Proof Read Translation
             55198          1.25               Proof Read Translation
             55727          0.3                Correction
03.07.2020   54641          5.5                2nd Reader Correction
07.07.2020   55936          1                  DeepL PEMT
             55933          1                  Proof Read Translation
             55963          1                  Proof Read Translation
08.07.2020   55966          1                  Proof Read Translation
             55936b         3                  Proof Read Translation
09.07.2020   56093          0.25               Proof Read Translation
             56104          0.6                Proof Read Translation
             55986          0.75               2nd Reader Correction
10.07.2020   56380          1.5                DeepL PEMT
14.07.2020   56370          0.55               Proof Read Translation
15.07.2020   56540          1.25               DeepL PEMT
16.07.2020   56257          3.5                2nd Reader Correction
             56438          3.5                Proof Read Translation
17.07.2020   56542          1.75               Proof Read Translation
             56853          2.5                Correction
20.07.2020   56476          1                  2nd Reader Correction
             56487          0.6                2nd Reader Correction
             55196          3.25               2nd Reader Correction
21.07.2020   56910          1                  Proof Read Translation
22.07.2020   56307          0.25               Proof Read Translation
             56294          6.5                2nd Reader Correction
             56992          0.1                2nd Reader Correction
28.07.2020   57025          0.3                Proof Read Translation
29.07.2020   57171          0.5                Proof Read Translation
             57115          0.5                2nd Reader Correction
30.07.2020   57040          2                  Proof Read Translation
             57199          0.2                Proof Read Translation
04.08.2020   57270          1.25               2nd Reader Correction
             57283          1.25               Proof Read Translation
05.08.2020   57245          4                  2nd Reader Correction
06.08.2020   57347          0.75               Proof Read Translation
             57378          0.2                Proof Read Translation
07.08.2020   57368          0.75               Proof Read Translation
             57404          0.75               DeepL PEMT
             57402          0.8                Proof Read Translation
             57368          0.75               Proof Read Translation
11.08.2020   57403          0.2                Proof Read Translation
12.08.2020   57490          0.8                Proof Read Translation
             57503          0.75               Proof Read Translation
13.08.2020   57382          2.2                2nd Reader Correction
             57528          0.2                Proof Read Translation
             57530          0.25               Proof Read Translation
             57533          0.25               Proof Read Translation


Appendix B: the project

The document below is the project I completed for the UVC with some deletions made in order to preserve client confidentiality.

Introduction

This paper seeks to offer an overview of some of the advantages and disadvantages of two machine translation services: DeepL and Microsoft translation services. To explore the benefits and drawbacks of these machine translators, this paper first analyses the ways in which machine translators conceptualise translation. Part two is dedicated to the primary methods of machine translation (MT) and their benefits and drawbacks. Part three focuses on the specific benefits and drawbacks of DeepL and Microsoft translation services; it breaks translation down into six parts: vocabulary, grammar, punctuation, spelling, technical aspects, and untranslatable parts of text. Finally, a recommendation is made that relates the advantages and disadvantages of the translation services to the UVC’s requirements.

List of Abbreviations

CAT: Computer Assisted Translation
CBMT: Corpus-Based Machine Translation
EBMT: Example-Based Machine Translation
HT: Human Translation
MT: Machine Translation
PE: Post Edit
SMT: Statistical Machine Translation
TMem.: Translation Memory
RBMT: Rule-Based Machine Translation
UVC: Universitaire Vertaal en Correctiedienst


Conceptualising Translation

When thinking about how translation works, one can imagine the process as having any number of different steps. A grasp on these is important when trying to untangle how MT works. For example, the most basic theoretical model of translation has only two steps: the source text and the target language. However, this can increase up to five steps, depending on how elaborate one wants to be. The point of these models is to try to illustrate how translation works within the human mind; the closer an MT comes to reproducing the human process, the better the translations it can potentially produce. In total, there are four relevant ways of conceptualising translation, which vary in their levels of complexity and effectiveness.

The Two Stage Model

In its most basic form, translation has two steps:

1. Source text comprehension.

2. Target text formulation.

For a human translator, this would work as a person reading a text, understanding it, then being able to translate it directly (Chan 34). For MTs, since they are not particularly conscious, ‘comprehension’ is not possible, so the two-stage model (counterintuitively) has three stages:

1. Source text input.

2. Processing through a system dictionary.

3. Word-for-word translation.

Because no attempt is made to think about the text grammatically, idiomatically, or so on, the two-stage model often produces translations that are literal to the point of not being understandable, so although the translation process is simple and relatively easy to programme, its results are far from ideal (Chan 34). Only the earliest MTs used this method, and they were quick to move away from it.
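The two-stage pipeline above can be sketched in a few lines of Python. The dictionary is a toy stand-in for a system dictionary (the entries are my own illustration, not from any real system), and translating a Dutch idiom word for word shows exactly how literal the output stays:

```python
# Toy two-stage MT: source input -> system dictionary lookup -> word-for-word output.
# The dictionary entries are illustrative only; unknown words pass through unchanged.
SYSTEM_DICTIONARY = {
    "nu": "now", "komt": "comes", "de": "the",
    "aap": "monkey", "uit": "out of", "mouw": "sleeve",
}

def word_for_word(source: str) -> str:
    """Translate each word in isolation, with no grammatical or idiomatic analysis."""
    return " ".join(SYSTEM_DICTIONARY.get(w, w) for w in source.lower().split())

# The idiom 'nu komt de aap uit de mouw' ('now the truth comes out') stays literal:
print(word_for_word("Nu komt de aap uit de mouw"))
# -> now comes the monkey out of the sleeve
```

The output is understandable word by word, but the idiom's meaning is lost entirely, which is precisely the failure mode described above.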

The Three Stage Model

The three-stage model is very similar to the two-stage model, but these points can be subdivided to give a better idea of some of the substages involved at each part:

1. Source text comprehension.

a. Comprehension can be broken into text parsing, specialised knowledge, and intended meaning.

2. Meaning transfer.

a. Involves lexical, grammatical, and rhetorical meaning.

3. Target text formulation.

a. Involves writer’s intention, readers’ expectations, target language norms.

For MT, three-stage model translators do more to interpret and translate, which means that they create better translations overall (Chan 37). The machine version of the process looks like this:

1. Source text input.

2. Processing through a system dictionary.


3. Fuzzy match and term translation.

4. Target text output.

Fuzzy matches (partially recognised phrases/words), as mentioned in stage three, refer to terms that are not precise translations of the source text but have previously been defined as similar; term translation refers to idioms and the like (Chan 37). Three-stage translation is substantially better than two-stage; however, it can still be greatly improved on.
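The notion of a fuzzy match can be illustrated with Python's standard difflib. This is a deliberate simplification (real CAT tools score segments against a translation memory with percentage thresholds), and the memory entries here are my own toy examples:

```python
import difflib

# Toy translation memory of previously translated source segments (illustrative only).
translation_memory = {
    "de universiteit is gesloten": "the university is closed",
    "het kantoor is gesloten": "the office is closed",
}

def fuzzy_match(segment: str, cutoff: float = 0.6):
    """Return the stored translation of the closest known segment, or None."""
    hits = difflib.get_close_matches(segment, translation_memory, n=1, cutoff=cutoff)
    return translation_memory[hits[0]] if hits else None

# A new segment that is similar, but not identical, to a stored one:
print(fuzzy_match("de universiteit is vandaag gesloten"))
# -> the university is closed
```

The `cutoff` plays the role of the match threshold a CAT tool would expose as a percentage: lower it and more distant segments are accepted as fuzzy matches.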

The Four Stage Model

This model treats the translator as a fallible part of the system, who needs to understand the text from a subjective standpoint. The model has overtones of structuralism that do not really fit into this paper; however, its existence raises a justifiable point about the translation process as a whole: translators cannot offer a wholly objective perspective on the texts they are transferring. All knowledge must be processed through their personal frames of reference.

1. Author’s knowledge.

2. Familiarisation of text with author’s frame of reference.

3. Original text decoding.

4. Target language meaning encoding (Chan 38).

Comparatively, MTs are as objective as possible, possessing only aggregate knowledge collected from translation memories that are created collectively.

For MTs, the process looks like this:

1. Source text.

2. Creation of ‘Multilingual Maintenance Platforms’ (a collectively produced translation memory).

3. Processing the text through a translation memory.

4. Target language file generation (Chan 38).

Both stages 2 and 3 are contributed to by project managers, translators, revisers, and experts. This is the first of the more detailed models to take the internet and collective translation memories specifically into account.
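The 'collectively produced translation memory' in stages 2 and 3 can be pictured as a shared mapping that several contributors add to before it is used to process a text. A toy sketch (the entries and contributor roles are hypothetical):

```python
# Toy sketch of stages 2-3 of the four-stage MT model: several contributors
# build a shared translation memory, which then processes the source text.
shared_memory: dict[str, str] = {}

def contribute(entries: dict[str, str]) -> None:
    """A project manager, translator, reviser, or expert adds entries."""
    shared_memory.update(entries)

def translate_with_memory(text: str) -> str:
    """Stage 3: replace each word found in the shared memory."""
    return " ".join(shared_memory.get(w, w) for w in text.lower().split())

contribute({"universiteit": "university"})            # e.g. from a translator
contribute({"vertaaldienst": "translation service"})  # e.g. from a reviser

print(translate_with_memory("Universiteit vertaaldienst"))
# -> university translation service
```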

The Five Stage Model

This is ostensibly the most common model for computer-aided translation (CAT) (Chan 38). It consists of:

1. Source text, or the Initiating Stage
2. Data Preparation Stage
3. Data Processing Stage
4. Data Editing Stage
5. Final target-language file generation, or the Finalizing Stage (Chan 39).

What is happening here mirrors the process of human understanding and interpretation:

1. Editing the source text
2. Interpreting the source text
3. Interpreting the text in a new language
4. Formulating the translated text
5. Editing the formulation (Chan 38).

This is the model that MTs most commonly adopt, because it creates the greatest number of checks for a translation and so comes closest to how people approach translation without inventing superfluous steps (Chan 38). There are other ways of conceptualising translation with more steps; however, these tend toward the unnecessarily elaborate, so they are not widely adopted (40).
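Read as a pipeline, the five stages simply feed one another. In the sketch below every stage is a stub (segmentation on full stops, a placeholder 'translation'), standing in for what is a genuinely complex process in a real CAT system:

```python
# Illustrative five-stage pipeline; each function is a placeholder for a real stage.
def initiate(source: str) -> str:               # 1. Initiating: take in the source text
    return source.strip()

def prepare(text: str) -> list[str]:            # 2. Data Preparation: segment the text
    return [s for s in text.split(". ") if s]

def process(segs: list[str]) -> list[str]:      # 3. Data Processing: 'translate' segments
    return [f"<translated: {s}>" for s in segs]

def edit(segs: list[str]) -> list[str]:         # 4. Data Editing: post-edit each segment
    return [s.replace("  ", " ") for s in segs]

def finalize(segs: list[str]) -> str:           # 5. Finalizing: generate the target text
    return ". ".join(segs)

def five_stage(source: str) -> str:
    return finalize(edit(process(prepare(initiate(source)))))

print(five_stage("Eerste zin. Tweede zin"))
# -> <translated: Eerste zin>. <translated: Tweede zin>
```

The value of the structure is that each stage is a separate checkpoint, which is exactly why the model creates 'the greatest number of checks' for a translation.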

Chapter Two: Machine Translation

Bearing in mind these theoretical approaches, we can now turn to machine translation to see how these models are implemented in different types of MT system.

Differentiating between types of MT is valuable, since different methodologies have different strengths and weaknesses. To evaluate which machine translators are best for the UVC, the best approach is to look at each MT type and at the benefits and drawbacks of each. This way, when the individual translators are examined, their methodology can serve as shorthand for these general pros and cons.

Rule Based Translation Systems

Rule-based machine translation (RBMT) can refer to several types of translation. While these differ in approach, some universal truths apply to all of them. The most basic is that they all require linguists to code in a series of rules (hence the name) that dictate how they treat the main semantic, morphological, and syntactic aspects of the target and source languages; in other words, they need someone to programme grammar and dictionaries into them, and they construct their translations from there.

The first type of programme, the direct transfer system, does this by directly interpreting the input words; however, RBMT can involve more complex mechanisms that allow for a certain level of analysis, which can result in a better standard of translation overall. There are three primary forms of RBMT: direct translation, transfer translation, and interlingua translation.

Direct Translation

This is the simplest form of machine translation and relies on the two-stage model of translation. There is relatively little analysis of the text, since the words are translated individually (Chan 110). Because direct translation programmes do not account for the context of words, sentence structure, and so forth, direct translations are often faulty (Walters and Pattel 44). The result is that direct translation is valuable for translating individual words but little else. The central idea behind direct translation was originally to trade comprehensibility for speed and efficiency; however, thanks to advancements in widely accessible computer processing power and a general demand for a better standard of translation, direct translation has largely become obsolete, giving way to methods that are able to interpret as well (Bowker and Ciro 38).
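The word-by-word approach can be illustrated with a toy sketch. The dictionary and function here are invented for illustration, not drawn from any real system; unknown words simply pass through untouched.

```python
# A toy illustration of direct (word-for-word) translation using a
# hypothetical Dutch-English dictionary. No context or syntax is considered.

dictionary = {
    "esmee": "Esmee",
    "zwemt": "swims",
    "graag": "gladly",
}

def direct_translate(sentence):
    # Each word is looked up individually and replaced in place.
    return " ".join(dictionary.get(w.lower(), w) for w in sentence.split())

print(direct_translate("Esmee zwemt graag"))  # Esmee swims gladly
```

The output is intelligible but stilted ('Esmee swims gladly' rather than 'Esmee likes swimming'), which is exactly the weakness described above: without analysis, the system cannot restructure the sentence.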

Transfer Translation


Transfer translation is based on the three-stage model of interpretation. It is divided as follows.

1. Source language sentence is analysed and converted into abstract representations.

2. The abstract representation of the source text is translated into equivalent target language-oriented representations.

3. The final text is generated.

Transfer to abstract values can take two main forms, although there are others: syntactic transfer and semantic transfer. In the syntactic transfer approach, the transfer occurs mainly at the syntactic level. This means that the system may try to analyse the structure of sentences, directly translate some, and generate fresh grammatical structures for others (Chan 111). Semantic transfer is similar but tries to do the same for the meaning of the sentence as a whole, so it is a less exact reproduction of the original text (111). The analysis component makes transfer translation substantially harder than direct translation, since it requires the MT to break the sentence down into composite parts, such as nouns and adjectives, and to correctly reproduce the relationships between those components (Bowker and Ciro 40).

Both kinds of transfer can prove a challenge, since not all languages convey information in the same way. For example, ‘Esmee likes swimming’ translates into German as ‘Esmee schwimmt gern’, which literally means ‘Esmee swims gladly’ (40). As a result, literal translation is often inadequate, because the whole syntax of the sentence needs to be overhauled.

Simultaneously, translating text into abstract syntactic constructs can lead to a variety of translations which either the computer or the translator needs to choose from. Consider the information contained in the sentence ‘the white dog chases the red car’ and how many ways it can be rephrased:

- The white dog chases the red car,

- The red car was chased by the white dog,

- The dog, which was white, chased the car, which was red,

- There was a white dog, which chased a red car,

- There was a red car. It was chased by a white dog (41).

Translation software therefore needs to be able to choose which syntactic and semantic options are most likely to be desired by the end users. This can result in some strange, if not outright incomprehensible, translations.
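The three transfer stages can be sketched in miniature. Everything in this example is a hypothetical toy: the lexicon, the crude subject-verb-adverb analysis, and the generation rule exist only to show the shape of the analyse-transfer-generate pipeline.

```python
# A highly simplified sketch of the three-stage transfer process:
# analyse the source sentence into an abstract representation,
# transfer it into target-language concepts, then generate fresh syntax.
# The lexicon and grammar here are invented toys.

lexicon = {"esmee": "Esmee", "schwimmt": "swim", "gern": "like"}

def analyse(sentence):
    # Stage 1: convert 'Esmee schwimmt gern' into an abstract representation.
    subject, verb, adverb = sentence.split()
    return {"subject": subject, "verb": verb, "adverb": adverb}

def transfer(rep):
    # Stage 2: map source-language concepts onto target-language ones.
    return {k: lexicon.get(v.lower(), v) for k, v in rep.items()}

def generate(rep):
    # Stage 3: generate fresh target syntax ('likes swimming',
    # not the literal 'swims gladly').
    return f"{rep['subject']} {rep['adverb']}s {rep['verb']}ming"

print(generate(transfer(analyse("Esmee schwimmt gern"))))  # Esmee likes swimming
```

The generation stage is where transfer differs from direct translation: it builds a new grammatical structure rather than preserving the source word order.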

Interlingua and Pivot Language Translation

Interlingua translation technically relies on the two-stage translation model: analysis and generation (Chan 111). Interlingua translation takes the original text language and processes it. In processing the text, interlingua machine translation transforms the source text into an abstract representation of the language that has little to do with either language. The target language is then generated from these abstract signifiers. This is similar to the transfer system; however, there are substantially more signifiers in the interlingua system, meaning that the translations transferred can be much more detailed. One way to conceptualise this is to think of the source text being stripped down into its source code (the interlingua), then this source code being rebuilt into the target language.

Being able to conceptualise languages as abstract concepts means that languages can be adapted for this type of system much faster than for other systems; without having to programme interactions between the languages, it becomes a much easier process. It also allows monolinguals to set up their languages for translation, since they only need to be aware of the abstracted programming language. Similarly, it can handle languages that are substantially different since they all get interpreted the same way. The problem, however, comes in creating the abstraction process itself: the breadth and extent to which one would need to find counterparts for words and set phrases is extreme, and for enough languages, would be nigh on impossible because it would incur such great complexity (111).

As a result, rather than inventing an abstracted interlingua, many modern machine translators rely on a pivot language instead. In principle, a pivot language works the same way as an interlingua: the source is translated into the pivot, then the pivot is translated into the target language. English is the most common choice of pivot on account of its global popularity. For example, one might want to translate a document from Mongolian into Swahili, but there are comparatively more translations of Mongolian into English and of English into Swahili. Since there is far more aggregate knowledge for translation into and out of English, one keeps much of the nuance and intricacy that has been worked out between Mongolian/Swahili and English, but not between Mongolian and Swahili directly.

Pivot language systems can carry over many of the advantages of the interlingua system without the need for a whole new language to be invented, which is always positive. Bing Translator uses pivot translation, and Google Translate originally relied on pivot translation before changing to a neural network.
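Pivot translation can be sketched as two dictionary hops. The tiny Mongolian-English and English-Swahili word lists below are hypothetical illustrations; no direct Mongolian-Swahili dictionary is needed because English mediates between them.

```python
# A toy sketch of pivot translation: source -> pivot (English) -> target.
# Both word lists are invented illustrations.

mn_to_en = {"ном": "book", "сайн": "good"}   # Mongolian -> English
en_to_sw = {"book": "kitabu", "good": "nzuri"}  # English -> Swahili

def pivot_translate(words):
    # Translate each word into the pivot, then from the pivot into the target.
    pivoted = [mn_to_en.get(w, w) for w in words]
    return [en_to_sw.get(w, w) for w in pivoted]

print(pivot_translate(["сайн", "ном"]))  # ['nzuri', 'kitabu']
```

The design point is that only two resources (source-to-pivot and pivot-to-target) are required, instead of one resource per language pair.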

Given the trouble that interlingua translation incurred in transferring meaning and producing logical translations, RBMT was largely abandoned around the 2000s, replaced by corpus-based machine translation (CBMT), which took corrected translations into account so that there was some human influence over the output of the MTs (Bowker and Ciro 37).

Corpus-Based Approaches

In order to sidestep the complexity of interlingua and pivot-based translation systems while still providing accurate translations, there has been a shift towards MTs that have more direct human input and rely less on computers interpreting and abstracting away from pre-existing languages (37). The fundamental idea behind corpus-based approaches is that a pair of texts (one in the source language and one in the target language) are linked together to create a parallel corpus; this means that “each sentence in the source language is linked to its equivalent sentence in the target language text” (42). The MT system can then consult this parallel corpus to review how words, phrases, or sentences have been previously translated and, using this information, formulate a new translation (42).


Many of the approaches that CBMT involves existed prior to the phasing out of RBMT; it was not until around 2000 that CBMT became mainstream, because it worked better within the computer structures that had come to prominence (37). The primary conceptual difference between RBMT and CBMT is that the former is based on rationalist, deductive techniques, while the latter involves empirical, inductive techniques. Practically, this means that all the language used by CBMTs comes from previous, often real-world uses, even if the assemblage of the translation is something wholly artificial. There are two main types of corpus-based translation: example-based machine translation (EBMT) and statistical machine translation (SMT), each with its own benefits and pitfalls.

Example-Based Machine Translation

When outlining the grounding of EBMT, Makoto Nagao established the idea of translating by analogy: his theory was that translation is not done through deep linguistic analysis, but by breaking up a sentence into fragments, translating these fragments, and then putting them back together into a sentence that makes sense (Somers 116). In practice, this means that EBMT consists of two main parts: firstly, finding either full sentences or fragments in the sentence-aligned corpus; and secondly, replacing, modifying, or adapting these fragments to generate a translation (25). These fragments can consist of whole sentences or parts of sentences; they are, in effect, individual sentence components (e.g. adverbial phrases and verb phrases), which makes the process a little more understandable.
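The two EBMT steps (find matching fragments, then recombine their translations) can be sketched as follows. The fragment corpus here is a hypothetical toy, built from the white-dog example used later in this chapter.

```python
# A toy sketch of EBMT: step 1, find known fragments of the source sentence
# in a sentence-aligned corpus; step 2, substitute their stored translations.
# The fragment corpus is an invented illustration.

fragment_corpus = {
    "de witte hond": "the white dog",
    "achtervolgt": "chases",
    "de rode auto": "the red car",
}

def ebmt_translate(sentence):
    result = sentence
    for src, tgt in fragment_corpus.items():
        result = result.replace(src, tgt)   # adapt each matched fragment
    return result

print(ebmt_translate("de witte hond achtervolgt de rode auto"))
# the white dog chases the red car
```

Note how the quality of the output depends entirely on how the corpus was chunked: if the fragments had been cut differently, the recombination step would produce something far less fluent.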

EBMTs offer some distinct advantages over RBMTs; however, these advantages are often double-edged: they bring improvements, but a whole new set of disadvantages too. The sections that EBMTs work with are highly compatible with the way that people approach translations (Somers 119). Consider how Trados breaks texts up into sections: EBMTs are similar, but with smaller ‘chunks.’

The chunks, however, are often too small to be practical for humans to understand. Sentences and phrases end up in so many pieces that making sense of a source text can become an impossible task. When setting up a corpus to begin with, programmers must decide how small they want their text chunks to be and how the corpora align. The challenge is that the bodies of text within each corpus must be broken up into fragments of one, two, or more words: small enough that the MT can recognise and reuse them when they are used for translation, but not so small that the phrases lose their comprehensibility. A balance needs to be found between retaining the comprehensibility of a phrase and its usefulness in an MT setting.

Finding parallel corpora is easier than one might initially anticipate, which is an advantage of EBMT. Bodies of parallel texts can be drawn from a variety of sources: one of the most popular sources of parallel corpora is the Canadian Hansard, which consists of English and French transcripts of parliamentary debates (Bowker and Ciro 42). This is an extensive set of parallel texts owing to its size and range of topics. The trouble is that the translations drawn from corpora are often worded and formatted like the substance of the corpora themselves. In the case of the Hansard, the corpus is – surprisingly – highly political, so it is not applicable to every type of translation: it uses a different register, and the frequency of certain words (namely ‘bravo’) is unrepresentative of French and English at large (43). The result is that translations using this system can end up sounding political whether they are political or not. The solution is to combine this corpus with another so that it has a variety of language. This widens the applicability of the corpus but brings in its own set of problems. With an expanded corpus, any translation put into an EBMT is faced with a variety of possible translations of a source-language input. Once again, the ambiguity of language strikes, and there is little an EBMT can do to fight against it. Given these issues, statistical approaches developed from the EBMT approach, since they equip the translation software with the ability to differentiate and choose between different potential translations.

EBMT can create more authentic-sounding translations; however, owing to difficulties in finding appropriate corpora, aligning them correctly, fracturing sentences correctly, and all of the work that needs to go into them, EBMTs are not the best of systems and, as far as I can tell, there are no commercially available, exclusively example-based machine translators.

Statistical Machine Translation

To grossly oversimplify statistical machine translation (SMT), it works by creating parallel corpora and relying on probability calculations to work out the rest (43). The source text is first decoded into phrases, as in EBMT; in SMT, however, these fragments are not dependent on linguistic modelling (i.e. conventional sentence structure), so they are substantially less digestible for human translators. These phrases are translated into the target language by using the parallel corpora, then reordered in the target language, resulting in the final translation. At every point, the MT relies on algorithms to determine the most probable sequence of words and the most probable translation of each fragment, so, statistically speaking, this is the most likely of all the MT systems so far (44).

SMT has a few advantages that distinguish it from EBMT; however, these advantages are often double-edged, so they come with issues of their own. SMT draws its data from parallel corpora, so it is particularly good at creating texts similar to those corpora. For example, if an SMT uses legal documents to make up its corpora, then it will be better at translating legal documents, much like EBMT. On the flipside, however, this means that SMTs have particular weaknesses too; those drawing from legal corpora will not be as strong with technical documents, and so forth (44).

Like EBMT, SMT draws from pre-existing texts rather than creating word formulations itself, which means it relies on real-world language. Technically, then, SMTs should give better, more human translations than non-statistical approaches. However, owing to the way that SMTs need to break up sentence fragments, this advantage is not always guaranteed.

The algorithms that go into SMT are non-language-specific, so they can be transferred between different corpora and languages to translate between various languages. This means that it is technically easy to create multiple SMT systems for multiple languages, or multiple systems that can each deal with one specific area (Nguyen and Shimazu 149). The downfall of this is that it requires multiple translation systems running simultaneously, and it does not solve the biggest problem SMT faces: its corpora.


Like EBMT, SMT requires carefully curated corpora that are representative of the target language. The trouble with SMT is that it requires substantially larger corpora than EBMT, which equates to substantially higher investments at start-up and difficulties with creating matched corpora (Bowker and Ciro 44). SMTs, like EBMTs, also require matching corpora: this means that rare language pairs are still an issue and can take a great deal of time to assemble and curate. Similarly, SMT cannot translate rare or unusual words. Since SMT focuses on the most common matches within its corpora, any specialised or unusual words from the source language will not be translated properly into the target language but will be replaced with a more statistically regular alternative (44).

Neural Networks

At the centre of neural networks is ‘deep learning.’ In deep learning, raw data is taken and converted into vectors. In the case of MT, this involves the computer splitting the text up into fragments (words or phrases, much like SMT), which are then converted into vectors. Once the raw data is in vector form, it passes through several layers of checks (or ‘hidden layers’) until it reaches the final set of nodes (the ‘output layer’). At these final nodes, the vectors are converted back into data; in this case, the target language. The output language is then compared to pre-defined ‘correct’ answers.

In the case of MT, this means that the output language is compared to a pre-existing translation. If the output matches the translation, the network ‘learns’ that the path the data took through the nodes is ‘correct’ for that one specific vector.

If it is translated incorrectly, the network ‘learns’ what not to do: if the same vector travels through the same path again, it is unlikely to be correct, so it is not used as an output. If translated correctly, vectors are more likely to be fed through a particular path and to a particular output, up until there is a full match between the original vector, the path it takes through the hidden layers, and the correct output.

In order for the network to learn, the same raw data is fed through the network repeatedly, until that data is all processed ‘correctly’ every time it is fed through the network. This then needs to be done for every word in the language, with as many parallel corpora as possible.
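The learning loop described above can be mimicked in a deliberately crude sketch. Real NMT adjusts continuous weights via backpropagation; the toy below only imitates the reinforce-correct-paths idea with a discrete score table, and its training pairs and candidates are invented for illustration.

```python
# A toy 'network': a table mapping each source word to candidate target
# words with scores. On every pass over the data, the correct candidate
# is reinforced and the wrong ones are discouraged. This is NOT real
# backpropagation, only an illustration of the learning loop.

training_pairs = [("kat", "cat"), ("slaapt", "sleeps")]  # reference translations
candidates = {
    "kat": {"cat": 0.0, "car": 0.0},
    "slaapt": {"sleeps": 0.0, "sleep": 0.0},
}

def train(epochs):
    for _ in range(epochs):                  # the same data, fed repeatedly
        for src, correct in training_pairs:
            for tgt in candidates[src]:
                if tgt == correct:
                    candidates[src][tgt] += 1.0   # reinforce the correct path
                else:
                    candidates[src][tgt] -= 1.0   # discourage wrong paths

def translate(word):
    return max(candidates[word], key=candidates[word].get)

train(10)
print(translate("kat"), translate("slaapt"))  # cat sleeps
```

After enough passes the scores settle so that every training item is processed 'correctly' each time, which is the stopping condition the paragraph above describes.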


Fig. 5: An example neural network (McDonald)

Neural translation has a number of advantages over its counterparts. Firstly, neural networks are capable of constantly learning from the data that is put into them: this means that the longer they are in use, the better the translations become (Bowker and Ciro 45). NMT also requires relatively less human input than SMT, since one can simply fill it with parallel corpora and then leave it alone to ‘learn’ from them.

One great advantage that NMT has over SMT is that when inputting data, programmers do not need to tell the machine how to break up the text: this means that the text can be broken up into words, fragments, or full phrases (45). Since the NMT learns by repeatedly running the text through its systems, it can store both individual word translations and translations of the phrases of which the word is part. This means that NMTs can ‘pick up on’ contextual clues within sentences, allowing for more accurate translations than SMTs can offer (45).

NMTs can also be ‘taught’ to translate in between two languages that do not have parallel corpora by using an interlingua (45). This works in a very similar way to RBMT interlinguas, however, rather than translating directly into an interlingua, the NMT will take the source language, run it through its network to get to the interlingua stage, but not convert the vectors back into text. These vectors are then run through the NMT again to get to the target language, where the vectors are transferred back into text (45).

Despite its advantages, however, NMT is still not perfect. There are a few main problems. Much like SMT, it struggles with rare words, because it has fewer instances of them to learn from, and thus less understanding of the context around them (45). NMT also struggles with longer sentences: because long sentences are unlikely to have direct equivalents in the corpora that NMT relies on, the translations it offers in this context are often of quite low quality (Pouget-Abadie et al. 1). The issues surrounding double meanings and puns still exist too: even with context, these can be problematic for machine translations, so there still needs to be advancement in that field (Zhao and Zong 1).

Fundamentally, NMT is a step in a different direction from SMT. It brings with it a set of new advantages, but a drastically different set of disadvantages too, which need to be dealt with in turn.

Neural networks are complex things, so a list of useful links to webinars has been included that should help with understanding them. These works were not cited as such but are good supplementary pieces.

Supplementary works

3Blue1Brown. But What Is a Neural Network? | Deep Learning, Chapter 1, 2017, www.youtube.com/watch?v=aircAruvnKk.

The first part in a series about neural networks and how they work. Possibly the clearest explanation that integrates technical components without them being too painful.


Code Parade. Avant-Garfield - Creating New Comics With Neural Networks, 2018, www.youtube.com/watch?v=wXWKWyALxYM.

Offers a bit of insight into how deep learning algorithms work by showing how they can be used to ‘create’ Garfield comics. Not necessarily too useful for NMT, but it is useful for seeing how deep learning works in a more visual context.

Fullstack Academy. Neural Machine Translation Tutorial - An Introduction to Neural Machine Translation, 2017,

www.youtube.com/watch?v=B8g-PNT2W2Q&t=500s.

Excerpt from a seminar about the benefits of NMT and a basic overview of how they work.

Rohrer, Brandon. How Deep Neural Networks Work, 2017, www.youtube.com/watch?v=ILsA4nyG7I0&t=951s.

A longer introduction to neural networks that contains more of the mathematics that underlie the deep learning process. Useful, although a little technical if you are not properly braced for it.

Unbabel. Cutting through the Hype of Neural Machine Translation | Professor Andy Way, 2017, www.youtube.com/watch?v=5RGgPHyCu94.

Fifteen-minute interview discussing the origins of machine translation and some of its shortcomings. It offers quite a critical perspective in comparison to many of the other NMT oriented videos so helps to put things into perspective.

Viau, Greer. Neural Network Learns to Play Snake, 2018, www.youtube.com/watch?v=zIkBYwdkuTk&t=86s.

A short, non-verbal video detailing how AI can learn to play Snake. Less translation-oriented, but very useful for figuring out how neural networks can be ‘taught’ correct information inputs and wrong ones.


Chapter Three: MTs in Practice

The next section looks at two machine translators and analyses them in relation to the UVC’s current needs. There are seven elements considered in total: vocabulary, grammar, punctuation, spelling, technical aspects, untranslatable aspects, and other concerns. After looking at all of these, a conclusion is given stating whether the MT is fit to replace human translators.

It is clear from the outset of reading the results of both translators that neither is of human-translator standard: the reasons for this should be clear in the analysis. As a result, I have not focused on human parity, but have instead looked at the internal inconsistencies and issues that prevent the work from approaching it, and at what a post-edit (PEMT) would need to involve to make the translations of passable standard. This mainly involves analysing internal consistency (or the lack thereof) that would not be an issue with human translators, and considering what might trigger these issues, so that clear strategies can be developed to deal with them ahead of time. The main issue that both DeepL and Microsoft Translator have is that they are both inconsistent in how they approach translation, which makes any strategy beyond ‘just beware’ quite challenging. This is made doubly so by the fact that both translators use neural networks; because the inner workings of machine learning are so opaque, it is difficult to say exactly where the translators go wrong, only that they do.

For both translators I used document number 51611 as the primary text for examining the behaviour of the translators, because the document involves both (relatively) technical components and rhetorical prose. For DeepL, some other documents are involved too, because over the course of this internship I have used it more, so I have more experience with it going wrong than with Microsoft Translator. Similarly, owing to DeepL’s compatibility (relatively speaking, but more on that later) with Trados, it has been possible to cite exact segment numbers for the DeepL translation of 51611, whereas only page numbers have been possible for Microsoft Translator.


DeepL

In order to analyse whether DeepL is a worthwhile investment for the UVC, I will look at six criteria: technical aspects, vocabulary, grammar, punctuation, spelling, and untranslatable aspects. The one consistent theme throughout all of these components is DeepL’s inconsistency. Each section is organised with the most prevalent/worst issue at the top, and the more occasional, less detrimental issues at the bottom.

Vocabulary

The biggest problem with DeepL is its inconsistency. It struggles with brevity as well as consistent translation. For the UVC this could prove problematic because most of the translations are departmental emails and academic papers, which are largely quite formulaic.

Tone

DeepL is largely effective at retaining the tone of its source texts because it tends to translate the words themselves with adequate accuracy. For example, the introduction of 51611 conveys the somewhat dramatic rhetoric of its source text. This being said, it occasionally fails to capture the tone of the source language, and this is mostly noticeable at the single-sentence level, rather than in the overall tone of a text. This happens most often when DeepL uses overly harsh translations. In 51611, for example, “de domeinen versterken elkaar niet” (section 99) is translated as ‘the domains fail to mutually reinforce each other.’ Superficially, these mean the same thing; however, there is a gulf of difference between not doing something and actively failing at it.

Similarly, the tone is occasionally a bit too literal, which results in the translation coming across as stilted. Again in 51611, “meerjarenvisie” (section 100) is translated as ‘multi-year vision’, which is technically correct, whereas ‘long-term vision’ would perhaps sound more fluent. Alternatively, “de universiteit staat in een nationale en internationale context” (section 127) is translated as ‘the university is in a national and international context’, where ‘the university functions in a national and international context’ might be more appropriate.

As a result, for translating documents where the wording needs to be precise, such as scientific articles, legal documents, and university documents, this can be problematic, so any DeepL translation requires a careful PEMT to ensure that the correct information is actually retained. Similarly, it takes quite a close PEMT to convert the document into something that sounds ‘fluent.’ It is difficult to pinpoint exactly what makes DeepL’s translations sound stilted beyond tone, but it often comes down to matters such as those mentioned above. It is impossible to devise an exact strategy for fixing these either, since the way in which these words are translated can differ greatly depending on how DeepL feels.

Brevity

Occasionally, DeepL will find a shorter form of a word where a longer one is needed. For example, “Inhoudsopgave” (section 7) is translated as ‘Contents’, rather than ‘Table of Contents’, in the translation of 51611. This does not create a problem with meaning per se, but it is an issue because these papers need to be formulaic and the service needs to produce work reliably and consistently. Interestingly (and impractically), DeepL translates ‘Inhoudsopgave’ differently when it is translated as part of a segment copied from Trados than when it is copied as part of a block of text (as ‘Contents’ and ‘Table of Contents’ respectively).

Sometimes the opposite, verboseness, occurs as a result of too literal translation of the Dutch source text. For example, “de wijze waarop” (section 150) is translated as ‘the ways in which’, where ‘how’ would do, and ‘tegelijkertijd’ (section 304) is translated as ‘at the same time’, rather than ‘simultaneously’ or ‘concurrently’. This is not incorrect, but it can result in wordy translations. Similarly, “met enorm veel vertrouwen” (section 50) is translated as ‘with great confidence’, whereas the adverb ‘confidently’ would do. For the UVC, this could present a problem because DeepL cannot be relied on to treat documents with consistency.

The issue here is that the use of the more simplistic equivalent lowers the register of the piece from academic to something more informal. This is not necessarily a problem if the document being translated allows for that kind of informality, but if it does not, then the text needs substantial reworking in PE.

Contractions

Another recurring issue with DeepL is its handling of contractions: most of the time it contracts, but not always. For example, in the DeepL translation of document 51611, ‘kon het niet’ is translated as ‘couldn’t’ (section 245), and in the translation of the internship handbook, ‘ze dit niet’ is translated as ‘don’t’ (page 3). Similarly, in the handbook, ‘will’ is often translated as the suffix ‘-’ll’ (e.g. pgs. 3, 5, 13). The result is that in any PEMT of DeepL, one must be aware that this is a problem, but an irregular one. It also seems to have become less of an issue: the handbook translation had multiple instances, but of late there have not been as many contractions. For the UVC, this is not an absolute dealbreaker, but it should be a concern because it affects the register of any translation, and since it is irregular, anyone doing a PEMT has to be constantly aware that it might be there. There does not seem to be a clear reason why DeepL does this intermittently.

Consistency

DeepL also translates the same word in different ways, which again is an issue in formulaic, prescriptive pieces, and it can be an issue for understanding. For example, in 51611, “Aanleiding” (sections 93 and 94) is translated as both ‘Background’ and ‘Introduction’. These translations occur within separate Trados segments, so this is not NMT’s innate weakness with longer sentences, but inconsistency in DeepL itself. Again, for highly structured pieces that require consistency, this is wholly non-ideal for the UVC. It requires PE to fix these errors, and overreliance on PE does rather defeat the point of relying exclusively on MT.

Foreign phrases

DeepL does a reasonable job of using foreign-language loan phrases in a relatively native way. For example, in 51611, “bestaansrecht” (section 73) is translated as ‘raison d'être’, which is quite a nice way of approaching it. However, this can also be a bit of an issue if one does not want such loan phrases. The challenge with phrases like these is that they can come across as a little pretentious, so it is a relief that DeepL only seems to use them in moderation.

Mistranslations

Occasionally, DeepL translates words completely incorrectly. The worst offender came from document number 53954, which explained a room shortage with the Dutch word ‘zalentekort’; this was translated as ‘salmon shortage’. This type of issue is not common at all but, understandably, is detrimental to the credibility of a translation if it slips through the net.

Grammar

Much like its approach to vocabulary, DeepL deals with possessives inconsistently too. In 51611, DeepL translates “reacties medewerkers” (section 374) as ‘employees’ reactions’, with the possessive apostrophe, but translates “reacties klanten” (section 387) as ‘reactions of customers’, using ‘of’ instead. The difficulty here is that it requires PE to ensure that changes are made consistently and reliably. This means that for the UVC, DeepL’s approach to possessives cannot be relied on.

Similarly, DeepL does not consistently translate determiners for words it is not entirely familiar with. 51611 addressed the UG’s IT department, using the acronym ‘CIT’ throughout and usually referring to it as ‘het CIT’. DeepL mostly translated ‘het’ as ‘the’; however, it did not do so reliably, requiring PE to add in all the instances of the word that it left out. For terms that the algorithm ‘knows’, this is less of a problem, and it translates determiners consistently. This could be an issue for the UVC when papers that use many acronyms (e.g. science or technical papers) need to be translated.

Pluralisation

Pluralisation can be odd: constructions that would be singular in English are sometimes translated as plurals. For example, in 51611, ‘doelen transitie’ is translated as ‘goals transition’. This is technically correct; however, it sounds a little clunky, and ‘goal transition’ would sound better, or at least more fluent. Mostly, however, pluralisation is not an issue.

Capitalisation

Capitalisation is largely reliable, though some words are capitalised unnecessarily. This is a relatively uncommon problem and was only encountered once: in 51611, DeepL capitalised its translation of ‘commissie’ (section 95) as ‘Committee’ without any prompting or seeming call for it. One potential explanation is that it was drawing on a ‘Committee’ in the corpora that it learnt from. If so, there are likely to be a number of other terms that the algorithm has learnt to format incorrectly, so any documents translated by the UVC will require PE to find and fix these errors. Much like many of DeepL’s other issues, this is not detrimental, because it does not actively impair meaning; however, it does result in potential inconsistencies and typos.
