
An experiment with Google Translate in introductory-level language courses

The Internet offers innumerable resources for teaching and learning a foreign language. It is difficult to imagine a second-language classroom today without YouTube videos, digital galleries, virtual tours, and up-to-date information about the communities we study. Additional tools such as video conferencing, wiki creation, and discussion forums provide further opportunities for students to create meaning in real-life contexts, to become producers of knowledge rather than consumers of packaged materials. Although educators remain divided over the extent to which technology should form part of classroom activities, few would deny that the web has enriched traditional teaching methods and has encouraged us to imagine new ways of exploring language and culture with our students.

 

Certain technologies, however, have proved more difficult to integrate than others. Perhaps Google Translate (http://translate.google.com/) poses the greatest challenge to language instructors. Although other automated online translation services exist, Google Translate is the most popular option for students. Instructors understandably fear that the use of Google Translate fosters laziness and impedes the development of reading comprehension skills, as students will simply generate an English-language version of texts that they wish to understand. Nuances of meaning will, naturally, be lost, but the gist of the text generally shines through despite machine imperfections. Consider the following paragraph, which I obtained through Google Translate:

 

Poland - a unitary state in Central Europe [5] [6], situated between the Baltic Sea to the north and the Sudeten and Carpathian Mountains to the south, in the basin of the Vistula and the Oder. Polish administrative area is 312 679 km ² [b] [1], which gives it 70 place in the world and ninth in Europe. Population of over 38.5 million people [1], is in terms of population 34 place in the world [7], and the sixth in the European Union.

 

This isn’t impeccable English, but one understands quite clearly that we are discussing the basic characteristics of Poland. In fact, this paragraph was translated from the beginning of the article on Poland in the Polish Wikipedia (https://pl.wikipedia.org/wiki/Polska).

 

Language instructors’ main fear, however, is that students will use Google Translate for text production, writing their compositions in English and then translating them into the language of study. Alongside the concern that this will hinder the development of writing skills, instructors understandably feel that students are cheating. I know of courses that consider the use of automated translation on written assignments a violation of academic honesty, subject to a grade of zero and potentially harsher penalties from the institution. This is probably a common practice throughout the country.

 

In conversations with my fellow language teachers, two claims emerged most frequently: (1) Google Translate produces ungrammatical or unnatural language and, as a result, (2) instructors have no difficulty recognizing when students have used Google Translate on an assignment.

 

The seeming self-evidence of these statements inspired me to design an experiment to test them. I have attempted to determine to what degree, and due to which features of language, instructors of an introductory-level course would be capable of detecting the use of Google Translate. First, I created a paragraph in English that reflects, in my experience, the content and linguistic sophistication of what a first-year language student might want to say. Second, in collaboration with colleagues who generously shared their time and expertise, I Google-translated this English text into the language of study and examined the result as if a student had submitted it for an assignment. In every case but one, instructors were asked whether they would suspect the use of machine translation and what features stood out. The languages considered here are Spanish, Portuguese, Italian, French, Haitian Creole, and Russian. I have listed the contributors’ names in the corresponding sections, and I thank them very much for having made this project possible. Any errors in this post, however, are my sole responsibility.

 

Here is the prompt that I created and responded to, a common assignment in introductory language courses:

 

“You are writing an e-mail to a penpal in [insert relevant country name here]. Tell your penpal about yourself and your hobbies. You should also ask your penpal a couple of questions about his or her favorite activities.”

 

Here is the paragraph that I composed in English, which students would then presumably translate into their language of study:

 

ENGLISH

My name is John Learner and I’m from Columbus, Ohio. I am twenty-one years old and I study Biology and Chemistry. My hobbies are listening to music, playing football, and hanging out with friends. My best friend’s name is Mike. He studies Economics and Communications. On the weekends, we go to a couple of parties or catch a game on TV. What about you? What do you like to do in your free time?

 

Comments

I attempted to confuse the system by using a common noun as the fictional last name. Although the language is rather simple throughout, I also tested Google Translate by using the idiomatic phrases “hanging out with friends” and “catch a game.”
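
As an aside, anyone who wants to repeat or extend this experiment need not paste text into the web interface by hand. Here is a minimal Python sketch, my own illustration rather than part of the original experiment, that batch-translates the sample paragraph into the six languages considered here; it assumes the official google-cloud-translate client library (v2 API) is installed and that Google Cloud credentials are configured in the environment:

# A minimal sketch: batch-translate the sample paragraph into the
# six languages sampled in this post. Assumes the google-cloud-translate
# package is installed and credentials are configured in the environment.
from google.cloud import translate_v2 as translate

SAMPLE = (
    "My name is John Learner and I'm from Columbus, Ohio. "
    "I am twenty-one years old and I study Biology and Chemistry. "
    "My hobbies are listening to music, playing football, and "
    "hanging out with friends. My best friend's name is Mike. "
    "He studies Economics and Communications. On the weekends, we go "
    "to a couple of parties or catch a game on TV. What about you? "
    "What do you like to do in your free time?"
)

# ISO 639-1 codes for the languages considered in this post.
TARGETS = {
    "Spanish": "es",
    "Portuguese": "pt",
    "Italian": "it",
    "French": "fr",
    "Haitian Creole": "ht",
    "Russian": "ru",
}

client = translate.Client()
for name, code in TARGETS.items():
    result = client.translate(SAMPLE, source_language="en", target_language=code)
    print(name + ":")
    print(result["translatedText"])
    print()

Note that today's output will not match the samples below exactly, since Google continually updates its system; the interesting part is the comparison between machine output and plausible beginner writing, not the particular client used.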

 

Here are the results in the languages sampled, followed by brief comments in response to my guiding question. The goal was not to point out errors, but rather to determine where language use deviated from common production and thus suggested the use of machine translation. The contributor’s name and, if applicable, HASTAC page are included in parentheses after each language.

 

SPANISH

Mi nombre es John Learner y soy de Columbus, Ohio. Tengo veinte y un años y estudio la Biología y Química. Mis aficiones son escuchar música, jugar al fútbol, y pasar el rato con los amigos. El nombre de mi mejor amigo es Mike. Estudia Economía y Comunicaciones. Los fines de semana, vamos a un par de fiestas o disfrutar de un partido en la televisión. ¿Qué hay de ti? ¿Qué te gusta hacer en tu tiempo libre?

 

Comments

No egregious features. Google handled “hang out” and “catch” remarkably well. The use of the infinitive “disfrutar” (“to enjoy”) instead of a conjugated verb is odd but could be interpreted as the student’s oversight. The phrase “¿Qué hay de ti?” is the only place in the text where a teacher could suspect that a machine had been used.

 

 

PORTUGUESE

Meu nome é John Learner e eu sou de Columbus, Ohio. Tenho vinte e um anos de idade e eu estudar Biologia e Química. Meus hobbies são ouvir música, jogar futebol, e sair com os amigos. O nome do meu melhor amigo é o Mike. Ele estuda Economia e Comunicações. Nos fins de semana, vamos a um par de partes ou pegar um jogo na TV. E você? O que você gosta de fazer no seu tempo livre?

 

Comments

Very successful overall, but there are some terms that suggest the use of machine translation. The infinitive verb “estudar” (to study) shortly after the conjugated “tenho” (I have) is more difficult to view as an oversight than in the Spanish translation. Google did well with “hang out,” but “pegar um jogo” is not natural. The clearest evidence of an automated translator, however, is the translation of “parties” as “partes” (as in a legal dispute), when students will likely have seen “festa” many times in class.

 

 

ITALIAN

Il mio nome è John Learner e vengo da Columbus, Ohio. Io sono 21 anni di età e studio Biologia e Chimica. I miei hobby sono ascoltare musica, giocare a calcio e uscire con gli amici. Il nome del mio migliore amico è Mike. Studia Economia e Comunicazione. Durante i fine settimana, andiamo a un paio di feste o di vedere una partita in tv. Che cosa dici di te? Cosa ti piace fare nel tuo tempo libero?

 

Comments

As before, Google successfully translated “hang out.” As in the Spanish and Portuguese translations, the Italian translation fails to conjugate “we catch” (here as the infinitive “vedere”), but at least it picked up the idiomatic notion of “viewing.” The most obvious error in this translation is “io sono 21 anni,” as Italian uses the verb “to have” with age. However, this is a common mistake, and teachers could easily attribute it to the interference of English rather than to the use of an automated translator.

 

 

FRENCH

Mon nom est John apprenant et je suis de Columbus, Ohio. Je suis âgé de vingt et un ans que j'étudie la biologie et de la chimie. Mes hobbies sont à l'écoute de la musique, jouer au football, et sortir avec des amis. Le nom de mon meilleur ami est Mike. Il étudie l'économie et des communications. Le week-end, nous allons à un couple de parties ou assister à un match à la télévision. Qu'en pensez-vous? Qu'aimez-vous faire dans votre temps libre?

 

Comments (Danielle Picard: http://www.hastac.org/users/drpicard)

Successful, but with some obvious glitches that stand out. The first sentence indicating the name is grammatically correct, but “Je m’appelle John” is more commonly used and is typically the very first lesson for a new language learner. The last name “Learner” was translated as “apprenant” when it should have stayed “Learner”. This is something that is obviously done by a translation service and not by a beginning learner. (Side note: I have been in French classrooms before where someone whose name was Bill was Google-translated to “Cheque,” and it became very obvious that a translator was used as the student read the story aloud.) The construction of age is incorrect; most beginners incorrectly use “Je suis 21 ans” without the “âgé” (the correct formation is “J’ai 21 ans.”). The “que” following “ans” stands out to me as an abnormal beginner mistake. The colloquial “to catch a match” was translated in a strange way, as is the noun “parties” (soirée, fête, and boum are learned early); these two mistakes are not typical of new learners, in my experience.

 

HAITIAN CREOLE

Non mwen se John elèv k ap aprann ak mwen se soti nan Columbus, Ohio. Se mwen menm ven-yon sèl ane fin vye granmoun ak mwen etidye Biyoloji ak Chimi. Pastan mwen an ap koute mizik, jwe foutbòl, ak pandye soti ak zanmi yo. Non pi bon zanmi mwen an se Mike. Li te etidye ekonomi ak kominikasyon. Nan wikenn yo, nou ale nan yon koup la pati yo oubyen kenbe yon jwèt sou televizyon. Ki sa ki sou ou? Ki sa ou renmen fè nan tan lib ou a?

 

Comments (Annette Joseph-Gabriel: http://www.hastac.org/users/annette)

The translation is a bit of a mess. You threw the program a curveball with the name Learner, which it tried to translate literally. Expressions of age were also literally translated and therefore incomprehensible. For example, in Haitian Creole (as in French) you would use the verb "to have" rather than "to be" for age. Oddly enough, Google Translate doesn't seem to have this problem when translating English to French. The literal translation of "old," i.e., 21 years old, results in something like 21 aged/elderly person. Finally, colloquial expressions like “hanging out” also came out so literally that the image was rather gruesome.

On the whole, it did an abysmal job with grammar but a decent job with vocabulary, e.g., the different majors mentioned. I think the translator has an easier time with some languages than others.

 

RUSSIAN

Меня зовут Джон учащихся и я из Колумбус, штат Огайо. Мне двадцать один год и я учусь биологии и химии. Мои хобби слушать музыку, играть в футбол, и гулять с друзьями. Имя моего лучшего друга Майк. Он изучает экономику и связи. По выходным, мы идем в нескольких партий или поймать игру по телевизору. А как насчет вас? Что вы любите делать в свободное время?

 

Comments (Jason Strudler)

NOTE: Jason was the only contributor who had not been notified in advance that the sample had been machine-translated. (The other participants volunteered after I had tweeted my plan for the experiment.) Jason immediately concluded that Google Translate had been used, and he stated that “for Russian it's quite easy to tell.” He kindly provided a list of the features that had led him to this conclusion:

 

1) Inconsistencies in grammar usage.  "Меня зовут Джон" is correct, but "Имя моего лучшего друга Майк" is much more of an anglicism in this context.  A student of Russian should use the first construction both times.  Similarly, "Он изучает экономику" is correct, but "я учусь биологии и химии" uses the wrong study verb for the context.  Here, the constructions are more complex, so more mistakes are possible, but inconsistency of usage is again a tip-off.

 

2) Radically incorrect grammar.  The phrase "мы идем в нескольких партий" employs two different cases after the preposition "в," and both are very incorrect.  A student of Russian should easily be able to avoid this.

 

3) Clear machine translation.  The word "связи" can indeed mean "communications," but never as a subject of study, and machine translation is much more likely to offer such a translation as the first option than a normal dictionary.

 

4) Occasional nonsense.  The word "учащихся" plays no comprehensible role in the first sentence.  It has to be the machine's failure to understand context.

 

5) Extreme anglicisms.  The phrase "поймать игру" is not possible in Russian.

 

6) The combination of sophisticated grammatical structures with a total failure to employ them properly.  This is especially suspect given the large number of errors (grammatical and otherwise) that you encounter throughout.

 

 

 

CONCLUSIONS
This experiment used an extremely small sample. Real students would produce a much greater variety of texts, which would test the system in ways not contemplated here. Even so, two conclusions seem possible. First, the accuracy of the text, which here might be understood as the “degree of invisibility” of machine translation, varied among languages. Perhaps this has to do with linguistic features: Russian’s case system constituted a particular difficulty. However, I suspect that Google’s methods are the principal cause of such variance. As Google explains:

 

“The more human-translated documents that Google Translate can analyse in a specific language, the better the translation quality will be. This is why translation accuracy will sometimes vary across languages.”

https://translate.google.com/about/intl/en_ALL/

 

It would seem logical for there to be a greater number of available documents in Spanish and French than in the other languages considered here.

A second conclusion is that, when things go wrong, instructors have no difficulty identifying the use of machine translation. Experience in the classroom teaches us what errors to expect students to make. When errors fall outside those expectations and are otherwise unexplainable, machine translation becomes the primary suspect.

However, at least in the languages that I evaluated for this experiment, the availability of online dictionaries, a legitimate resource for many instructors, complicates the problem. In the Spanish example copied above, the phrase “pasar el rato” strikes my ear as strange (I would say “pasar tiempo”). Even so, a visit to WordReference (http://www.wordreference.com/es/translation.asp?tranword=hang%20out) provides “pasar el rato” as the translation for “to hang out,” and therefore it is easy to imagine a student visiting the website, finding this translation, and including it in the assignment.

For me, it is a question of reasonable doubt. Instructors’ “gut feelings” may often be correct, and it is easy to suspect that a student has used Google Translate, but it is difficult to prove it, especially when other resources, such as dictionaries, may intervene. Key pieces of evidence are gross inconsistencies, as in the Russian example, or the inappropriate translation of proper nouns, as in the humorous anecdotes from the French example.

Yet, if we try to teach students to produce knowledge for the real world and to use language to complete tasks in everyday life, we need to accept that they will inevitably turn to Google Translate, as I did when attempting to understand Polish. The key to fostering skill development is to show students the points where machine translation fails and, when appropriate, to design tasks that discourage its use. If the assignment is no longer to write a paragraph that only the instructor will read but rather to contribute to a class project or participate in an Internet community, students may decide not to run the risk of using machine translation and looking ridiculous.

 

 

 

 

 


4 comments

Steve, a great question and what seems to be a really useful approach to the issue.

I don't teach language, but I have a possible idea based on what you said.  Since (in general) your findings are that the machine translations get things mostly right, but not exactly right, it seems that assigning students to analyze something that has been Google-translated (in much the same way your reviewers did) could turn that aspect of the technology into a benefit.

That is, if the assignment provided an English paragraph AND its Google-translated counterpart, then in order to check/analyze the machine translation, students would not only have to make their own (hopefully correct) translation but also understand where and why certain errors happened.  So in that way the learning benefits might actually be greater (or at least different) than those of simply translating on one's own.

 


Your idea would definitely work, Amy! We could view it as a way of encouraging students to "notice" features of the language and pay attention to how form creates meaning. Some instructors give their students samples of the language with "errors" and ask them to correct them. One problem with that approach is that it doesn't have too many real-life applications. However, the activity that you have suggested allows for many connections to authentic situations. It's very common, for example, to find Wikipedia pages that have been auto-translated from a different language, and students could be asked to edit Wikipedia in the language that they are studying.


I really like the ideas that you and Amy are coming up with on this, especially the connection to "correcting" Wikipedia. It seems to me an authentic expression of language study, since many who study languages are asked to translate and correct written communications. It's a difficult task, and one I know I would have been fearful of as an early language learner, but especially in later years of language study, it seems to have a valuable component of "signature pedagogies" (see http://www.creighton.edu/sites/www.creighton.edu/files/TL-Signature%20Pe...).


Thank you for the link! Yes, I think that one of the benefits of students correcting linguistic errors or awkward machine translations on Wikipedia is that it provides a real-life context for translation. In the past, language courses included translation as an exercise devoid of context or applicability. Here, students can feel that they are contributing to collective knowledge with the skills that they acquired in the foreign-language classroom. The role of the instructor here would be to find articles in the language of study that require improvement and then to prepare students to make the edits.

Another possibility, even more interesting in my opinion, would be to have students write their own articles in the language of study. This happens frequently in English. For instance, little-known towns in other cultures often have a Wikipedia entry. The reverse, however, is much less common. Here's a random example of a page that might be fun to create in another language: https://en.wikipedia.org/wiki/Bluebird_cafe.
