Google Neural Machine Translation


Google Neural Machine Translation (GNMT) is a neural machine translation (NMT) system developed by Google and introduced in November 2016 that uses an artificial neural network to increase fluency and accuracy in Google Translate.
GNMT improves the quality of translation by applying an example-based machine translation method in which the system "learns from millions of examples". GNMT's proposed system architecture was first tested on more than a hundred languages supported by Google Translate. Using a large end-to-end framework, the system learns over time to create better, more natural translations. GNMT can translate whole sentences at a time, rather than piece by piece. The GNMT network can undertake interlingual machine translation by encoding the semantics of the sentence, rather than by memorizing phrase-to-phrase translations.
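The difference between translating piece by piece and translating a whole sentence can be illustrated with a toy example. The sketch below is not how GNMT works internally (GNMT learns from data with a neural network); it swaps in a simplified cue-word overlap heuristic over a hand-written, hypothetical table (`SENSES`, `translate_word_in_isolation` and `translate_word_in_context` are illustrative names) to show why seeing the rest of the sentence helps choose between the French translations of an ambiguous English word such as "bank".

```python
from typing import Dict, List, Set

# Hypothetical toy data (not from GNMT): candidate French translations of
# the ambiguous English word "bank", each tagged with cue words that
# suggest that sense.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "banque": {"money", "loan", "account", "deposit"},  # financial bank
        "rive": {"river", "water", "fishing", "shore"},     # riverbank
    },
}


def translate_word_in_isolation(word: str) -> str:
    """Piece by piece: with no context, fall back to the first listed sense."""
    candidates = SENSES.get(word)
    if not candidates:
        return word  # unknown words pass through unchanged
    return next(iter(candidates))


def translate_word_in_context(word: str, sentence: List[str]) -> str:
    """Whole sentence: pick the sense whose cue words overlap the sentence most."""
    candidates = SENSES.get(word)
    if not candidates:
        return word  # unknown words pass through unchanged
    context = set(sentence)
    return max(candidates, key=lambda sense: len(candidates[sense] & context))


sentence = "we sat on the bank of the river".split()
print(translate_word_in_isolation("bank"))          # banque
print(translate_word_in_context("bank", sentence))  # rive
```

Word by word, the toy translator always picks the same sense; given the whole sentence, the word "river" tips it toward "rive".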

History

The Google Brain project was established in 2011 in the "secretive Google X research lab" by Google Fellow Jeff Dean, Google Researcher Greg Corrado, and Stanford University Computer Science professor Andrew Ng. Ng's work has led to some of the biggest breakthroughs at Google and Stanford.
In September 2016, a research team at Google announced the development of the Google Neural Machine Translation system, and by November Google Translate had begun using neural machine translation in preference to its previous statistical methods, a proprietary, in-house statistical machine translation (SMT) technology that had been used since October 2007.
Google Translate's NMT system uses a large artificial neural network capable of deep learning. By learning from millions of examples, GNMT improves the quality of translation, using broader context to deduce the most relevant translation. The result is then rearranged and adapted to resemble grammatical human language. GNMT did not create its own universal interlingua; rather, it aimed at the commonality found among many languages, which is considered to be of more interest to psychologists and linguists than to computer scientists.
In 2016, the new translation engine was first enabled for eight languages, translating to and from English: French, German, Spanish, Portuguese, Chinese, Japanese, Korean and Turkish. In March 2017, three additional languages were enabled: Russian, Hindi and Vietnamese, with support for Thai added later. Support for Hebrew and Arabic was also added in the same month with help from the Google Translate Community. In mid-April 2017, Google Netherlands announced support for Dutch and other European languages related to English. At the end of April 2017, support was added for nine Indian languages: Hindi, Bengali, Marathi, Gujarati, Punjabi, Tamil, Telugu, Malayalam and Kannada.

Evaluation

The GNMT system is said to represent an improvement over the former Google Translate in that it can handle "zero-shot translation", that is, translating directly between a pair of languages for which it has seen no explicit training examples. Previously, Google Translate first translated the source language into English and then translated the English into the target language, rather than translating directly from one language to another.
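Why pivoting through English can lose information can be shown with a toy example. The sketch below assumes hypothetical word-by-word lookup tables in place of real translation models (GNMT itself uses a neural network, not tables), and `lexicon_translator` and `pivot` are illustrative names, not part of any Google API. Spanish distinguishes informal "tú" from formal "usted", but both become "you" in English, so a Spanish-to-French translation routed through English cannot recover the formal French "vous".

```python
from typing import Callable, Dict

# A translator maps a source sentence to a target sentence.
Translator = Callable[[str], str]


def lexicon_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator from a hypothetical lookup table."""
    def translate(sentence: str) -> str:
        # Words missing from the table are passed through unchanged.
        return " ".join(lexicon.get(word, word) for word in sentence.split())
    return translate


def pivot(first: Translator, second: Translator) -> Translator:
    """Chain two translators: source -> pivot language -> target."""
    return lambda sentence: second(first(sentence))


# Hypothetical toy lexicons covering only second-person pronouns.
es_to_en = lexicon_translator({"tú": "you", "usted": "you"})
en_to_fr = lexicon_translator({"you": "tu"})
es_to_fr = lexicon_translator({"tú": "tu", "usted": "vous"})

es_to_fr_via_en = pivot(es_to_en, en_to_fr)

print(es_to_fr_via_en("usted"))  # tu   (formality lost in the English step)
print(es_to_fr("usted"))         # vous (direct translation keeps it)
```

In this sketch the distinction disappears in the Spanish-to-English step, so no English-to-French table can restore it; a direct translator sees the Spanish input itself.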
A July 2019 study in the Annals of Internal Medicine found that "Google Translate is a viable, accurate tool for translating non–English-language trials". Only one disagreement between reviewers reading machine-translated trials was due to a translation error. Since many medical studies are excluded from systematic reviews because the reviewers cannot read the language in which they are written, GNMT has the potential to reduce bias and improve accuracy in such reviews.

Languages supported by GNMT

The following 101 languages are supported by Google Translate's Neural Machine Translation model as of . Kyrgyz and Latin are not yet supported, nor are translations from Belarusian, Maltese or Sundanese into other languages.
  1. Afrikaans
  2. Albanian
  3. Amharic
  4. Arabic
  5. Armenian
  6. Azerbaijani
  7. Basque
  8. Belarusian
  9. Bengali
  10. Bosnian
  11. Bulgarian
  12. Burmese
  13. Catalan
  14. Cebuano
  15. Chichewa
  16. Chinese (Simplified)
  17. Chinese (Traditional)
  18. Corsican
  19. Croatian
  20. Czech
  21. Danish
  22. Dutch
  23. English
  24. Esperanto
  25. Estonian
  26. Filipino
  27. Finnish
  28. French
  29. Galician
  30. Georgian
  31. German
  32. Greek
  33. Gujarati
  34. Haitian Creole
  35. Hausa
  36. Hawaiian
  37. Hebrew
  38. Hindi
  39. Hmong
  40. Hungarian
  41. Icelandic
  42. Igbo
  43. Indonesian
  44. Irish
  45. Italian
  46. Japanese
  47. Javanese
  48. Kannada
  49. Kazakh
  50. Khmer
  51. Korean
  52. Kurdish
  53. Lao
  54. Latvian
  55. Lithuanian
  56. Luxembourgish
  57. Macedonian
  58. Malagasy
  59. Malay
  60. Malayalam
  61. Maltese
  62. Maori
  63. Marathi
  64. Mongolian
  65. Nepali
  66. Norwegian
  67. Pashto
  68. Persian
  69. Polish
  70. Portuguese
  71. Punjabi
  72. Romanian
  73. Russian
  74. Samoan
  75. Scots Gaelic
  76. Serbian
  77. Sesotho
  78. Shona
  79. Sindhi
  80. Sinhala
  81. Slovak
  82. Slovenian
  83. Somali
  84. Spanish
  85. Sundanese
  86. Swahili
  87. Swedish
  88. Tajik
  89. Tamil
  90. Telugu
  91. Thai
  92. Turkish
  93. Ukrainian
  94. Urdu
  95. Uzbek
  96. Vietnamese
  97. Welsh
  98. West Frisian
  99. Xhosa
  100. Yiddish
  101. Yoruba
  102. Zulu