Neural versus Phrase-Based Machine Translation

What is Machine Translation?

Machine Translation is the process of translating content from one language to another, without the intervention of any human being.

There has long been interest in translating automatically, without human intervention. One of the first experiments in machine translation dates back to 1954, when IBM, in collaboration with Georgetown University, translated more than 60 sentences from Russian into English.

The Russian sentences, on political, legal, mathematical, and scientific topics, were entered into the machine, which automatically translated them into English.

It wasn’t until the early 2000s that the hardware and software needed for more consistent translation became available.

Why is MT hard?

Israeli mathematician and machine translation pioneer Yehoshua Bar-Hillel posed the problem of how a translation system would deal with the sentence “The box was in the pen.”

The problem is that the word “pen” has more than one meaning. It can mean a writing tool, or it can mean an enclosure, such as a playpen for children.

To translate correctly from one language to another, the system must determine which of the two senses of “pen” is intended.

«Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy»

A human reader will understand that the word “pen” refers to the playpen and not to a pen for writing.
The first sentence establishes that the “box” is a toy box. The reader already knows that such a box is much larger than a writing pen, so the writing-tool interpretation is excluded without the reader having to think about it. A machine has no such knowledge of the world to draw on, which is what makes the sentence hard to translate.

Where do we stand today?

The development of the Internet, together with globalization, produced a great demand for translation services and machine translation.

Global business, together with economic growth in emerging markets, has fueled the need for practical, reliable products that translate content across many language pairs.

Neural Machine Translation, Statistical Machine Translation, and a Little Bit of History

Neural Machine Translation (NMT) uses artificial neural networks to learn the patterns of different languages from example texts, and it improves as it is trained on more data. Loosely inspired by the way neurons work, the network learns from specific materials and can predict the probability of a sequence of words.
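
To make “predict the probability of a sequence of words” more concrete, here is a minimal sketch. It is not a neural network: a hand-written table of made-up probabilities stands in for what a trained NMT model would compute, and the names NEXT_WORD_PROBS and sequence_probability are hypothetical. It only illustrates the underlying idea that the probability of a sentence is the product of the probability of each word given what came before it.

```python
# Made-up probabilities P(next word | previous word), for illustration only.
# A real NMT model would compute these with a neural network, conditioned on
# the source sentence and on all previously generated target words.
NEXT_WORD_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"box": 0.5, "pen": 0.5},
    "box": {"is": 1.0},
    "is": {"in": 1.0},
    "in": {"the": 1.0},
    "pen": {"</s>": 1.0},
}


def sequence_probability(words):
    """Multiply the conditional probability of each word, one after another."""
    probability = 1.0
    previous = "<s>"  # start-of-sentence marker
    for word in words + ["</s>"]:  # "</s>" marks the end of the sentence
        # Unknown transitions get probability 0 in this toy table.
        probability *= NEXT_WORD_PROBS.get(previous, {}).get(word, 0.0)
        previous = word
    return probability


likely = sequence_probability("the box is in the pen".split())
unlikely = sequence_probability("pen the box in is the".split())
print(likely, unlikely)
```

In this toy table, the well-formed sentence receives a higher probability than the scrambled one, which is the kind of signal a translation engine uses to prefer one candidate output over another.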

Why is NMT so popular?

Three things have popularized NMT in recent years: improvements in learning algorithms, the ease of obtaining data to train the translation engine, and the computational power needed to train on massive amounts of information. NMT is becoming a standard in MT and has been adopted by companies such as Google and Microsoft.

In many scenarios, NMT yields better results and is much easier to maintain than a rule-based engine.

Statistical Machine Translation (SMT)

This model was promoted by IBM in the early 1990s. It evolved from word-level translation to phrase-based translation.

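Its training is based on a parallel corpus: a multilingual database in which each sentence in the source language is paired with its corresponding translation in the target language. From these pairs, the system estimates how likely each word or phrase is to be translated in a given way.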
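
As a rough illustration of how a phrase-based system can learn from paired examples, the sketch below estimates translation probabilities by relative frequency: how often a source phrase was seen with a given target phrase, divided by how often the source phrase was seen at all. The phrase pairs are invented, the names PHRASE_PAIRS, build_phrase_table, and best_translation are hypothetical, and the pairs are assumed to be already aligned, which in a real system is a substantial step of its own.

```python
from collections import Counter, defaultdict

# Toy Spanish -> English phrase pairs, invented for illustration and
# assumed to have been extracted and aligned beforehand.
PHRASE_PAIRS = [
    ("la caja", "the box"),
    ("la caja", "the box"),
    ("la caja", "the case"),
    ("el corral", "the pen"),
    ("la pluma", "the pen"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    table = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        table[source][target] = count / source_counts[source]
    return dict(table)


def best_translation(table, source_phrase):
    """Return the target phrase with the highest estimated probability."""
    candidates = table[source_phrase]
    return max(candidates, key=candidates.get)


table = build_phrase_table(PHRASE_PAIRS)
print(table["la caja"])
print(best_translation(table, "la caja"))
```

A full SMT engine combines a table like this with other components, such as a model of the target language, so this shows only one ingredient.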

Some MT advantages to think about…

Some CAT (Computer-Assisted Translation) tools allow the major MT providers to be integrated into the tool, either through a plugin or an API (Application Programming Interface).
Being able to translate many words across many language pairs in a matter of minutes can drive down costs and shorten delivery time.
Machine translation is very fast. Like really fast. Thousands of words can be translated into multiple language pairs in a matter of minutes.
In an MT workflow, the human translator does not disappear, but rather takes part in the post-editing process, refining the output produced by the MT engine.
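
One way to get a feel for how much work post-editing involves is to count how many word-level changes separate the raw MT output from the post-edited text. The sketch below does this with a standard word-level edit distance. It is only one possible proxy for effort, the function names are hypothetical, and the example sentences are made up.

```python
def word_edit_distance(mt_words, edited_words):
    """Count the word insertions, deletions, and substitutions needed to
    turn mt_words into edited_words (Levenshtein distance over words)."""
    previous = list(range(len(edited_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        current = [i]
        for j, edited_word in enumerate(edited_words, start=1):
            substitution = previous[j - 1] + (mt_word != edited_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def post_edit_effort(mt_output, post_edited):
    """Edits per word of the post-edited text; 0.0 means nothing was changed."""
    mt_words = mt_output.split()
    edited_words = post_edited.split()
    # max(..., 1) avoids dividing by zero when the edited text is empty.
    return word_edit_distance(mt_words, edited_words) / max(len(edited_words), 1)


effort = post_edit_effort(
    "the box was in the pen",      # made-up raw MT output
    "the box was in the playpen",  # made-up post-edited version
)
print(effort)
```

A score close to zero suggests the engine's output needed little correction, while a high score suggests the translator rewrote most of it.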

Conclusion:

Machine translation has come a long way, from the first experiments in 1954 to being able to translate a large volume of text in a matter of seconds using an engine loosely modeled on human neurons.
Even so, MT is still evolving, driven by improved algorithms and greater computational power.

GPI’s Machine Translation (MT) implementations start by confirming that NMT is a good candidate for the client’s needs. First, a test project is carried out to determine the human translation effort required to edit the output. Then a custom engine is created based on the content and the desired language pairs. GPI’s NMT solutions are designed to deliver savings and greater productivity.