Guide to Translation Memory (TM) | GPI Translation Blog

A translation memory is a database that keeps track of translations per language.

Translation memory (TM) is a database of previously translated segments, usually sentences, that gives translators the opportunity to reuse existing translations. Previous translations accumulate in the TM as source and target language pairs called translation units, and they are reused so that the same sentence never has to be translated twice. The more content the translation memory holds, the faster new content can be translated.
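To make the idea of translation units concrete, here is a minimal sketch of a TM as an in-memory store for a single language pair. The class and method names (`TranslationUnit`, `TranslationMemory`, `add`, `lookup`) are made up for illustration and do not come from any particular TM tool; real tools persist their data and store extra metadata with each unit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TranslationUnit:
    """One source/target pair stored in the TM (hypothetical structure)."""
    source: str
    target: str


class TranslationMemory:
    """A toy in-memory TM for one language pair, e.g. English to Spanish."""

    def __init__(self) -> None:
        self._units: Dict[str, TranslationUnit] = {}

    def add(self, source: str, target: str) -> None:
        # Store (or overwrite) the translation for this exact source segment.
        self._units[source] = TranslationUnit(source, target)

    def lookup(self, source: str) -> Optional[str]:
        # Return the stored translation for an identical segment, if any.
        unit = self._units.get(source)
        return unit.target if unit else None


tm = TranslationMemory()
tm.add("Save your changes.", "Guarde sus cambios.")
print(tm.lookup("Save your changes."))     # Guarde sus cambios.
print(tm.lookup("Discard your changes."))  # None
```

This sketch only handles identical segments; similar segments need a similarity measure on top of the stored units.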

How does a translation memory work?

Segmentation

Translation memory works at the sentence level. When using TM, the source document is broken down into its component sentences or segments. The term segment is used because in some cases, a chunk of text may not be a complete sentence, for example, in the case of headings. The segment is the smallest unit of text that can be reused when working with TM. Smaller units of text, such as individual words, are not used, because they may occur in different contexts and thus require different translations, and word-for-word translation generally does not produce usable results.
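As a rough illustration of segmentation, the sketch below splits text into segments at line breaks and after sentence-ending punctuation. The `segment` function and its splitting rule are simplifying assumptions; actual TM tools use configurable segmentation rules that also cope with abbreviations, numbers, and language-specific punctuation.

```python
import re

# Split after ".", "!" or "?" when followed by whitespace.
# Assumption: this naive rule is good enough for the example text only.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list:
    """Break text into segments: one per sentence, heading, or other line."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # A heading without end punctuation stays a single segment.
        segments.extend(part for part in _SENTENCE_END.split(line) if part)
    return segments


doc = "Getting Started\nOpen the file. Click Save! Is it done?"
print(segment(doc))
# ['Getting Started', 'Open the file.', 'Click Save!', 'Is it done?']
```

Note that the heading "Getting Started" becomes a segment of its own even though it is not a complete sentence.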

Repetitions, 100% Matches, and Fuzzy Matches

As the translator works, each segment for translation is compared to what is stored in the TM, and matches are presented to the translator automatically. A segment in the TM that is identical to the segment for translation is considered a 100% match: at some point in the past, this exact segment was encountered and its translation was stored in the TM. In theory, that translation can be used exactly as is. If there is no exact match, but there are segments in the TM that are similar to the one being translated, these are presented as fuzzy matches. Each fuzzy match is ranked by a percentage from 0% to 99%, and a higher percentage means the match is closer in content to the sentence being translated. A 99% match might differ only in a single letter or punctuation mark, whereas a 75% match might differ by several words. Generally, matches below the 70% mark are not useful.
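The matching step can be sketched with Python's standard `difflib` module. This is only an approximation: each TM tool computes its percentages with its own algorithm, so the scores from `SequenceMatcher` will not equal the figures a commercial tool reports. The function names and the sample TM are hypothetical, and the 70% default threshold follows the rule of thumb above.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> int:
    """Similarity of two segments as a whole percentage from 0 to 100.

    Rounding down guarantees that only identical segments score 100.
    """
    ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
    return int(ratio * 100)


def find_matches(segment: str, tm: dict, threshold: int = 70) -> list:
    """Return (score, source, target) tuples at or above threshold, best first."""
    scored = [(match_score(segment, src), src, tgt) for src, tgt in tm.items()]
    hits = [hit for hit in scored if hit[0] >= threshold]
    return sorted(hits, reverse=True)


# Hypothetical English-to-Spanish TM content.
tm = {
    "Click the Save button.": "Haga clic en el botón Guardar.",
    "Close the window.": "Cierre la ventana.",
}

for score, source, target in find_matches("Click the Save button!", tm):
    kind = "100% match" if score == 100 else "fuzzy match"
    print(f"{score}% ({kind}): {source} -> {target}")
```

A segment that shares nothing with the TM returns an empty list, which corresponds to a segment the translator must translate from scratch.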

When a document contains several identical segments that are not currently in the TM, these segments are known as repetitions. Most translation memory tools can identify potential repetitions before translation begins. As the translator works, each newly translated sentence is added to the TM, so it can become a 100% match or a fuzzy match for other sentences in the document. The advantage of repetitions is that once the first occurrence is translated, every later occurrence becomes a 100% match.
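Detecting repetitions before translation starts amounts to counting identical segments that the TM does not yet contain. A minimal sketch, assuming the document has already been segmented and that `tm_sources` is a set of the source segments currently in the TM (both names are illustrative):

```python
from collections import Counter


def find_repetitions(segments, tm_sources):
    """Map each repeated segment that is absent from the TM to its count."""
    counts = Counter(seg for seg in segments if seg not in tm_sources)
    return {seg: n for seg, n in counts.items() if n > 1}


segments = [
    "Click OK.",
    "Enter your name.",
    "Click OK.",
    "Click Cancel.",
    "Click OK.",
]
tm_sources = {"Click Cancel."}  # already in the TM, so a 100% match

print(find_repetitions(segments, tm_sources))  # {'Click OK.': 3}
```

Here "Click OK." must be translated once; its two later occurrences can then be filled from the TM.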

How do we use the TM during the localization process?

Before translation begins, the file is analyzed against the TM. This analysis reports the file's total word count and the number of words that fall into repetitions, 100% matches, and fuzzy matches. The report is also called a TM breakdown, because the total word count of the file is broken down into the TM match categories. For example, a common breakdown is:

New Words
Repetitions / 100% matches
95% – 99% matches
85% – 94% matches
75% – 84% matches
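A TM breakdown of this kind can be sketched as a word count per band. The sketch assumes each segment has already been analyzed and comes with its best match score and a flag saying whether it is a repetition; it also assumes that anything scoring below 75% is counted as new words, which is one common convention and may differ between tools and vendors.

```python
def category(score: int, is_repetition: bool = False) -> str:
    """Map a best-match score (0-100) to one of the bands listed above."""
    if is_repetition or score == 100:
        return "Repetitions / 100% matches"
    if score >= 95:
        return "95%-99% matches"
    if score >= 85:
        return "85%-94% matches"
    if score >= 75:
        return "75%-84% matches"
    return "New Words"  # assumption: anything below 75% counts as new


def breakdown(analyzed):
    """Sum word counts per band; `analyzed` holds (text, score, is_repetition)."""
    totals = {}
    for text, score, is_repetition in analyzed:
        band = category(score, is_repetition)
        # Words are counted by splitting on whitespace, a simplification.
        totals[band] = totals.get(band, 0) + len(text.split())
    return totals


# Hypothetical analysis results for a four-segment file.
analyzed = [
    ("Click OK.", 100, False),
    ("Enter your full name.", 88, False),
    ("Restart the application now.", 40, False),
    ("Restart the application now.", 0, True),  # second occurrence
]

for band, words in breakdown(analyzed).items():
    print(f"{band}: {words} words")
```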

Each category may have its own price or discount. 100% matches and repetitions are generally less expensive, as they need little effort to be translated, while the lower percentage matches require more.
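To show how per-category pricing affects a quote, the sketch below multiplies each band's word count by a share of the full per-word rate. The band names, the percentages in `RATE_FACTORS`, and the per-word rate are invented for illustration; they are not GPI's rates or any vendor's actual price list.

```python
# Hypothetical share of the full per-word rate charged for each band.
# Illustrative figures only; real discounts vary by vendor and project.
RATE_FACTORS = {
    "New Words": 1.00,
    "75%-84% matches": 0.75,
    "85%-94% matches": 0.50,
    "95%-99% matches": 0.35,
    "Repetitions / 100% matches": 0.25,
}


def quote(word_counts: dict, rate_per_word: float) -> float:
    """Total cost for a TM breakdown given as {band: word count}."""
    total = sum(
        words * RATE_FACTORS[band] * rate_per_word
        for band, words in word_counts.items()
    )
    return round(total, 2)


word_counts = {
    "New Words": 1000,
    "85%-94% matches": 200,
    "Repetitions / 100% matches": 500,
}

print(quote(word_counts, rate_per_word=0.10))  # 122.5
print(round(sum(word_counts.values()) * 0.10, 2))  # 170.0 without a TM
```

With these made-up numbers, the 1,700-word file costs 122.50 instead of 170.00 at the full rate.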

Why keep a translation memory?

We use translation memories to reduce translation costs for our clients and to give them faster turnaround times. A TM also helps keep translations consistent: because it automatically suggests matches to translators while they work, they are more likely to use terminology and phrases consistent with previous translations, which increases quality.
