The Role of XML in Localization

XML (Extensible Markup Language) is a markup language designed to simplify the transfer of data between systems. It lets users exchange information without worrying about incompatibilities between those systems. XML does nothing on its own: it is designed to carry data, not to display it. It is an open standard that is widely supported, convenient and interoperable.

What rules must be followed to ensure the quality of an XML file?

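  • All XML elements must have a closing tag (or be written as self-closing, empty elements).
  • Tags are case-sensitive, so the opening and closing tags must match exactly.
  • All elements must be properly nested.
  • All XML documents must have a single root element.
  • Attribute values must be quoted.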
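
As a quick illustration, the rules above can be checked with any conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for this example, and the check covers well-formedness only, not validation against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One valid sample, then one sample that breaks each rule in the list.
samples = {
    "valid": '<note lang="en"><to>Ana</to></note>',
    "missing closing tag": "<note><to>Ana</note>",
    "case mismatch": "<note><To>Ana</to></note>",
    "bad nesting": "<note><b><i>Ana</b></i></note>",
    "two roots": "<note/><note/>",
    "unquoted attribute": "<note lang=en/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample should be accepted; the parser rejects every other one.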

XLIFF Files

XLIFF files are frequently used in the localization industry because of their simplicity and interoperability, and most CAT tools support them. XLIFF files are XML files that contain a collection of translation units. A translation unit holds a piece of text, typically a paragraph, that was extracted from the source document and stored in the <source> element. The translated text goes inside the <target> element. In addition, legacy translations from other projects can be added in <alt-trans> elements to be used as a guideline or reference.
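
To make the structure concrete, here is a minimal sketch that reads translation units from an XLIFF 1.2 document with Python's standard library. The sample document, the file name app.properties and the function name read_units are invented for illustration; real files produced by CAT tools usually carry more attributes and inline markup than this handles.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 elements live in this namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample document, kept deliberately small.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="app.properties" source-language="en-US"
        target-language="es-AR" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Save</source>
        <target>Guardar</target>
        <alt-trans><target>Grabar</target></alt-trans>
      </trans-unit>
      <trans-unit id="2">
        <source>Cancel</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return one dict per <trans-unit> with source, target and alternatives."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", XLIFF_NS):
        # Only the direct child <target>; those inside <alt-trans> are references.
        target = tu.find("x:target", XLIFF_NS)
        units.append({
            "id": tu.get("id"),
            "source": tu.findtext("x:source", default="", namespaces=XLIFF_NS),
            "target": target.text if target is not None else None,
            "alternatives": [
                t.text for t in tu.iterfind("x:alt-trans/x:target", XLIFF_NS)
            ],
        })
    return units


units = read_units(SAMPLE)
for unit in units:
    print(unit)
```

The second unit has no <target> yet, so it comes back with a target of None, which is how an untranslated segment would look to a tool.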

Below are several XML-based localization formats, including XLIFF and tool-specific variants of it:

  • TMX: an open XML standard used for exchanging translation memory information.
  • XLIFF: created to standardize the way localizable data is passed between tools.
  • TTX: Trados Tag Editor files created from different file formats (HTML/JSP/ASP/RT/DOC).
  • SDLXLIFF: XML-based file format designed to work with Trados Studio. Compliant with XLIFF version 1.2.
  • MQXLIFF: similar to SDLXLIFF but used in memoQ.

XML Benefits

  • Encoding: XML provides a clear mechanism to identify the type of encoding used in the document.

                <?xml version="1.0" encoding="UTF-16" standalone="no" ?>

  • Metadata: XML can provide relevant information for localization.

<file source-language="en-US" target-language="es-AR" ..>
<source xml:lang="en-us">GPI is a localization company established in the USA</source>
<target xml:lang="es-ar">GPI es una empresa de localización establecida en los Estados Unidos.</target>

  • Escape Mechanism: XML offers numeric character references (such as &#26085; for 日) as a way to escape extended characters. This allows the document to contain any character, even if it is not supported by the encoding used by the document. For example, Japanese characters can be escaped if a given document uses an encoding that cannot represent them.
  • It’s human-readable.
  • It separates formatting from the actual content.
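
The escape mechanism from the list above can be sketched in a few lines of Python. This example assumes a document declared as US-ASCII, an encoding that cannot represent Japanese text, and uses the standard xmlcharrefreplace error handler to produce the numeric character references; the variable names and the sample text are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "日本語"

# Escape markup characters first, then replace every character that
# ASCII cannot represent with a numeric character reference (&#NNNN;).
escaped = escape(text).encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Build a document whose declared encoding is plain US-ASCII.
document = (
    '<?xml version="1.0" encoding="US-ASCII"?>'
    f"<source>{escaped}</source>"
).encode("ascii")

# A parser resolves the references back to the original characters.
root = ET.fromstring(document)

print(escaped)
print(root.text == text)
```

The file on disk contains only ASCII bytes, yet any XML parser that reads it hands the original Japanese string back to the application.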

Scenarios where XML is not appropriate for localization:

  • When it is not well-formed.
  • When there is translatable content in the attributes.
  • When there is not enough metadata to distinguish the different segments.
  • When there is HTML content in the CDATA section.
  • When there are bad implementations of other standards, such as XLIFF.
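
One of these problems, HTML content inside CDATA sections, can be flagged before a file is sent to translation. The sketch below uses Python's xml.dom.minidom, which keeps CDATA sections as separate nodes; the function name cdata_sections_with_markup, the sample document and the simple "contains <" test are assumptions made for this example, not a complete HTML detector.

```python
from xml.dom import Node, minidom

# Hypothetical resource file: one item hides HTML inside CDATA.
SAMPLE = (
    "<strings>"
    '<item id="greeting"><![CDATA[<b>Hello</b> world]]></item>'
    '<item id="bye">Goodbye</item>'
    "</strings>"
)


def cdata_sections_with_markup(xml_text):
    """Return (parent element name, CDATA text) for CDATA that looks like markup."""
    doc = minidom.parseString(xml_text)
    hits = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.CDATA_SECTION_NODE:
                # Crude heuristic: a "<" inside CDATA suggests embedded tags.
                if "<" in child.data:
                    hits.append((node.nodeName, child.data))
            else:
                walk(child)

    walk(doc)
    return hits


hits = cdata_sections_with_markup(SAMPLE)
for parent, data in hits:
    print(f"<{parent}> contains markup in CDATA: {data}")
```

Files that produce hits are candidates for a second parsing pass that treats the CDATA content as HTML, so the embedded tags can be protected instead of shown to translators as plain text.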

XML Parsing

To translate an XML file correctly, it must first be parsed correctly.

When parsing, you should consider:

  • Which elements and attributes are translatable/untranslatable.
  • Structural elements, like tags, which can break a segment.
  • Elements that have pre-formatted content, such as white spaces.
  • Any other element that must be parsed with other specific rules (e.g., a script section).
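
The considerations above can be turned into parsing rules. The following sketch, using Python's standard library, skips untranslatable elements, extracts selected attributes and respects xml:space="preserve". The rule sets SKIP_ELEMENTS and TRANSLATABLE_ATTRS, the function name extract_segments and the sample page are hypothetical; a real project would define its own rules, and this sketch ignores text that follows a child element (tail text).

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Hypothetical project rules.
SKIP_ELEMENTS = {"script", "code"}      # never sent to translation
TRANSLATABLE_ATTRS = ("alt", "title")   # attributes that hold visible text

SAMPLE = """<page>
  <p title="Tip">Press   the
     button</p>
  <pre xml:space="preserve">  a  b</pre>
  <script>var x = 1;</script>
</page>"""


def extract_segments(xml_text):
    """Return (location, text) pairs for the translatable content."""
    segments = []

    def visit(elem, preserve):
        if elem.tag in SKIP_ELEMENTS:
            return
        space = elem.get(XML_SPACE)
        if space is not None:
            preserve = space == "preserve"
        for name in TRANSLATABLE_ATTRS:
            if name in elem.attrib:
                segments.append((f"{elem.tag}/@{name}", elem.get(name)))
        text = elem.text or ""
        if not preserve:
            # Collapse runs of white space when it is not significant.
            text = " ".join(text.split())
        if text.strip():
            segments.append((elem.tag, text))
        for child in elem:
            visit(child, preserve)

    visit(ET.fromstring(xml_text), False)
    return segments


segments = extract_segments(SAMPLE)
for location, text in segments:
    print(f"{location}: {text!r}")
```

The script section never reaches the translator, the title attribute is exposed as its own segment, and the pre-formatted text keeps its leading spaces.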

When you should consider writing a custom parser:

  • When tags in the CDATA section are counted as translatable and shown in the editor.
  • When specific elements are not translatable and are meant to be excluded.
  • When a generic parser does not fit your XML, so the segments arrive without contextual information. Is a segment a list? A paragraph? An item? A custom parser lets you tell them apart.
  • When length restrictions are needed.
  • When the validity of XML cannot be assured.
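
Two of the points above, contextual information and length restrictions, are sketched below in Python. The maxlen attribute, the function names segments_with_context and too_long, and the sample menu are all hypothetical; they stand in for whatever convention your own XML uses to express a limit.

```python
import xml.etree.ElementTree as ET

# Hypothetical source file; "maxlen" is an invented attribute for this sketch.
SAMPLE = (
    "<menu>"
    '<title maxlen="10">File</title>'
    "<list>"
    '<item maxlen="8">Open</item>'
    "<item>Save as</item>"
    "</list>"
    "</menu>"
)


def segments_with_context(xml_text):
    """Return segments tagged with their element path and optional length limit."""
    out = []

    def visit(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            limit = elem.get("maxlen")
            out.append({
                "context": here,  # tells the translator what kind of text this is
                "text": text,
                "max_length": int(limit) if limit else None,
            })
        for child in elem:
            visit(child, here)

    visit(ET.fromstring(xml_text), "")
    return out


def too_long(segment, translation):
    """True when the translation exceeds the segment's length limit, if any."""
    limit = segment["max_length"]
    return limit is not None and len(translation) > limit


segs = segments_with_context(SAMPLE)
for seg in segs:
    print(seg)
print(too_long(segs[1], "Abrir archivo"))
```

With the path available, a translator can see that "Open" is a menu item rather than a sentence, and the length check catches a translation that would not fit before it reaches the build.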

Some CAT tools, such as Trados Studio, come with a generic file type called “AnyXML”. This file type is suitable most of the time, but it is worth adjusting it or creating a new one if your source file is not displaying properly.

Summary

When conducting a software globalization project, you will often work with XLIFF files, which are XML files containing a collection of translation units. In this blog post, we covered the rules that XML files must follow, the benefits of XML and the role it plays in software localization.

Whether you are localizing a mobile application into Arabic or complex security applications into Spanish, it is vital to work with a development team familiar with software internationalization who can localize, test and publish language versions of your software.