Translation Services Need To Be Data-Driven

The localization industry has seen a sharp increase in the volume of content to be translated, from the eLearning content translated at the height of the pandemic to the additional marketing content that businesses are translating now, at what is hopefully its tail end. With larger volumes to translate and manage, it’s more efficient to apply data-driven analysis and processes to the translation and localization workflow.


The Meaning of “Being Data-Driven” in an Enterprise

In an enterprise, being data-driven means promoting data literacy and encouraging staff to use data to set more appropriate goals, make better decisions, and find more improvements. Employees should value the data, be able to access it, and have a basic understanding of data science, which also means the company should actively promote data literacy.


Specifically, people in a data-driven enterprise use metrics, KPIs (key performance indicators), OKRs (objectives and key results), analysis, and evaluation to measure progress toward their goals and to optimize their decisions and processes. In practice, data is regularly used to check the status, progress, impact, and value of different business operations.


How Localization Can Be Data-Driven

A more data-driven localization team can help an enterprise achieve its shared goals with higher efficiency, better quality, and lower cost. Several aspects need to be considered to move localization toward a data-driven model.


First, it’s important to promote data literacy within the localization team so that everyone understands and values data. For example, localizers should be able to present data-backed reports at quarterly business review meetings. Understanding user groups through analytics and traffic metrics is also necessary. This is usually shown by page views: how many visitors view a page per language and locale.
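As a rough illustration of the page-view metric, the sketch below counts visits per locale. The log format, the sample rows, and the function name `views_per_locale` are assumptions made for this example and do not come from any particular analytics tool.

```python
from collections import Counter

# Hypothetical page-view log: one (page, locale) pair per visit.
page_views = [
    ("/pricing", "en-US"),
    ("/pricing", "de-DE"),
    ("/pricing", "en-US"),
    ("/docs", "ja-JP"),
    ("/docs", "de-DE"),
    ("/pricing", "de-DE"),
    ("/docs", "de-DE"),
]


def views_per_locale(views):
    """Count visits per locale, most-viewed locale first."""
    return Counter(locale for _page, locale in views).most_common()


for locale, count in views_per_locale(page_views):
    print(f"{locale}: {count}")
```

A ranking like this is one simple input for deciding which locales deserve more localization effort.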


Second, with supporting data, localizers can better decide what, when, where, and how to localize their content, for example whether to add a new locale to their target locales. Such decisions not only offer higher potential ROI but also optimize the scope of localization projects and programs so that existing resources are used efficiently. Additionally, the localization process can be monitored automatically, which creates healthy competition within the team and helps produce localized content with higher quality and efficiency.


Data Science for Localization Professionals

Data is a medium for storing and interpreting information. Data science is the field in which people use data to understand and evaluate business problems by extracting and mining the useful parts of historical data and building data models, often through a business intelligence system. Across all kinds of business contexts, data scientists collect data for analytics and publish the results as data models and visualizations.


Localization professionals encounter many business challenges. For example, predicting the return on investment of adding a new language to the existing target languages is a challenge for clients and for the translation companies that advise and support them. Devising a strategy for collaborating with suppliers to improve performance on both sides is another challenge that data (surveys, grading, feedback opportunities, etc.) can help with. Launching a new tool, such as training a neural machine translation engine and adding post-editing to improve localization efficiency and quality while reducing cost, is an opportunity to use data to guide how this is best done. To better diagnose these problems and apply effective approaches to them, we need to dive into data analytics and data science. It is also necessary to use a business intelligence platform to record performance data, quality data (quality metrics and error ratios), linguistic data, financial data, and so on. The more data is available in the system, the more reliable the resulting hypotheses and analyses become.
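To make the return-on-investment question concrete, here is a minimal sketch that compares two candidate languages using the common definition ROI = (gain - cost) / cost. The locales, revenue estimates, and costs are invented for illustration, and `localization_roi` is a hypothetical helper, not part of any tool mentioned in this article.

```python
def localization_roi(expected_revenue, localization_cost):
    """Return ROI as a fraction: (gain - cost) / cost."""
    if localization_cost <= 0:
        raise ValueError("localization_cost must be positive")
    return (expected_revenue - localization_cost) / localization_cost


# Hypothetical estimates for two candidate languages.
candidates = {
    "pt-BR": {"expected_revenue": 60_000, "localization_cost": 20_000},
    "nl-NL": {"expected_revenue": 27_000, "localization_cost": 18_000},
}

roi_by_locale = {
    locale: localization_roi(**figures)
    for locale, figures in candidates.items()
}

# The candidate with the highest estimated ROI.
best = max(roi_by_locale, key=roi_by_locale.get)

for locale, roi in roi_by_locale.items():
    print(f"{locale}: ROI {roi:.0%}")
```

In practice the hard part is the revenue estimate itself, which is where strategic data such as market size and language demographics comes in.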


Types of Localization Data

Localization data includes all data that is valuable for localization production. Although localization data varies from company to company, there are generally three types: strategic data, operational data, and data specific to localization, such as CLDR.


  1. Strategic Data

Strategic data helps localization professionals decide when, where, what, and how to localize products. It normally includes demographic data, economic data, language data, locale market data, and company data. For example, strategic data is collected to help the localization team figure out how many people in a certain area use a certain language, what their income level is, what the demographic structure of the region is, and how large the market is. Many companies depend on cross-functional collaboration, and strategic data gives different business units a shared reference in cross-functional engagement meetings. For example, market data and economic data can inform business reviews and decisions, which in turn set the direction of a company’s localization operation.


  2. Operational Data

Operational data, as the name suggests, is produced during the localization process and its operation. It is used to evaluate the value and impact of localization, and it covers volume, workflow, efficiency, cost, quality, usage, and impact. For example, for a quarterly business review, localization professionals can chart the localization volume of different business units over a given period of the fiscal year. This shows how much localization work each unit has completed and which product from which business unit spends the most of the budget on localization. By analyzing this data, localization project and program managers can diagnose the cost and efficiency of the localization process and consider ways to optimize the workflow.
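The quarterly review described above could start from a summary like the following sketch, which totals word volume and cost per business unit and derives cost per word and budget share. The job records, field layout, and the name `summarize_by_unit` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical job records for one quarter: (business_unit, words, cost_usd).
jobs = [
    ("Marketing", 12_000, 1_800.0),
    ("Support", 30_000, 3_000.0),
    ("Marketing", 8_000, 1_200.0),
    ("Product", 10_000, 2_000.0),
]


def summarize_by_unit(records):
    """Aggregate volume and cost per business unit.

    Assumes every unit has a non-zero word count and the total cost is non-zero.
    """
    totals = defaultdict(lambda: {"words": 0, "cost": 0.0})
    for unit, words, cost in records:
        totals[unit]["words"] += words
        totals[unit]["cost"] += cost

    grand_total = sum(t["cost"] for t in totals.values())
    return {
        unit: {
            "words": t["words"],
            "cost": t["cost"],
            "cost_per_word": t["cost"] / t["words"],
            "budget_share": t["cost"] / grand_total,
        }
        for unit, t in totals.items()
    }


for unit, row in summarize_by_unit(jobs).items():
    print(f"{unit}: {row['words']} words, "
          f"${row['cost_per_word']:.2f}/word, {row['budget_share']:.1%} of budget")
```

The same table can feed a chart for the review meeting, or be tracked quarter over quarter to spot units whose cost per word is drifting.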


  3. Common Locale Data Repository (CLDR)

CLDR, from the Unicode Consortium, is widely used by software developers and localization engineers during internationalization. It includes locale data such as currency, date, and time zone formats, language and region names, measurement units, and text direction. On top of CLDR, many programming languages have their own ways of accessing globalization and internationalization data, such as Babel for Python and CultureInfo.CurrentUICulture in C#. Globalization engineers can develop an internationalization protocol that helps internationalization engineers handle locale data and cultural variation consistently across different locales and audiences. Beyond enabling automatic internationalization, this data can also be used to check the product for bugs and defects and to tailor it more closely to customers.
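To show how locale data drives formatting, the sketch below uses a tiny hand-written table as a stand-in for the kind of information CLDR provides. The table `LOCALE_DATA` and the helper `format_number` are hypothetical and cover only three locales; a real project would read this data from CLDR through a library such as Babel rather than maintain it by hand.

```python
# Hand-copied sample of locale conventions; a stand-in for CLDR data, not a CLDR API.
LOCALE_DATA = {
    "en-US": {"decimal": ".", "group": ",", "direction": "ltr"},
    "de-DE": {"decimal": ",", "group": ".", "direction": "ltr"},
    "he-IL": {"decimal": ".", "group": ",", "direction": "rtl"},
}


def format_number(value, locale_id, digits=2):
    """Format a number with the locale's decimal and grouping separators."""
    data = LOCALE_DATA[locale_id]
    # Format with US-style separators first, then swap in the locale's symbols.
    us_style = f"{value:,.{digits}f}"
    swap = {",": data["group"], ".": data["decimal"]}
    return "".join(swap.get(ch, ch) for ch in us_style)


def is_right_to_left(locale_id):
    """Report whether the locale's script is written right to left."""
    return LOCALE_DATA[locale_id]["direction"] == "rtl"


print(format_number(1234567.891, "en-US"))
print(format_number(1234567.891, "de-DE"))
print(is_right_to_left("he-IL"))
```

Checks like these can also run as automated tests, so a formatting defect in one locale is caught before release.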



In Summary

There are many types of data that should be analyzed and acted on in the translation and localization business. This data can help improve efficiency and decision-making across all departments, from engineering and desktop publishing to project management and quality assurance. Having a data-driven mindset can help you make better decisions, set more specific goals, and advance your translation business. In the translation and localization industry, we are moving to harness data to improve our services for the benefit of our clients.