As we pointed out in a previous article on Facebook’s artificial intelligence, the social media giant is having a difficult time translating user-generated content. On social media, people tend to post the way they talk, and no machine translation software has been able to adequately understand and translate this informal language, prompting Facebook to look for a better solution.
On August 28th, Facebook filed a patent application for “Machine Learning Dialect Identification” with the U.S. Patent & Trademark Office. This dialect identification system will create classifiers for language dialects. The classifiers categorize how different words are used, and as the system recognizes those usages, it builds a dialect-specific language module that will allow it to translate slang and colloquialisms more accurately.
Previously, we discussed how Arabic in particular presents problems because of the many dialects spoken across the Arab world and the poetic nature of the language.
Stepconference.com, a tech and interactive group in the Middle East and North Africa (MENA) region, says the current Facebook translation button for Arabic cannot translate any Arabic dialects other than Modern Standard Arabic (MSA).
The patent application notes that traditional speech recognition and machine translation systems for Arabic focus on MSA and don’t account for other Arabic dialects, which differ from MSA syntactically, morphologically, lexically and phonologically. As a result, the patent author writes, these systems cannot adequately recognize or translate content items to or from non-MSA dialects.
One way to better translate Arabic dialects is to identify the Arab country in which the comment or web entry is posted, linking the post to a specific dialect. Alternatively, an online article or post can be identified as belonging to a specific dialect based on user interaction with the content. For example, if an article is rated by users who are known to use a particular Arabic dialect, the module can determine that the article is in that dialect.
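To make the idea concrete, the two signals described above (where a post was made, and which dialects its raters are known to use) could be combined roughly as follows. This is only a minimal Python sketch, not the method claimed in the patent: the function name `infer_dialect`, the country-to-dialect table, and the vote thresholds are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical lookup table; the codes and dialect labels are illustrative only.
COUNTRY_DIALECT = {"EG": "Egyptian", "LB": "Levantine", "MA": "Maghrebi"}


def infer_dialect(country_code: Optional[str],
                  rater_dialects: List[str],
                  min_votes: int = 3,
                  min_share: float = 0.6) -> Optional[str]:
    """Guess a post's dialect from its raters, falling back to its country.

    rater_dialects holds the known dialect of each user who rated the post.
    The thresholds are assumed values, not figures from the patent.
    """
    if len(rater_dialects) >= min_votes:
        dialect, count = Counter(rater_dialects).most_common(1)[0]
        # Trust the interaction signal only when one dialect clearly dominates.
        if count / len(rater_dialects) >= min_share:
            return dialect
    # Otherwise fall back to the country the post came from, if it is known.
    return COUNTRY_DIALECT.get(country_code) if country_code else None
```

In this sketch the interaction signal takes priority over location, on the assumption that who engages with a post says more about its dialect than where it was posted. A real system would presumably weigh both signals alongside the text itself.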
Facebook is also hoping to use crowdsourcing to augment the training data set. The system will send content items and their classification results to users, who can respond to confirm whether the classification is correct or to rate its accuracy.
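As a rough illustration of how such user feedback might feed back into training data, the sketch below accepts a predicted dialect as a training label only when enough reviewers confirm it. The `Feedback` class, the `accept_label` function, and the thresholds are hypothetical and are not taken from the patent application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    """One reviewer's response to a predicted dialect (hypothetical structure)."""
    confirmed: bool  # True if the reviewer agreed with the prediction


def accept_label(predicted: str,
                 feedback: List[Feedback],
                 min_reviews: int = 5,
                 min_agreement: float = 0.8) -> Optional[str]:
    """Return the predicted dialect as a training label if reviewers agree.

    Returns None when there are too few reviews or too little agreement.
    Both thresholds are assumed values chosen for illustration.
    """
    if len(feedback) < min_reviews:
        return None
    agreement = sum(f.confirmed for f in feedback) / len(feedback)
    return predicted if agreement >= min_agreement else None
```

Requiring a minimum number of reviews before trusting the agreement rate keeps a single enthusiastic or mistaken reviewer from adding a wrong label to the training set.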
As companies expand globally, leveraging international social media is vital but will only be useful if the content is accurately localized. Let’s see if Facebook’s new patent gets this right.