Google Translate just added Kurdish. Why is that significant?

Earlier this month Google added 13 new languages to its online automated translation tool, Google Translate, bringing the number of available languages to just over 100. According to the company, that covers 99 percent of the world’s online population.

The new languages added include Amharic, Corsican, Frisian, Kyrgyz, Hawaiian, Kurdish (Kurmanji), Luxembourgish, Samoan, Scots Gaelic, Shona, Sindhi, Pashto and Xhosa (see the link above for some facts about these languages, including where they are spoken).

There are over 30 million Kurdish speakers in the world. Kurds are possibly the largest ethnic group in the world without a country of their own, although there are now two de facto autonomous Kurdish regions in Iraq and Syria. Their traditional homeland, known as Kurdistan, was divided up by colonial powers at the beginning of the 20th century between Turkey, Syria, Iraq and Iran (there are also some Kurds in Azerbaijan and Armenia, and a growing diaspora community in Western Europe). In all four countries, Kurds have been repressed, discriminated against and not allowed to use their language. But the language has survived, mostly through popular culture.

There have been many petitions over the past few years demanding that Google add Kurdish to its Translate service (see for example this petition and this Facebook page). So why was it not done until now?

The company says that a new language can be added only if it is a written language with “a significant amount of translations” already available on the web, so that Google’s machines can “learn” it. Launched in 2006, Google Translate uses a combination of machine learning and human volunteers (reportedly up to three million) who help improve its automatic translations.

Google has added only the Kurmanji dialect of Kurdish to its Translate service. Kurmanji is spoken mainly in the Turkish and Syrian parts of Kurdistan, and has been written in Latin letters since 1932. The other main Kurdish dialect, Sorani, which is spoken in the Iraqi and Iranian parts of Kurdistan, is still written in Arabic script.

Google’s translations are not always accurate, and are sometimes comic or ridiculous. Nonetheless, Kurmanji speakers can now join the millions of people around the world who use Google Translate to communicate and to understand things in other languages that they would not otherwise have been able to understand.