Google has announced that it is rolling out 110 new languages to Google Translate, the technology’s translation tool, in what it calls its “biggest expansion ever,” including Portuguese from Portugal.
By 2022, Google has added 24 new languages using “zero shot” machine translation, where a machine learning model learns to translate into another language without ever seeing an example, and announced the “1,000 Languages Initiative,” a commitment to building AI models. [inteligência artificial] Which will support more than 1,000 spoken languages in the world,” Google says.
“Now, we’re using AI to expand the range of supported languages,” he says in an online post. “Thanks to our amazing PaLM 2 language model, we’re rolling out 110 new languages to Google Translate, our biggest expansion ever,” including Portuguese from Portugal.
In other words, Google Translate will now differentiate between Portuguese variants (Portugal vs. Brazil).
“From Cantonese to Kekchi, these new languages are represented by more than 614 million speakers, allowing translation for about 8% of the world’s population,” Google says.
He adds that about a quarter of the new languages “are originally from Africa and represent our largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof.”
Among the languages now supported in Google Translate is Afar, a tonal language spoken in Djibouti, Eritrea, and Ethiopia. “Of all the languages in this launch, Afar has received the most volunteer contributions from the community,” he highlights.
Then continues Cantonese, which has long been “one of the most requested languages on Google Translate.”
Other examples include Manx, a Celtic language of the Isle of Man, which was nearly extinct when its last native speaker died in 1974, but “thanks to an island-wide revival movement, there are now thousands of speakers,” and Nkwo, a standardized form of the Manding languages of West Africa that unites many dialects into a common language.
“Its unique alphabet was invented in 1949 and has an active research community today developing the resources and technology for it,” Google says in its post.
There is also Punjabi (Shahmukhi), a variety of Punjabi written in the Perso-Arabic script (Shahmukhi) and which is the most widely spoken language in Pakistan, Tamazight, a Berber language spoken in North Africa, and Tok Pisin, a “creole language of English origin” and the lingua franca of Papua New Guinea.
Languages ”have enormous diversity: regional variations, dialects, and different spelling patterns”, and in fact, “many languages do not have a uniform format, so it is impossible to choose the ‘correct’ variety”.
“But our approach was to prioritize the most commonly used varieties in each language,” he adds.
“PaLM 2 was a key piece in this puzzle, helping the translator learn languages that are closely related to each other more efficiently, including languages closely related to Hindi, such as Awadhi and Marwadi, and French Creoles, such as Seychellois Creole and Mauritian Creole.” He explains.
As technology evolves, “we continue to partner with expert linguists and native speakers, and over time, we will support more language variations and spelling conventions.”
“Friendly zombie fanatic. Analyst. Coffee buff. Professional music specialist. Communicator.”