Google announced it is implementing 110 new languages into Google Translate, the technology’s translation tool, as its “largest expansion ever,” which includes Portuguese from Portugal.
By 2022, Google has added 24 new languages using “zero shot” machine translation, where a machine learning model learns to translate into another language without ever seeing an example, and announced the “1,000 Languages Initiative,” a commitment to building AI models. [inteligência artificial] Which will support more than 1,000 spoken languages in the world,” Google says.
“Now, we’re using AI to expand the range of supported languages” and “thanks to the amazing language model PaLM 2, we’re starting to roll out 110 new languages to Google Translate, our biggest expansion ever,” including Portuguese from Portugal,” he says in a post. on the Internet.
In other words, Google Translate will now differentiate between Portuguese variants (Portugal vs. Brazil).
“From Cantonese to Kekchi, these new languages are represented by more than 614 million speakers, allowing translation for about 8% of the world’s population,” Google says.
About a quarter of the new languages, he adds, “are of African origin and represent our largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof.”
Among the languages now supported by Google Translate is Afar, a tonal language spoken in Djibouti, Eritrea, and Ethiopia. “Among all the languages in this launch, Afar received the largest number of volunteer contributions from the community,” he highlights.
Then there’s Cantonese, which has long been “one of the most requested languages on Google Translate.”
Other examples include Manx, the Celtic language of the Isle of Man, which was nearly extinct when its last native speaker died in 1974, but “thanks to an island-wide revival movement, there are now thousands of speakers”, and Nkw. , a standardized form of the Manding language of West Africa that unites several dialects into a common language.
“Its unique alphabet was invented in 1949 and has an active research community today developing the resources and technology for it,” Google says in its post.
There is also Punjabi (Shahmukhi), a variety of Punjabi written in the Persian-Arabic script (Shahmukhi) that is the most widely spoken language in Pakistan, Amazigh, a Berber language spoken in North Africa, and Tok Pisin, a “Creole language of English origin”. The lingua franca of Papua New Guinea.
Languages ”have enormous diversity: regional variations, dialects, and different spelling patterns”, and in fact, “many languages do not have a uniform format, so it is impossible to choose the ‘correct’ variety”.
“But our approach was to prioritize the most commonly used varieties in each language,” he adds.
“PaLM 2 was a key piece in this puzzle, helping the translator learn languages that are closely related to each other more efficiently, including languages close to Hindi, such as Awadhi and Marwadi, and French creoles, such as Seychelles Creole and Mauritian Creole,” he explains.
As technology evolves, “we continue to partner with expert linguists and native speakers, and over time, we will support more language variations and spelling conventions.”