What is 'No Language Left Behind'?
Meta AI's PR Stunt of incredible inclusion and open-source brilliance.
JULY 7TH, 2022. 7:55 PM - MONTREAL.
Towards a Universal Speech Translator
This is AiSupremacy Premium,
This summer I’m offering a 15% discount coupon on a one year subscription for premium membership. The main benefit is additional locked content.
Today I wanted to write a quick note about Meta AI’s latest offering. Meta is open-sourcing an early-stage AI translation tool that works across 200 languages.
As you may have heard, its researchers just open-sourced a new AI model capable of translating across 200 different languages, many of which aren’t supported by current translation systems.
They call this project ‘No Language Left Behind’, and it's part of the AI system already powering 25 billion translations every day across Meta's platforms.
Meta calls it an important step toward its goal of creating a ‘Universal Speech Translator’, an AI system capable of delivering instantaneous speech-to-speech translation across all the world's languages. I suspect Alphabet and Meta are in a race to build this kind of UST technology.
Meta even has a fancy landing page for this project, or rather for this momentous announcement.
No Language Left Behind: Scaling Human-Centered Machine Translation
Read the Paper on Meta.
What does it take to break the 200-language barrier while ensuring safe, high-quality results and keeping ethical considerations in mind? In No Language Left Behind, the researchers took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers.
Then, they created datasets and models aimed at narrowing the performance gap between low- and high-resource languages. More specifically, they developed a conditional compute model based on a Sparsely Gated Mixture of Experts, trained on data obtained with novel and effective data mining techniques tailored for low-resource languages.
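For readers unfamiliar with the term: a sparsely gated Mixture of Experts replaces one large feed-forward layer with several smaller "expert" networks plus a learned gate that sends each token to only its top-k experts, so per-token compute stays roughly flat as the number of experts grows. The toy Python sketch below shows top-2 routing in the style of the original sparsely gated MoE layer (Shazeer et al., 2017). The class names, dimensions, random weights and k=2 are assumptions for illustration, and it leaves out load balancing and the overfitting countermeasures the paper describes. It is not Meta's implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TinyExpert:
    """Hypothetical expert: a two-layer ReLU feed-forward net (d -> h -> d)."""

    def __init__(self, d, h, rng):
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(h)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(h)] for _ in range(d)]

    def __call__(self, x):
        hidden = [max(0.0, v) for v in matvec(self.w1, x)]
        return matvec(self.w2, hidden)


class SparseMoE:
    """Toy sparsely gated MoE layer: each token is processed by only k experts."""

    def __init__(self, d, h, num_experts, k=2, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Gate: one row of weights per expert, producing one logit per expert.
        self.gate = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(num_experts)]
        self.experts = [TinyExpert(d, h, rng) for _ in range(num_experts)]

    def route(self, x):
        """Return the indices of the top-k experts and their renormalized weights."""
        logits = matvec(self.gate, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        return top, softmax([logits[i] for i in top])

    def __call__(self, x):
        top, weights = self.route(x)
        out = [0.0] * len(x)
        # Only the selected experts run; the others cost nothing for this token.
        for idx, w in zip(top, weights):
            out = [o + w * y for o, y in zip(out, self.experts[idx](x))]
        return out


moe = SparseMoE(d=4, h=8, num_experts=4, k=2)
token = [0.5, -1.0, 0.25, 2.0]
chosen, weights = moe.route(token)
print("experts:", chosen, "weights:", [round(w, 3) for w in weights])
print("output size:", len(moe(token)))  # output size: 4
```

Setting k equal to num_experts turns this back into a dense (and far more expensive) mixture. Keeping k small is what makes the compute "conditional".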
Meta AI proposes multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, they evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Their model achieves a 44% relative improvement in BLEU over the previous state of the art, laying important groundwork towards realizing a universal translation system. Finally, they open source all contributions described in this work, accessible at
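BLEU, the metric behind that 44% figure, scores a translation by its clipped n-gram overlap with a human reference, multiplied by a brevity penalty. The sketch below is a simplified version (single reference, whitespace tokens, no smoothing) meant only to show the idea. The paper's exact BLEU variant and tokenization are not described here. The function names and the example scores of 25.0 and 36.0 are hypothetical. Note that a "44% relative" gain is measured against the old score, not as 44 BLEU points.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, scaled to 0-100."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipping: an n-gram is credited at most as often as it appears in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if matches == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matches / sum(hyp_counts.values())))
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, times the brevity penalty.
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


def relative_improvement(new_score, old_score):
    """Percentage improvement of new_score relative to old_score."""
    return 100 * (new_score - old_score) / old_score


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 100.0
print(sentence_bleu("the cat sat on the mat", "the cat is on the mat"))   # 0.0
print(relative_improvement(36.0, 25.0))  # 44.0
```

The zero score for a near-miss shows why real toolkits add smoothing or compute BLEU from corpus-level counts rather than sentence by sentence.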
Facebook seems to be in PR damage control under its new incarnation, and Meta AI seems to be doing its best to uplift the company's image. Its stock has plummeted so far in 2022 and it’s no longer quite as sexy a BigTech firm to work for, but don’t worry, the salaries are still high!
More than 7,000 languages are currently spoken on this planet and Meta seemingly wants to understand them all. - Engadget.
I am a bit perplexed about these efforts.
Sure, we all expect universal language translation to arrive soon and be seamless, and ambient computing to be everywhere, just not necessarily in the Metaverse!
You can watch their cute video here.
Image from the paper.
The new model beat existing benchmarks by as much as 70 percent.
Meta’s ambitions to build a ‘universal translator’ continue. I am excited, but not so enthusiastic. Meta did, after all, cancel its planned Apple Watch competitor. I guess Mark decided this would be some kind of “lure” into the Metaverse and its VR worlds full of Facebook ads.
Worth noting that Meta’s AI researchers trained this new model using their new Research SuperCluster, one of the world’s fastest AI supercomputers. Meta’s PR people were bragging about this on LinkedIn.
It’s hard to come to grips with what Silicon Valley built and became. They have censored content, and now their walled gardens drip with PR about themselves! I’ve noticed the same thing with Microsoft on LinkedIn; it’s like seeing Google products at the top of Google Search. It just feels icky.
Meta is making Yann LeCun into a real influencer, it’s good times.
Vedanuj Goswami (@vedanujg): “Excited to share our No Language Left Behind project. A single multilingual model capable of translating between any pair across 200+ languages. This long 🧵 attempts to discuss some of the technical contributions of NLLB. (1/n)” https://t.co/hWqH4F2ZEN
As Louis Bouchard explains, they first built an appropriate dataset. Meta created an initial model able to detect languages automatically, which they call their language identification system. They then use another Transformer-based model to find sentence pairs across all the scraped data. These two models are used only to build the paired datasets covering the 200 languages, which are needed to train the final translation model, NLLB-200.
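To make the two-stage pipeline more concrete, here is a toy Python sketch. A character-trigram overlap classifier stands in for Meta's trained language identification model, and hand-written 3-dimensional vectors stand in for sentence embeddings from a Transformer encoder. The names `identify_language`, `margin_scores` and `mine_pairs`, the value of `k`, the `threshold`, and all the sample data are hypothetical. The mining step uses ratio-margin scoring (cosine similarity divided by the average similarity to each sentence's nearest neighbours), a technique from earlier bitext-mining work; whether NLLB uses exactly this variant is an assumption here.

```python
import math
from collections import Counter


def char_trigrams(text):
    """Character trigram counts, with padding so word boundaries are captured."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def identify_language(text, profiles):
    """Toy LID: pick the language whose trigram profile overlaps most with the text."""
    grams = char_trigrams(text)

    def overlap(profile):
        return sum(min(count, profile[g]) for g, count in grams.items())

    return max(profiles, key=lambda lang: overlap(profiles[lang]))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def margin_scores(src_vecs, tgt_vecs, k=2):
    """Ratio margin: cosine similarity divided by the mean similarity of
    each sentence to its k nearest neighbours on the other side."""
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    src_avg = [sum(sorted(row, reverse=True)[:k]) / k for row in sim]
    cols = [[row[j] for row in sim] for j in range(len(tgt_vecs))]
    tgt_avg = [sum(sorted(col, reverse=True)[:k]) / k for col in cols]
    return [[sim[i][j] / ((src_avg[i] + tgt_avg[j]) / 2) for j in range(len(tgt_vecs))]
            for i in range(len(src_vecs))]


def mine_pairs(src_vecs, tgt_vecs, k=2, threshold=1.0):
    """Keep each source sentence's best-scoring target if it clears the threshold."""
    pairs = []
    for i, row in enumerate(margin_scores(src_vecs, tgt_vecs, k)):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.append((i, j, row[j]))
    return pairs


# Stage 1: route scraped text to a language bucket (toy profiles, hypothetical).
profiles = {
    "en": char_trigrams("the quick brown fox jumps over the lazy dog and the cat"),
    "fr": char_trigrams("le renard brun rapide saute par dessus le chien paresseux et le chat"),
}
print(identify_language("the dog and the fox", profiles))  # en
print(identify_language("le chat et le chien", profiles))  # fr

# Stage 2: mine pairs from made-up sentence embeddings (stand-ins for encoder output).
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9], [0.0, 0.95, 0.1]]
print(mine_pairs(src, tgt))
```

Using the margin rather than raw cosine similarity helps discount "hub" sentences that are moderately similar to everything. A real pipeline over billions of scraped sentences would typically replace this quadratic loop with an approximate nearest-neighbour index.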
Image from the paper.
You can download the 190-page PDF here.
Thankfully, nobody is being left behind.
Meta’s Awe-Inspiring Video
To learn more about Meta’s Supercomputer go here.
Vedanuj Goswami did good work on this project. His Twitter explanation is quite good.
The July 6th, 2022 announcement, though, was the success of a huge team. By: Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco (Paco) Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang.
Seriously improving universal translation is, by the very nature of the work, a collaborative endeavor, and many of these researchers live all around the world. Meta AI is right about one thing here: the incredible globalization of AI research.
I think we can say that by 2032 we’ll have real-time universal language translation that will make life a lot easier in many ways. Ten years isn’t too long to wait, is it? Just don’t make me work in the Metaverse in order to experience it.
Meta is a social media conglomerate powered by advertising profits and deep pockets, but they will never be the saviors we need.
Meta, though, just wants to be a good corporate citizen. With its ‘No Language Left Behind’ initiative, it hopes to one day provide everyone in the world with truly universal translation tools.
Join 82 other paid subscribers who support the channel and get access to bonus content. For the price of a good cup of coffee support an independent voice on A.I.
Thanks for reading!