A team of researchers at Meta has unveiled SEAMLESSM4T (Massively Multilingual and Multimodal Machine Translation), an artificial intelligence system that handles speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation. It can translate speech from 101 languages almost instantaneously and deliver the output in 36 target languages via a voice synthesizer.
The Babel Fish from The Hitchhiker’s Guide to the Galaxy, a fictional fish that instantly translates any language for the person who places it in their ear, may finally be a reality.
Addressing Data Scarcity in Machine Translation
The development of SEAMLESSM4T builds on Meta’s previous work in speech-to-speech translation and on the No Language Left Behind project, which aimed to provide text-to-text translation for approximately 200 languages. However, a recurring problem in machine translation is the lack of training data for less widely spoken languages. While training data for widely used languages like English is plentiful, it remains scarce for many others, especially those with little online presence.
According to Cornell University computer scientist Allison Koenecke, this data inequality has impeded the extension of machine translation capabilities to less common languages. Researchers have found, however, that training a translation system on many languages at once improves its performance even for languages with little training data, although the precise reasons for this improvement are not yet known.
A Comprehensive Translation System
To develop SEAMLESSM4T, Meta’s team gathered millions of hours of audio recordings, along with transcripts and human-generated translations, from the internet and from sources such as the United Nations archives. Together, these formed the training dataset. The team’s algorithm paired almost half a million hours of audio fragments with corresponding text in several languages.
“Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition,” said Chetan Jaiswal, a Quinnipiac University professor of computer science who was not involved in the research. “The mere number of languages they are supporting is a tremendous achievement.”
Meta, headquartered in Menlo Park, California, offers SEAMLESSM4T as an open-source tool, allowing researchers worldwide to build upon its framework. This follows the company’s release of its LLaMA large language model, which has been widely adopted by developers globally. The SEAMLESSM4T system marks a significant advancement in multilingual communication and could change how people interact across language barriers.