Why Meta’s project to translate automatically between 200 languages will be stymied by copyright

Meta’s AI division has announced two exciting new projects in the field of machine translation:

The first is No Language Left Behind, where we are building a new advanced AI model that can learn from languages with fewer examples to train from, and we will use it to enable expert-quality translations in hundreds of languages, ranging from Asturian to Luganda to Urdu. The second is Universal Speech Translator, where we are designing novel approaches to translating from speech in one language to another in real time so we can support languages without a standard writing system as well as those that are both written and spoken.

The No Language Left Behind technology could have a major impact on how people around the world use the Internet, particularly in the way they access key scientific and medical resources. It would allow people to translate material in one of the more prevalent languages used online, such as English or Spanish, into their own local language once that language has been included in the No Language Left Behind project. There’s a crying need for this, for reasons the following Wikipedia article makes clear:

Slightly over half of the homepages of the most visited websites on the World Wide Web are in English, with varying amounts of information available in many other languages. Other top languages are Russian, Spanish, Turkish, Persian, French, German and Japanese.

Of the more than 7,000 existing languages, only a few hundred are recognized as being in use for Web pages on the World Wide Web.

Unfortunately, Meta’s grand vision is unlikely to be realised – because of copyright. Unless online material is released under a permissive licence such as the ones devised by Creative Commons, it will be necessary to obtain permission from the copyright holder before a full translation can be made using Meta’s new tools. It will only take a few high-profile lawsuits from bullying publishers to frighten people away from daring to translate mainstream online articles into their own, poorly served language without a licence.

And so, once again, copyright maximalism will throttle an exciting chance to make the world a better, fairer place by improving access to knowledge – and all to preserve the sanctity of an outdated intellectual monopoly.

Featured image by Glyn Moody.

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.