Will publishers try to shut down the free General Index of 107 million scientific journal articles?

One of the exciting possibilities opened up by the digital world is that access to all human knowledge could be made freely available to everyone with an Internet connection. Sadly, most publishers prefer boosting their profits to helping humanity, and have done everything they can to make sure that this possibility is never realised. Despite that, the activist Carl Malamud is well on the way to creating a key part of this universal access to knowledge: a 38-terabyte index of 107 million scientific articles. The first release of what he calls the General Index can be downloaded by anyone with a fast enough Internet connection, and has no restrictions on its use.

Publishers have made it difficult to produce even something as uncontroversial as an index of articles. Generally, researchers must pay again to index articles that they have already paid for, typically as part of a subscription. The subscriptions themselves are in any case unjustified, since most articles report on research funded by taxpayers. The results of that work should therefore be freely available to all. Making researchers pay a second time to index publicly funded articles adds insult to injury.

Malamud is unwilling to say where he obtained his 107 million articles, but the suspicion is that many of them came from Sci-Hub, an unofficial online repository of nearly 90 million academic papers, all of which are freely available. Although publishers are pursuing Sci-Hub for alleged copyright infringement, it’s important to note that the vast majority of the papers were funded by the public. Sci-Hub is a convenient way for ordinary people to read the research they have paid for. Some of Sci-Hub’s most enthusiastic users are actually the researchers themselves. As Malamud told Nature in 2019:

where he got the articles from shouldn’t matter anyway. The data mining, he says, is non-consumptive: a technical term meaning that researchers don’t read or display large portions of the works they are analysing. “You cannot punch in a DOI [article identifier] and pull out the article,” he says. Malamud argues that it is legally permissible to do such mining on copyrighted content in countries such as the United States.

Malamud has taken the extra precaution of building his index in India, where a court ruled that photocopying textbooks for educational purposes is generally allowed. The hope is that the General Index of scientific articles would also be regarded as similarly acceptable by the Indian courts if a case were brought by publishers.

However, it is truly absurd that in addition to financing the huge task of indexing 107 million articles, the non-profit General Index project must also take these extreme measures to minimise the risk of being sued by academic publishers. If merely producing an index to taxpayer-funded scientific papers is illegal without explicit permission from all the publishers involved, there is something seriously wrong with copyright law.

Featured image by Piqsels.

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.