Sharing material used to be the norm for newspapers, and should be for LLMs

Even though parents insist that it is good and right to share things, the copyright world has succeeded in establishing the contrary as the norm. Now, sharing is deemed a bad, possibly illegal thing. But it was not always thus, as a fascinating speech by Ryan Cordell, Associate Professor in the School of Information Sciences and Department of English at the University of Illinois Urbana-Champaign, underlines. In the US in the nineteenth century, newspaper material was explicitly not protected by copyright, and was routinely exchanged between titles:

Nineteenth-century editors’ attitude toward text reuse is exemplified in a selection that circulated in the last decade of the century, though often abbreviated from the version I cite here, which insists that “an editor’s selections from his contemporaries” are “quite often the best test of his editorial ability, and that the function of his scissors are not merely to fill up vacant spaces, but to reproduce the brightest and best thoughts…from all sources at the editor’s command.” While noting that sloppy or lazy selection will produce “a stupid issue,” this piece claims that just as often “the editor opens his exchanges, and finds a feast for eyes, heart and soul…that his space is inadequate to contain.” This piece ends by insisting “a newspaper’s real value is not the amount of original matter it contains, but the average quality of all the matter appearing in its columns whether original or selected.”

Material was not only copied verbatim, but modified and built upon in the process. As a result of this constant exchange, alteration and enhancement, newspaper readers in the US enjoyed a rich ecosystem of information, and a large number of titles flourished, since the cost of producing suitable material for each of them was shared and thus reduced.

That historical fact in itself is interesting. It’s also important at a time when newspaper publishers are some of the most aggressive in demanding ever stronger – and ever more disproportionate – copyright protection for their products, for example through “link taxes”. But Cordell’s speech is not simply backward-looking. It goes on to make another fascinating observation, this time about large language models (LLMs):

We can see in the nineteenth-century newspaper exchanges a massive system for recycling and remediating culture. I do not wish to slip into hyperbole or anachronism, and will not claim historical newspapers as a precise analogue for twenty-first century AI or large language models. But it is striking how often metaphors drawn from earlier media appear in our attempts to understand and explain these new technologies.

The whole speech is well worth reading. It is a useful reminder that the current copyright panic over LLMs arises in part because we have forgotten that sharing material, and helping others to build on it, was once the norm. And despite blinkered and selfish views to the contrary, it is still the right thing to do, just as parents continue to tell their children.

Featured image from Library of Congress.

