The EU has brought back opt-in copyright for text and data mining: let’s build on that foundation

The central theme of Walled Culture the book (free digital versions) is the clash between copyright, devised for an analogue world, and the Internet, which is inherently digital. There are many manifestations of the bad fit between the two, but if I had to choose one step that doomed copyright in the online world, it would be the 1886 Berne Convention, for the following reason, as explained by Wikipedia:

The Berne Convention introduced the concept that protection exists the moment a work is “fixed”, that is, written or recorded on some physical medium, its author is automatically entitled to all copyrights in the work and to any derivative works, unless and until the author explicitly disclaims them or until the copyright expires. A creator need not register or “apply for” a copyright in countries adhering to the convention.

That might have seemed like a good idea at the time, saving creators from the bother of registering their new works. Fast forward a hundred years, and the arrival of the Internet meant that everything placed online is under copyright by default, whether or not that protection is needed. Moreover, copyright laws designed to protect physical works like books and DVDs from large-scale piracy by organised criminal gangs also apply to the most innocent acts of digital copying by ordinary Internet users. This absurd over-reach has led to numerous cases of people being prosecuted and fined huge sums for actions that were accidental or trivial. Many of these were discussed in the early chapters of the book Walled Culture.

The situation might seem hopeless, since the copyright industry never allows even wrong-headed laws to be repealed while they operate to its benefit. But an excellent post by Paul Keller on the Open Future blog has spotted something rather interesting. The main thread of the post is about text and data mining (TDM), machine learning (ML) and the EU copyright framework. In particular, he considers the thorny question of whether authors, creators, and others need to give permission before their works can be used as input for generative machine-learning systems, something that Walled Culture has discussed several times. Keller’s whole analysis is valuable, but here I’d like to focus attention on his closing point:

the EU legislator has ensured that in the context of TDM/ML, copyright protection will only accrue to those creators and rightholders who actually want it enough to signal their intent. This approach addresses one of the most fundamental problems with copyright: that it applies by default to all creative output — both by creators who wish to control the use of their works and by those who do not. The EU framework for TDM limits copyright protection to those creators who want it, without covering the rest of human expression on the Internet with the suffocating blanket of default copyright protection that would lock those works away for many decades.

That is, for all its many faults, the EU Copyright Directive has done one thing that is rather innovative: in the context of TDM and ML, it has made copyright opt-in, rather than automatic. This goes against more than a century of the Berne Convention, and creates an important precedent. As Keller notes:

this opt-in approach to copyright is limited to TDM, but it is not inconceivable that this approach could be expanded if it proves to work in practice, especially in the ongoing discussion about ML training.

Assuming it does function, TDM’s opt-in will be something that can be cited as an example of an area where there is no automatic copyright. That fact can then be used when pushing for a wider opt-in approach that would be more suitable for the digital world.

