FEP position paper on Artificial Intelligence (AI)
15 June 2023
Introduction
AI has long been a subject of literature, whether scientific or science fiction, and has captivated people’s imagination with its potential. We are now reaching a point where AI is becoming an everyday reality and the driver of another potential industrial revolution. The recent attention around off-the-shelf “generative AI” models, with their arguably impressive and increasingly autonomous capacity to produce content that can mislead an average person into believing it is the work of a human being, is testament to both the potential and the risks associated with this technology and its many applications.
While not an entirely new topic for publishers, AI’s recent advances make it a subject of particular interest, as well as concern, for the entire book sector. The EU’s 2019 DSM Directive provides publishers, as rightsholders, with the essential right to opt out of certain Text and Data Mining (TDM) of works they own, thus preventing their works from being used to train AI. An important question concerns the basis upon which the operators of AI have access to copyright-protected works. Another crucial consideration is transparency regarding what data is used to train an AI and where it was harvested, especially when artistic and literary works are mined.
The potential of AI for a more efficient publishing sector
AI clearly provides opportunities for the book value chain to optimise production and distribution processes, as well as to provide new insights to the benefit of all.
The potential uses in the industry are many, including identifying market trends, forecasting demand, managing stock, detecting plagiarism or copyright infringements, supporting the translation of some types of texts, assisting with editing, improving accessibility, and generating metadata. Only part of this potential depends on generative AI. Some of these uses are already commonly deployed (notably, though not exclusively, in academic and legal publishing), while others are still being researched and tested by publishers.
AI could reduce the environmental footprint of the sector by better anticipating the required print run for a given title and avoiding excess production; it could also increase the discoverability of relevant texts and authors by readers and speed up time-consuming, repetitive tasks (such as transcription from audio). Publishers should be free, and encouraged, to explore the use of AI within their business, taking into account the interests of the other parts of the ecosystem and involving them in discussing fair principles for its use.
However, access to AI might pose a challenge for SMEs (which constitute the majority of businesses in the publishing sector), which do not necessarily have the financial capacity, skills, or access to large data sets needed to benefit fully from its potential.
Therefore, FEP calls for more research in the field of AI and its applications in the book sector, and for support for publishers to use these new tools, particularly when it helps businesses to fulfil environmental sustainability and accessibility objectives.
While AI represents an opportunity for the publishing sector, there is also a risk that the AI space becomes dominated by a small number of powerful technology companies, which would be the only ones able to gather and control the resources and data necessary to create powerful AI. While the Digital Markets Act constitutes a first step towards improving access to data, these rules require effective implementation and enforcement. In addition, it should not be forgotten that originality, and thus creativity, remain fundamental for the creative sectors; these are essentially human contributions that AI is by definition unable to provide.
AI must comply with copyright and intellectual property rules
Publishers and the overall book value chain rely on an effective copyright regime that allows rightsholders to determine how their works are used, authors and others to be remunerated, and investment in new works to be sustained for the long term.
AI is not exempt from copyright rules, particularly when it uses pre-existing works. Large Language Models (LLMs) are trained on texts, including books and other copyright-protected publications, collected from the internet and other sources. These materials, as well as their sources, should be clearly identified by the AI operators. It has already been documented that the training datasets used to develop leading AI models have included large numbers of ebooks that were accessed illegally. Such methods cannot be silently condoned.
There are two main questions regarding AI which are relevant to copyright: the input (the data used to feed or train an AI), and the output (the content an AI produces).
- To shape the input phase, the European Union already provides a clear framework for AI in its 2019 Copyright in the Digital Single Market (DSM) Directive, which introduced two mandatory exceptions on TDM, a technical process that is part of AI training or creation. These exceptions allow the reproduction of copyright-protected works for the purposes of scientific research or for other purposes. However, both exceptions require the operator to have legal access to the work before it may be mined. In addition, where TDM is carried out for purposes other than non-commercial research, the rules provide rightsholders with the choice of opting out in order to prevent their works from being mined (e.g. because they choose to license this use or would consider doing so if they became aware that their works were of interest to miners). AI actors should fully respect the copyright framework in Europe, including cooperating with rightsholders to adopt joint solutions for machine-readable opt-outs – whether via technical tools or Terms and Conditions – and licensing (an illustrative sketch of such a machine-readable signal follows this list). Thanks to these safeguards, which must be rigorously enforced, the TDM exceptions provide a suitable legal framework at the input level. However, the enforcement of this legal framework cannot be effective without stronger accountability from AI providers and transparency rights for rightsholders whose content is used.
- At the output phase, the copyright status of content produced by generative AI should follow the same rules for copyright eligibility as any other content: if content was created by an AI without the original expression of an author’s (i.e. a human’s) free and creative choices and personality, it should remain ineligible for copyright protection. However, if AI is used merely as a tool by an author in the creation of a work which still expresses his or her own creativity in an original way, then this new work should enjoy copyright protection.
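To make the notion of a machine-readable opt-out concrete, the sketch below shows how a compliant crawler might check for a reservation signal before mining a site. It assumes the convention of the W3C TDM Reservation Protocol (TDMRep), under which a site can publish a /.well-known/tdmrep.json file listing reservation rules; the domain, file structure details, and simplified path matching here are illustrative assumptions, not requirements of the Directive itself.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Illustrative sketch of a crawler-side check for a machine-readable TDM
# opt-out, assuming the W3C TDM Reservation Protocol (TDMRep) convention
# of a /.well-known/tdmrep.json file listing reservation rules.

def tdm_reserved(url: str, timeout: float = 5.0) -> bool:
    """Return True if the site declares a TDM reservation covering `url`.

    Absence of a signal only means no reservation is expressed via this
    mechanism; rights may still be reserved through other means (e.g.
    Terms and Conditions), which a miner must also check.
    """
    parts = urlparse(url)
    wellknown = f"{parts.scheme}://{parts.netloc}/.well-known/tdmrep.json"
    try:
        with urllib.request.urlopen(wellknown, timeout=timeout) as resp:
            rules = json.load(resp)
    except Exception:
        return False  # no machine-readable signal found at this location
    if not isinstance(rules, list):
        return False  # unexpected structure; no usable signal
    for rule in rules:
        # Each rule pairs a path ("location") with a tdm-reservation flag
        # (1 = rights reserved, 0 = not reserved). Simple prefix matching
        # is a simplification for this sketch.
        location = "/" + str(rule.get("location", "")).lstrip("/")
        if parts.path.startswith(location):
            return rule.get("tdm-reservation") == 1
    return False

if __name__ == "__main__":
    # Hypothetical publisher URL, used purely for illustration.
    target = "https://publisher.example/books/novel.epub"
    if tdm_reserved(target):
        print("TDM rights reserved: do not mine without a licence.")
    else:
        print("No TDMRep reservation expressed for this location.")
```

In practice, such a technical signal would complement rather than replace contractual reservations and licensing channels, and its effectiveness depends on AI operators actually consulting and honouring it.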
While style is not subject to copyright protection, it is important to consider the possible application of moral rights (as well as other relevant rights, such as personality rights) when generative AI is instructed to create content mimicking the style of a specific author, as such AI-generated content could mislead consumers, potentially compete with the original author’s past or future work, or be prejudicial to the author’s honour or reputation.
We are already witnessing cases where AI was developed under the guise of “scientific research” in order to rely on the copyright exception meant to cover only non-commercial beneficiaries, but was in fact funded by private entities with clear commercial purposes and then turned into commercial products. To avoid a “data laundering” effect and prevent copyright infringements when an AI is transitioned from a research project to a commercial product, any data collected under the scientific research TDM exception to train the AI must be deleted, and the AI retrained with legal data.
Transparency as a safeguard
As AI is evolving into a technology able to deceive consumers into thinking they are either interacting with a human or being shown genuine, original creative works, it is fundamental that proportionate transparency obligations apply when AI is deployed.
To allow rightsholders to effectively enforce their rights and verify that their works were not illegally mined (e.g. despite an opt-out, or by an ineligible actor), AI developers should guarantee the transparency of their training datasets, including regarding the works that were mined and where they were collected, and collaborate with rightsholders to identify and exclude from the training set any source providing illegal access to copyrighted works. They should also be under an obligation to remove any illegally accessed or reproduced works from AI training datasets, including in collaboration with rightsholders who wish to support such removal. AI models which have benefited from the use of works without rightsholder consent and without the benefit of an exception should be retrained without these works.
Consumers should be clearly informed when content was fully generated by AI, both to avoid confusion and to avoid unjustified claims to copyright protection. However, such information should not be mandatory when AI was used merely as a tool in the creative process (see previous section), or when AI was used in an ancillary manner or for purposes unrelated to the generation of the content itself. Indeed, as AI becomes more and more intertwined with production processes, a transparency obligation extending to the disclosure of the methods of creative processes could have disproportionate and counterproductive effects.