UK introducing Text & Data Mining rights for AI development

11-07-2022 | By Robin Mitchell

Training AI algorithms requires large amounts of input data, and acquiring this data has become increasingly challenging. Now, the UK government has announced plans to change the law to give new data-mining rights to AI developers in both commercial and non-commercial spaces. What challenges does data mining present to AI developers, what’s changing in the law, and how will this help accelerate AI development?

What challenges does data mining present to AI developers?

Like human intelligence, artificial intelligence can improve itself over time when exposed to new scenarios that it can learn from (unlike traditional computer programs that work within a strict set of boundaries). While this means that AIs can take a long time to train, it also means that once trained, they can be highly efficient at complex tasks that are too difficult to code by hand.

For example, it is possible to create a C++ program to detect faces in pictures by looking for key features such as the relative positions of the eyes, nose, ears, and mouth. However, going beyond face detection (such as general image recognition) can become extremely challenging when considering the vast number of permutations that can exist in a photo. One example would be recognising tattoos; a computer can be coded to identify one specific tattoo, but what about any tattoo?

To solve this, an AI would be shown millions of pictures of different tattoos and then begin to identify similarities between those images. Eventually, the AI will be able to recognise tattoos even if it has never seen that specific tattoo before. 
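The idea of learning from examples rather than hand-written rules can be illustrated with a very small sketch. This is a toy nearest-centroid classifier, not a real image recognition system: the two numeric features per image (`ink_coverage` and `edge_density`), the training values, and the function names are all hypothetical and chosen purely for illustration. A practical system would learn from millions of real images, typically with a neural network.

```python
from math import dist
from statistics import fmean


def train(examples):
    """Average the feature vectors of each label into one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features per image: (ink_coverage, edge_density).
# These numbers are made up for illustration, not measured from real photos.
TRAINING_EXAMPLES = [
    ((0.80, 0.70), "tattoo"),
    ((0.75, 0.90), "tattoo"),
    ((0.90, 0.80), "tattoo"),
    ((0.10, 0.20), "no tattoo"),
    ((0.20, 0.10), "no tattoo"),
    ((0.15, 0.30), "no tattoo"),
]

centroids = train(TRAINING_EXAMPLES)

# This exact feature vector never appears in the training examples.
unseen = (0.70, 0.75)
print(predict(centroids, unseen))
```

The unseen example is labelled as a tattoo even though it matches none of the training examples exactly, because it sits closer to the average of the tattoo examples than to the average of the others. The more varied the training examples, the better such averages represent the category, which is why the quantity of available data matters so much.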

But this need for large quantities of data to learn from presents AI developers with significant challenges. It is only in the past decade that large amounts of data have become available to developers, and it was around this time that privacy and data protection laws came in. Furthermore, content created and shared online would often be under copyright protection, meaning that researchers were forbidden from using such data for training.

To help get around this, some governments (including the UK and EU) specifically allowed data mining for the purpose of scientific research. However, as with any kind of research, projects funded by large companies with a financial interest will often be the most successful. As such, AI development in the UK and EU has stagnated compared to countries with fewer restrictions, such as the US and China.

UK Government planning to change AI data-mining laws to allow commercial purposes

Recognising the benefits of commercially driven AI, the UK government has recently announced plans to change data mining laws for AI development. Changes to the law will allow Text and Data Mining (TDM) systems to search, collect, and use legally obtainable data through the internet for AI learning. This includes using website data, hosted images, social media, and any other data that can be accessed through a browser using authorised means.

This change to TDM would also change the nature of copyright law and how licence-holders can authorise the use of their works. Currently, copyright owners have the right to prevent their content from being shared or stored (except for caching and web browsing), which includes companies making copies of said work and using it for third-party purposes. If the planned changes are made law, this would no longer be the case as long as the data is solely used for AI development. Therefore, a database could be made from downloaded music, TV, films, and articles without needing to pay the original copyright holder.

The introduction of database rights also suggests that those creating a database for the sole purpose of AI learning would have full rights over the database and its content regardless of the original copyright owner. The only condition that would be placed on data miners is that data is gathered lawfully; thus, any data available through a web search would be considered fair game.

While some may feel that this is a violation of copyright law, such databases can be thought of as memories. Whenever a human being watches a film or TV show, the resulting memories created internally in the brain are not subject to copyright. In the same way, what an AI takes in from a database can be thought of as memories that can be reused for training.

How will the planned changes improve AI development?

Allowing commercial entities to use legally gathered data for training AIs without being beholden to original copyright holders would lift major barriers in AI training. The internet has become an incredibly valuable source of data, and freeing this data up to commercial interests dramatically reduces the financial pressures AI developers face.

It is hoped that by freeing up this data, the UK will become a superpower in AI technology like the US and China, but considering that these nations are arguably a decade ahead, it will be tough for the UK to catch up. However, the UK's departure from the EU is starting to show some signs of relaxed regulation: the EU still restricts the use of copyrighted data in Text and Data Mining systems, and this could see the EU fall significantly behind other nations.

Of course, there could be a reason why such laws were introduced in the first place, and this move by the UK government could be problematic for future AI development. Only time will tell.


Robin Mitchell is an electronic engineer who has been involved in electronics since the age of 13. After completing a BEng at the University of Warwick, Robin moved into the field of online content creation, developing articles, news pieces, and projects aimed at professionals and makers alike. Currently, Robin runs a small electronics business, MitchElectronics, which produces educational kits and resources.