In between meetings today, I had a quick read of the Hargreaves Review – an investigation into intellectual property reform (prompted by this article). The final report, titled “Digital Opportunity: A Review of Intellectual Property and Growth” seems very positive, which is not something I can say about many government reviews into copyright law.

(c) Jeff Kurbina
Aside from some interesting stances on legalising copyright usage for parodies, personal replication, and a u-turn on the policy of forcing ISP’s to block sites accused of copyright infringement, the report highlights the need for a change of law to allow data mining.
Some key quotes:
“There should be a change in rules to enable scientific and other researchers to use modern text and data mining techniques, which copyright prohibits.”
“Researchers want to use every technological tool available, and they want to develop new ones. However, the law can block valuable new technologies, like text and data mining, simply because those technologies were not imagined when the law was formed.”
“According to the Wellcome Trust, 87 per cent of the material housed in UK’s main medical research database (UK Pub Med Central) is unavailable for legal text and data mining.”
The key recommendation is that the Government should press at EU level for the introduction of an exception to current copyright law, allowing “non-consumptive” use of a work (ie a use that doesn’t directly trade on the underlying creative and expressive purpose of the work). In the process of text-mining, copying is only carried out as part of the analysis process – it is a substitute for a human reading the work, and therefore does not compete with the normal exploitation of the work itself – in fact, as the paper says, these processes actually facilitate a work’s exploitation (ie by allowing search, or content recommendation).
This all sounds like a great step in the right direction. Companies like idio can create great value out of mining, categorising, interlinking and recommending content, especially when the content volumes are large and in disparate archives (and from various sources).
Data scientists, rejoice.