Report: The ContentMine Scraping Stack — Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers

By: the Editors, November 20, 2014 (November 19, 2014)

From the post:

Successfully mining scholarly literature at scale is inhibited by technical and political barriers that have been only partially addressed by publishers’ application programming interfaces (APIs). Many of those APIs have restrictions that inhibit data mining at scale, and while only some publishers actually provide APIs, almost all publishers make their content available on the web. Current web technologies should make it possible to harvest and mine the scholarly literature regardless of the source of publication, and without using specialised programmatic interfaces controlled by each publisher. Here we describe the tools developed to address this challenge as part of the ContentMine project.

Source: D-Lib: The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers