Report: The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers
From the post: Successfully mining scholarly literature at scale is inhibited by technical and political barriers that have been only partially addressed by publishers’ application programming interfaces (APIs). Many of those APIs have restrictions that inhibit data mining at scale, and while only some publishers actually provide APIs, almost all publishers make their content available […]