Announcement: HathiTrust Research Center adds 5 billion pages to help scholars see farther

From the post:

Partnering with close to 100 research libraries from around the world, HathiTrust holds about 595 terabytes of digitized textual data: that's about 157 miles, or 10,000 tons, of text. In 2010, HathiTrust launched the HTRC to help researchers around the world accomplish tera-scale data mining and textual analysis. The HTRC is a collaborative effort among Indiana University; the University of Illinois, Urbana-Champaign (UIUC); and the University of Michigan.

Until recently, the HTRC had access to less than a third of the full HathiTrust repository. That all changed this year, and now the HTRC is working with the University of Michigan to enable analysis of the entire 5 billion pages of textual data in the HathiTrust repository.

“This will be the first time that a researcher could analyze, as data, a collection that is equivalent to some of the largest research libraries in the world,” says Robert McDonald, associate dean of libraries at Indiana University.

Source: HathiTrust Research Center adds 5 billion pages to help scholars see farther | iSGTW