Editors’ Choice: Opening the black box of EEBO

Digital archives that cover extended historical periods can give a misleading impression of comprehensiveness while in fact providing access to only part of what survives. Completeness may be a tall order, but researchers at least need digital archives to be representative: that is, to have the same distribution of items as whatever they are used as a proxy for. If even representativeness does not hold, any conclusions drawn from the archives may be biased. In this article, we analyse in depth an interlinked set of archives that are widely used but whose comprehensiveness has been questioned: the images of Early English Books Online (EEBO) and the texts of its hand-transcribed subset, EEBO-TCP. Together, they are the most comprehensive digital archives of printed early modern British documents. Using statistical analysis, we compare the contents of these archives with the English Short Title Catalogue (ESTC), a comprehensive record of surviving books and pamphlets in major libraries. Specifically, we assess the relative coverage of EEBO and EEBO-TCP along six key dimensions (publication type, i.e. book or pamphlet; temporal coverage; geographic location; language; topic; and author) and, drawing on research examples from historical linguistics and book history, discuss the implications of the imbalances we identify. We find EEBO to be surprisingly comprehensive in its coverage, and EEBO-TCP, while not comprehensive, to be still broadly representative of what it models. Both findings, however, come with important caveats, which highlight the care with which researchers should approach all digital archives.
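The distinction between being comprehensive and being representative can be made concrete with a small sketch. It is not the authors' method, since the summary does not say which statistics they used. It assumes an archive is a subset of a reference catalogue such as the ESTC. Comprehensiveness is measured as the share of catalogue records the archive contains. Representativeness is measured with total variation distance between category distributions, a swapped-in choice made purely for illustration. The function names and all counts below are hypothetical toy values, not figures from the study.

```python
from collections import Counter

# Hypothetical toy counts by publication type, invented for illustration only.
catalogue = Counter({"book": 600, "pamphlet": 400})  # stands in for a reference catalogue such as the ESTC
balanced = Counter({"book": 60, "pamphlet": 40})     # small archive with the same mix as the catalogue
skewed = Counter({"book": 90, "pamphlet": 10})       # same size, but over-represents books


def coverage(archive: Counter, reference: Counter) -> float:
    """Comprehensiveness: share of reference records present in the archive.

    Assumes every archive item is also a reference record (the archive is a subset).
    """
    return sum(archive.values()) / sum(reference.values())


def proportions(counts: Counter) -> dict[str, float]:
    """Convert raw counts into a probability distribution over categories."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Distance between two distributions: 0.0 means identical, 1.0 means disjoint."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def report(name: str, archive: Counter, reference: Counter) -> tuple[float, float]:
    """Print and return (coverage, distance from the reference distribution)."""
    cov = coverage(archive, reference)
    tvd = total_variation(proportions(archive), proportions(reference))
    print(f"{name}: coverage={cov:.2f}, distance from reference mix={tvd:.2f}")
    return cov, tvd


report("balanced", balanced, catalogue)  # not comprehensive, but representative
report("skewed", skewed, catalogue)      # equally incomplete, and biased toward books
```

Both toy archives hold 10% of the catalogue, so neither is comprehensive; only the skewed one departs from the catalogue's mix of books and pamphlets (distance 0.30), which is the kind of imbalance that can bias conclusions drawn from an archive.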

See full post.