This is a pre-print version of this article. The final, edited version will appear in the online edition of American Literary History 27.3 (August 2015)…

The Viral Texts Project is an interdisciplinary and collaborative effort among the authors listed here, with contributions from project alumni Elizabeth Maddock Dillon, Kevin Smith, and Peter Roby. In the first iteration of the project, we focused on pre-Civil War newspapers in the Library of Congress’s Chronicling America online newspaper archive, in large part because its text data is openly available for computational use. The pre-1861 holdings comprise 1.6 billion words from 41,829 issues of 132 newspapers. Many of these 132 newspapers are, in fact, iterations of continuously published entities that changed names or other qualities during their runs; we describe how we grouped these publications into newspaper “families” in Section IV below. We chose 1861 as our cutoff date not as a periodizing statement but simply to demarcate a limited set of newspapers for our initial tests. The later in the nineteenth century one looks, the richer the Chronicling America archive becomes; while our corpus includes some issues from the 1830s, then, the bulk of it comes from the 1840s and ’50s.
See the full post here: Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers | Viral Texts