The Peril and Promise of Historians as Data Creators – Perspective, Structure, and the Problem of Representation

[This is a working draft of a chapter in progress for an edited collection.]

Data-Driven History

Digital historians are well aware that the larger community of historians has generally been skeptical of and cautious about data-driven scholarship. The controversies surrounding Robert Fogel and Stanley Engerman’s 1974 work, Time on the Cross: The Economics of American Slavery, continue to haunt computational work.[1] Historians who are suspicious of digital methods regularly ask how contemporary digital work can avoid reproducing the interpretive missteps of the era of “cliometrics.” While Time on the Cross often stands in for a whole range of historical scholarship based on quantitative methods, it undoubtedly remains a focal point for conversation precisely because those quantitative methods were used to argue that people enslaved in the United States had willingly collaborated with the system of slavery to make it an efficient and productive economic institution. In doing so, Fogel and Engerman made arguments about the interior life and motivations of human beings based on the material conditions and outcomes of their circumstances. In effect, they mistook correlation for causation. The combination of quantitative methods and a history of wrenching human rights violations strikes a discordant tone that hinges on the reduction of human pain and suffering to columns and rows of numbers that can be processed and calculated with an algorithm.

No one pushed back more strongly against Fogel and Engerman’s conclusions than Herbert Gutman. No stranger to quantitative methods, Gutman revisited both the materials that the authors worked with and the conclusions that they drew from that data. He argued that although the system of slavery Fogel and Engerman examined might have seemed efficient, that efficiency was achieved through the pervasive presence and threat of violence rather than through voluntary cooperation or an adoption of the enslaver’s worldview. An analysis of the economic systems surrounding slavery could not yield knowledge about the inner thoughts, feelings, and motivations of the enslaved as they performed their labor, regardless of how productive they were.[2]

In the wake of the widespread reaction against cliometrics, historians generally have been private about their work with data, presenting only end products, narratives, and summaries, even when that work is data-driven but not especially sophisticated computationally. Because work with data is often a small part of a much larger interpretive process, many who do it never even note that they have a set of spreadsheets or a database that they used to organize and analyze their source materials. This tendency has worked to mask the role that data collection and analysis play in contemporary historical scholarship.

