Fred and I got some fantastic comments on our Hermeneutics of Data and Historical Writing paper through the Writing History in the Digital Age open peer review. We are currently working on revising the manuscript. At this point I have worked on a range of book chapters and articles, and I can say that doing this chapter has been a real pleasure. I thought the open review process went great, and working with a coauthor has also been great. Neither happens that much in the humanities. I think the work is much stronger because Fred and I pooled our forces to put this together. Now, one of the comments we got sent me on another tangent, one that is too big to shoehorn into the revised paper.
On the Relationship Between Data and Evidence
We were asked to clarify what we saw as the difference between data and evidence. We will clarify this in the paper, but the question has also sparked a much longer conversation in my mind that I wanted to share here and invite comments on. As I said, this is too big a can of worms to fit into that paper, but I wanted to take a few moments to sketch this out and see what others think about it.
What Is Data to a Humanist?
I think we have a few different ways to think about what data actually is to a humanist. Thinking about this, and being reflexive about what we do with data, is a really important thing to engage in, so here is my first pass at some tools for thought about data for humanists. First, as constructed things, data are a species of artifact. Second, as authored objects created for particular audiences, data can be interpreted as texts. Third, as computer-processable information, data can be computed in a whole host of ways to generate novel artifacts and texts, which are themselves open to interpretation and analysis. This gets us to evidence. Each of these approaches (data as text, as artifact, and as processable information) allows one to produce or uncover evidence that can support particular claims and arguments. I would suggest that data is not a kind of evidence but is a thing in which evidence can be found.
Data are Constructed Artifacts
Data is always manufactured. It is created. More specifically, data sets are always, at least indirectly, created by people. In this sense, the idea of “raw data” is a bit misleading. The production of a data set requires a set of assumptions about what is to be collected, how it is to be collected, and how it is to be encoded. Each of those decisions is itself of potential interest for analysis.
In the sciences, there are some agreed-upon stances on what assumptions are OK, and given those assumptions a set of statistical tests exists to help ensure the validity of interpretations. These kinds of statistical instruments are also great tools for humanists to use. However, they are not the only way to look at data. For example, most of the statistics one is likely to learn have to do with attempting to generalize from a sample of things to a bigger population. Now, if you don’t want to generalize, if you want instead to get into the gritty details of a particular individual set of data, you probably shouldn’t use statistical tests that are intended to see if trends in a sample are trends in some larger population.
Data are Interpretable Texts
As a species of human-made artifact, datasets can be thought of as having the characteristics of texts. Data is created for an audience. Humanists can, and should, interpret data as an authored work, and the intentions of the author are worth consideration and exploration. At the same time, the audience of data is also relevant. It is worth thinking about how a given set of data is actually used and understood, and how it is interpreted by the audiences it makes its way to. That could well include audiences of other scientists, the general public, government officials, etc. In light of this, one can take a reader-response theory approach to data.
Data are Processable Information
Data can be processed by computers. We can visualize it. We can manipulate it. We can pivot and change our perspective on it. Doing so can help us see things differently. You can process data in a stats package like R to run a range of statistical tests, or you can follow Mark Sample and use N+7 on a text. In both cases, you process information, whether numerical or textual, to change your frame for understanding a particular set of data.
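To make the N+7 idea concrete, here is a minimal sketch in Python. N+7 replaces each noun in a text with the noun that falls seven entries later in a dictionary. Everything specific here is an assumption for illustration: the tiny `NOUNS` list stands in for a real dictionary, the function name `n_plus_k` is made up, and wrapping around at the end of the list is a convenience for such a short word list. It is not a reconstruction of how Mark Sample or anyone else actually ran the procedure.

```python
import re

# Hypothetical toy "dictionary" of nouns, kept in alphabetical order.
# A real N+7 run would use the noun entries of a full dictionary.
NOUNS = sorted([
    "archive", "argument", "artifact", "audience", "author", "claim",
    "computer", "data", "evidence", "fact", "historian", "text",
])


def n_plus_k(text, nouns=NOUNS, k=7):
    """Replace each known noun with the noun k entries later in the list."""
    index = {noun: i for i, noun in enumerate(nouns)}

    def shift(match):
        word = match.group(0)
        i = index.get(word.lower())
        if i is None:
            # Not a noun we know about: leave the word untouched.
            return word
        # Assumption: wrap around, because the toy list is so short.
        return nouns[(i + k) % len(nouns)]

    # Visit every alphabetic word; punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z]+", shift, text)


print(n_plus_k("The data is a text."))
```

The point of the sketch is less the output than the move itself: a mechanical transformation produces a new text, and that new text is in turn open to interpretation.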
Data can Hold Evidentiary Value
As a species of human artifact, as a cultural object, as a kind of text, and as processable information, data is open to a range of hermeneutic processes of interpretation. In much the same way that encoding a text is an interpretive act, creating, manipulating, transferring, exploring, and otherwise making use of a data set is also an interpretive act. In this sense, data as an artifact or a text can be thought of as having the same potential evidentiary value as any other kind of artifact. That is, analysis, interpretation, exploration, and engagement with data can allow one to uncover information, facts, figures, perspectives, meanings, and traces which can be deployed as evidence to support all manner of claims and arguments. I would suggest that data is not a kind of evidence; it is a potential source of information which could hold evidentiary value.