Defining Data for Humanists: Text, Artifact, Information or Evidence? by Trevor Owens

Fred and I got some fantastic comments on our Hermeneutics of Data and Historical Writing paper through the Writing History in the Digital Age open peer review. We are currently working on revising the manuscript. At this point I have worked on a range of book chapters and articles, and I can say that doing this chapter has been a real pleasure. The open review process worked well, and so did writing with a coauthor. Both are things that don’t happen that much in the humanities. I think the work is much stronger for Fred and me having pooled our forces to put this together. Now, one of the comments we got sent me off on another tangent, one that is too big to shoehorn into the revised paper.

On the Relationship Between Data and Evidence

We were asked to clarify what we saw as the difference between data and evidence. We will help to clarify this in the paper, but it has also sparked a much longer conversation in my mind that I wanted to share here and invite comments on. As I said, this is too big of a can of worms to fit into that paper, but I wanted to take a few moments to sketch this out and see what others think about it.

What Is Data to a Humanist?

I think we have a few different ways to think about what data actually is to a humanist. Being reflexive about what we do with data is an important thing to engage in, so here is my first pass at some tools for thinking about data for humanists. First, as constructed things, data are a species of artifact. Second, as authored objects created for particular audiences, data can be interpreted as texts. Third, as computer-processable information, data can be computed in a whole host of ways to generate novel artifacts and texts which are themselves open to interpretation and analysis. This gets us to evidence. Each of these approaches (data as text, artifact, and processable information) allows one to produce or uncover evidence that can support particular claims and arguments. I would suggest that data is not a kind of evidence but a thing in which evidence can be found.

Data are Constructed Artifacts

Data is always manufactured; it is created. More specifically, data sets are always, at least indirectly, created by people. In this sense, the idea of “raw data” is a bit misleading. The production of a data set requires a set of assumptions about what is to be collected, how it is to be collected, and how it is to be encoded. Each of those decisions is itself of potential interest for analysis.

In the sciences, there are some agreed-upon stances on what assumptions are OK, and given those assumptions a set of statistical tests exists for helping ensure the validity of interpretations. These kinds of statistical instruments are also great tools for humanists to use. However, they are not the only way to look at data. For example, most of the statistics one is likely to learn have to do with attempting to generalize from a sample of things to a bigger population. If you don’t want to generalize, if you instead want to get into the gritty details of a particular set of data, you probably shouldn’t use statistical tests that are intended to see whether trends in a sample are trends in some larger population.
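To make that distinction concrete, here is a minimal sketch of my own (in Python rather than a stats package like R, and with invented letter lengths rather than any real data set). The first stance treats the data as a sample and asks a question about a larger population; the second simply describes the particular set in front of us.

```python
import numpy as np
from scipy import stats

# Hypothetical data: lengths (in words) of twelve letters from one correspondent.
letter_lengths = np.array([212, 340, 198, 487, 266, 301, 154, 423, 389, 275, 310, 244])

# Inferential stance: treat these letters as a sample of the correspondent's
# whole output and test whether their mean length differs from an assumed
# population mean of 250 words.
t_stat, p_value = stats.ttest_1samp(letter_lengths, popmean=250)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Descriptive stance: no generalization, just the gritty details of this
# particular set of letters.
print(f"mean = {letter_lengths.mean():.1f}, median = {np.median(letter_lengths):.1f}")
print(f"shortest = {letter_lengths.min()} words, longest = {letter_lengths.max()} words")
```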

Data are Interpretable Texts

Since datasets are a species of human-made artifact, we can think of them as having the characteristics of texts. Data is created for an audience. Humanists can, and should, interpret data as an authored work, and the intentions of the author are worth consideration and exploration. At the same time, the audience of data is also relevant: it is worth thinking about how a given set of data is actually used and understood, and how it is interpreted by the audiences it makes its way to. Those audiences could well include other scientists, the general public, government officials, and so on. In light of this, one can take a reader-response theory approach to data.

Data are Processable Information

Data can be processed by computers. We can visualize it. We can manipulate it. We can pivot and change our perspective on it. Doing so can help us see things differently. You can process data in a stats package like R to run a range of statistical tests, or you can do like Mark Sample and run N+7 on a text, replacing each noun with the seventh noun that follows it in a dictionary. In both cases you are processing information, numerical or textual, to change your frame for understanding a particular set of data.
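As a rough illustration of what that kind of textual processing can look like, here is a toy sketch of an N+7-style substitution. This is my own simplified version, not Mark Sample’s code: it uses a small made-up word list instead of a full dictionary and makes no attempt to identify nouns, but it shows the basic shape of the procedure.

```python
# Toy N+7-style deformance: replace each word that appears in a small,
# alphabetized word list with the word seven entries after it. A real N+7
# would use a full dictionary and substitute only the nouns.
WORDLIST = sorted([
    "apple", "arrow", "basket", "bell", "book", "bottle", "branch", "bridge",
    "candle", "castle", "cloud", "coast", "door", "dream", "field", "flame",
    "garden", "gate", "harbor", "hill", "house", "lamp", "letter", "map",
    "mirror", "mountain", "night", "ocean", "path", "river", "road", "stone",
    "storm", "street", "tower", "tree", "valley", "wall", "window", "winter",
])

def n_plus_7(text, wordlist=WORDLIST, offset=7):
    """Shift every word found in wordlist forward by `offset` entries."""
    shifted = []
    for word in text.lower().split():
        if word in wordlist:
            i = wordlist.index(word)
            shifted.append(wordlist[(i + offset) % len(wordlist)])
        else:
            shifted.append(word)
    return " ".join(shifted)

# The deformed output rearranges the text's vocabulary, inviting a fresh reading.
print(n_plus_7("the letter arrived at the house by the river"))
```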

Data can Hold Evidentiary Value

As a species of human artifact, as a cultural object, as a kind of text, and as processable information, data is open to a range of hermeneutic processes of interpretation. In much the same way that encoding a text is an interpretive act, creating, manipulating, transferring, exploring, and otherwise making use of a data set is also an interpretive act. In this case, data as an artifact or a text can be thought of as having the same potential evidentiary value as any other kind of artifact. That is, analysis, interpretation, exploration, and engagement with data can allow one to uncover information, facts, figures, perspectives, meanings, and traces which can be deployed as evidence to support all manner of claims and arguments. I would suggest that data is not a kind of evidence; it is a potential source of information which could hold evidentiary value.

5 thoughts on “Defining Data for Humanists: Text, Artifact, Information or Evidence? by Trevor Owens”
  1. Trevor, Fred–

    Bravo, this is important stuff, particularly for the ways in which you demystify the power of data while also noting that it (they? data is plural after all) becomes an important source of evidentiary value even though it is not transparent evidence in and of itself.

    I’m curious to see and learn about more examples in relation to your central points, which I take to be (1) that data are created, that they are not one and the same as reality, and (2) that data can be used, heuristically and semiotically, like any text, for qualitative analysis. How does this play out in particular cases in addition to Mark Sample’s work?

    Finally, I’d like to understand better how you are using the term “artifact” here in comparison to evidence. It’s intriguing to contrast the two, but could you clarify further what the distinction is that you are drawing?

    Thanks! Hope this open source review is helpful for the important work you are doing.

    Best,
    Michael

  2. Dear Trevor and Fred,

    I’m so glad you’ve addressed this topic, since the combination of “data” and “humanist” continues to be unfathomable for many in the academy – and yet, given the growth of digital scholarship and the attention that the concept of the digital scholarly workflow is receiving, thinking in terms of data will be all but inevitable for many humanists.

    I’m particularly taken by your notion of applying reader-response theory to understanding data. First, it’s a framework that those of us who teach and research literature will quickly “get.” Second, you bring up an aspect of humanities scholarship that, in my opinion, doesn’t get a lot of attention: how will someone *use* my scholarship (whether that scholarship is manifested as a data set, an article, or a monograph)? Who will use my data, and how can I capture that? What are the different ways my data could be understood (and how will I know that unless I share them)? What are the potential *reuses* for the data – be it annotations, a bibliography, a concordance, a text corpus marked up in TEI? What new kinds of research can emerge by giving data a broad audience? It seems to me, too, that a consideration of audience with respect to data ties in with the concept of evidentiary value through the demonstration of use.

    It’s been a pleasure to read this and to have ideas percolate as a result. Thank you!

    –Patricia

  3. Thanks for the comments. Here is a quick attempt to respond to a few of the questions here.

    Difference between evidence and artifacts:
    In terms of artifact, I was trying to appeal to the thingy-ness of a thing, not any given bit of the thing that has evidentiary value. So I would say that the idea of evidence already presupposes an argument, whereas artifacts have an almost endless set of qualities and properties that we can always go back to. So in this case, data sets in their original format, how they are organized, how things are named, the medium they are on, where copies end up, etc., are all potential things to explore for potential evidentiary value. There are a few different tensions in this that I think are interesting; they don’t all line up with each other, but there is Heidegger’s difference between ready-to-hand and present-at-hand, formal and forensic notions of materiality, and informational or artifactual components of objects. I think there is actually plenty of ground in here to parse out the differences between each of these valences as potential ways for humanists to talk about and probe into data as an artifact, and through that process to find evidentiary value.
    Examples of Processing and Interpreting Data as a Humanist
    For your second question, I think there is a wealth of good stuff going on in work on both processing texts and data and on visualization as a hermeneutic tactic. In terms of processing data, I would point to Jerome McGann and Lisa Samuels’s ideas about deformance, particularly as Stephen Ramsay operationalizes them in Reading Machines. The key idea here is that deforming a text can create a novel mode for unpacking the meaning of the original. In the same spirit, the visualization of data, in everything from word clouds, to maps, to charts and graphs, to scatter plots, is itself something that can be thought of as part of a hermeneutic research process – a process that is generative and iterative, capable of producing new knowledge through aesthetic provocation. Here Johanna Drucker’s notion of Graphesis is a relevant point of consideration. I like how Moretti talks about this in terms of maps, that they can be more than the sum of their parts, that these kinds of abstractions can “possess emerging qualities, which were not visible at the lower level.” But for the humanist these abstractions “are not itself an explanation, of course: but at least, it offers a model of the narrative universe which rearranges its components in a non-trivial way.”
    I should also mention that there is a tradition of working with data in the humanities, namely in the history of science, which is itself enlightening. For example, Coulomb faked his data. We know this because historians of science went back and recreated his experiments and were able to suss out the fact that his instruments were not sensitive enough to get the data he said he got. Interestingly, he was actually right; he just didn’t have instruments sensitive enough to really detect what he was looking for. What is interesting here is that historians took scientific data, in this case data that supported ideas about electricity which we know to be sound scientific laws, and were able to show that he must have tweaked his data to fit his theory. For a humanist, this opens a door to a host of questions: what are the implications of this kind of thing, and how does it change our understanding of science in this context? What does it tell us about the relation between theory and evidence in scientific research? All of this is great fodder for historians and philosophers of science, and in this case they make hay by actually gathering their own data through the recreation of experiments.
