I had the opportunity to return to Carnegie Mellon this week to lecture on archives and access, which was a real pleasure. My friend and former colleague, Jon Klancher, heard the talk and asked if I had plans to publish it. Unfortunately, I’m not in a position to do that now. But I thought it would be useful to lay out one strand of the talk in this post, in which I attempt to isolate what I take to be three basic critical gestures that are performed in both “traditional” and “iterative” literary criticism. These gestures are: pointing, circling, and naming.
Before giving examples of these three gestures, a few words about the procedures that make them possible, at least in the digital context: establishing a corpus, defining the limits of a single text, saying what will count as a token, and dividing these tokens (usually words or strings of words) into types.
As an example, I take some of the work we’re doing on the Mellon grant, Visualizing English Print (VEP), which is currently focused on a corpus of 1080 EEBO/TCP texts. We have begun to see some interesting patterns in this corpus, but the preparatory work can be outlined as follows. A group of texts is established: 1080 texts from the EEBO/TCP corpus, each over 500 words long, representing a selection of the full available contents of that corpus. The years represented in the sample range from 1530 to 1699.
The boundaries and contents of individual texts must also be established: we take each of the items in the corpus and treat the transcribed contents as stable, relying on the manual process of transcription to render the wayward glyph stream that is early modern print into an iterable, stable series of characters. The boundaries of the physical document are now the boundaries of a digital text that, among other things, can be addressed as a container of words.
Next we consider each text as a collection of tokens – words or strings of words – that can be aggregated into subgroups of types. This is a motivated process, since no text will tell you how to group its contents. The decision to group certain tokens into types is just that: a decision, or motivated judgment based on a set of interpretive criteria whose result – words grouped into types – is a tokenization scheme. (One can use each word as a type, but that is simply a limit case.) If the tokenization scheme is that of Docuscope, we will be working with the familiar series of types that Jonathan Hope and I advert to in our analyses: FirstPerson, CommonAuthority, etc.
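The token-to-type step described above can be sketched in a few lines of code. Everything here is illustrative: the word lists are invented for the example (Docuscope's actual dictionaries are vastly larger and were built on different interpretive criteria), and the two type names are simply the ones mentioned above.

```python
import re
from collections import Counter

# A hypothetical tokenization scheme: a mapping from tokens to types.
# These word lists are toy stand-ins, not Docuscope's real categories.
SCHEME = {
    "i": "FirstPerson",
    "me": "FirstPerson",
    "my": "FirstPerson",
    "must": "CommonAuthority",
    "ought": "CommonAuthority",
}

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_types(text, scheme):
    """Aggregate tokens into types under a given scheme.

    Tokens the scheme does not cover fall back to themselves,
    i.e., the limit case in which each word is its own type.
    """
    counts = Counter()
    for token in tokenize(text):
        counts[scheme.get(token, token)] += 1
    return counts

print(count_types("I must go, and my friend must follow.", SCHEME))
```

The point of the sketch is that the scheme is an argument, not a discovery: change the dictionary and the same token stream yields a different profile of types.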