A common problem in search and exploration interfaces is the vocabulary problem: the great variety of words that different people can use to describe the same concept. For people exploring a text collection, this makes search difficult. There are only a limited number of different queries they can think of to describe a concept, so they may miss many other instances that use different words. This is an important issue for humanities scholars. Often, the very first step of a literature analysis is to comb through text, trying to find thought-provoking examples to study later.

In this post, I give an example of how our project WordSeer, a text analysis environment for humanities scholars, can be used to overcome this problem.

Just as THATCamp challenges attendees to set and steer the agenda, Startup Weekend leaves a lot up to the participants, who have 54 hours to pitch a product idea (typically tech-related), form teams, validate their idea, develop a business model, and put together a demo and a longer pitch.

Some might wonder what entrepreneurship training has to do with the digital humanities (DH), but I believe that the two communities have much in common and can learn from each other…. While DH projects typically don’t form companies and don’t aim to make a profit, most do need to consider how to define their value, find users and sustain themselves.

Searching for information might seem like one of the most routine and commonplace activities of university life. However, as students work within an information environment that is increasingly open and dynamically changing, research assignments also represent a complex and potentially daunting task, and one that is fraught with embedded social and cultural processes and relationships.

The Ethnographic Research in Illinois Academic Libraries (ERIAL) Project was a two-year study of student research practices involving a collaborative effort of five Illinois universities…. Using a mixed-methods approach that integrated nine qualitative research techniques and included over 600 participants, the ERIAL project sought to gain a better understanding of undergraduates’ research processes based on first-hand accounts of how they obtained, evaluated, and managed information for their assignments.

There are lots of tools out there that aggregate existing information and even organize it for users to interpret. Since the early Hypercities, GIS tools, for instance, have been very much the rage among humanists who wish to add geographical and census data to enhance the “lived experience” of a text. But there are fewer tools that actually build an archive of live interpretation (as opposed to facts layered and ready for interpretation) around a stable text. And that’s where what I call “Reading with the Stars” comes in.

And this brings me to the danger inherent in Culturomics. First, machine-readable texts do not, and will never, represent the totality of the human experience. What about paintings, illustrations, and photographs, statues and figurative art, architecture, music, material culture, and ecology? What about oral history? What about economic, statistical and demographic evidence? Although there are millions upon millions of books, magazines, newspapers, and other printed material, they represent only the visible, privileged, literate tip of a vast store of human culture.

The CUNY Digital Humanities Initiative has released video from two recent events:

Digital Humanities in the Library, November 18, 2011

  • Ben Vershbow (NYPL) on “NYPL Labs: Hacking the Library”

Digital Humanities in the Classroom, October 18, 2011

  • Mark Sample, “Building and Sharing When You’re Supposed to be Teaching”
  • Shannon Mattern, “Beyond the Seminar Paper: Setting New Standards for New Forms of Student Work”

Watch the videos here.



What we ended up with was a new way of seeing and understanding the records — not as the remnants of bureaucratic processes, but as windows onto the lives of people. All the faces are linked to copies of the original certificates and back to the collection database of the National Archives. So this is also a finding aid. A finding aid that brings the people to the front.

According to Margaret Hedstrom the archival interface ‘is a site where power is negotiated and exercised’.[1] Whether in a reading room or online, finding aids or collection databases are ‘neither neutral nor transparent’, but the product of ‘conscious design decisions’. We would like to think that this interface gives some power back to the people within the records.

Editors’ Note: Many scholars working in the Digital Humanities are thinking about the theory, design, and social and pedagogical impact of games. The posts below cover some of the variety of issues within this field. Further discussion will occur at THATCamp Games, January 20-22, 2012 at the University of Maryland-College Park. Please Tweet @dhnow or email dhnow [at] pressforward [dot] org if you have more to suggest. *updated 12/1/11*

Ted Underwood has been talking up the advantages of the Mann-Whitney test over Dunning’s log-likelihood, which is currently more widely used. I’m having trouble getting M-W running on large numbers of texts as quickly as I’d like, but I’d say that his basic contention (that Dunning log-likelihood is frequently not the best method) is definitely true, and there’s a lot to like about rank-ordering tests.

Before I say anything about the specifics, though, I want to make a more general point about how we think about comparing groups of texts. The most important difference between these two tests rests on a much bigger question about how to treat the two corpuses we want to compare.
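To make that difference concrete, here is a minimal Python sketch of the two approaches. It is not the code used by either author: the function names, the toy counts, and the choice of the common two-term form of Dunning's statistic are all assumptions made for illustration. The log-likelihood function pools each corpus into a single bag of words, while the Mann-Whitney function treats each corpus as a set of documents and compares per-document frequencies by rank.

```python
import math

def dunning_g2(count_a, size_a, count_b, size_b):
    """Two-term Dunning log-likelihood (G2) for one word.

    Each corpus is pooled: only the word's total count and the
    corpus's total size in tokens are used.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def mann_whitney_u(freqs_a, freqs_b):
    """Mann-Whitney U statistic for group A (no p-value).

    Counts the document pairs in which A's frequency beats B's,
    with ties counted as half. Inputs are per-document frequencies.
    """
    u = 0.0
    for x in freqs_a:
        for y in freqs_b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical toy data: four 1000-word documents per corpus.
# In corpus A the word appears 50 times, all in one document.
# In corpus B it appears 5 times in every document.
doc_len = 1000
counts_a = [50, 0, 0, 0]
counts_b = [5, 5, 5, 5]

g2 = dunning_g2(sum(counts_a), doc_len * 4, sum(counts_b), doc_len * 4)
u = mann_whitney_u([c / doc_len for c in counts_a],
                   [c / doc_len for c in counts_b])

print(f"Dunning G2 (pooled corpora): {g2:.2f}")
print(f"Mann-Whitney U for corpus A: {u} out of {len(counts_a) * len(counts_b)} pairs")
```

On this toy data the pooled statistic flags the word as overused in corpus A, because a single document supplies all 50 occurrences. The rank-based statistic points the other way: A wins only 4 of the 16 document pairs, so in most documents the word is more frequent in B. Which answer is more useful depends on whether you think of a corpus as one long text or as a collection of separate texts.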

Collaboration is the linchpin of all this productivity, learning, experimenting, and knowledge acquisition. This unwritten goal was reinforced by a few tech industry magnates at Stanford’s BiblioTech Symposium last year: the CEOs want liberal arts and humanities doctoral students who can command language, translate technical jargon into metaphor and narrative, and work collaboratively in team situations. Humanities scholars often think of themselves as the lonely bibliophiles in the library stacks, quietly slaving over monographs. But Digital Humanities has altered that paradigm, and has even required that Humanists consider exposing their collaborative work, even if it isn’t digitally inclined.