Presentation from the 3rd EBLIDA-LIBER Workshop on Digitisation, October 2011
For our third interview, I am thrilled to have a chance to chat with Brett Bobley, the CIO and Director of the Office for Digital Humanities at the National Endowment for the Humanities. I wanted to catch up with him on how some of the work NEH is supporting under the Digging into Data grants might connect with issues around the preservation and access of digital content.
Like cognitive literary studies, digital humanities must draw on other disciplines, using methods and tools that many humanities scholars aren’t comfortable with. And digital humanities has witnessed similar debates about the extent to which we must immerse ourselves in these other disciplines. Do we, as Stephen Ramsay suggests, have to know how to code, and build things? Do we have to be trained statisticians so that our text-mining results are “statistically significant”? Are we more or less rigorous than the proponents of culturomics, whose work many humanities scholars seem skeptical about? These are questions about method, and interdisciplinarity, and collaboration. And they’re not particularly new questions. But I do think the comparison between digital humanities and cognitive literary studies is a useful one: how can tools and methods from other disciplines help us answer questions in our own?
Tim Hitchcock, another member of the ‘With Criminal Intent’ team, has described how online technologies can change the way we access archives. Instead of being forced to navigate the hierarchical structures that archives impose on records, which in turn tend to reflect the workings of the institutions that created the records, we can directly find the people whose lives were regulated, influenced, shaped or controlled by the policies of those institutions.
Instead of merely hearing ‘the institutional voice… in all its stentorian splendour’, he says, we can listen in to ‘the quieter tones uttered by the individual’.
This reminds us that search boxes, along with other digital tools, themselves embody arguments. There are assumptions built into their code about what is relevant, what is significant, what is necessary.
We can build our own tools of course, and we can critique other people’s algorithms. But what if we just want to collect and share stories?
Linked Data gives us a way to present an alternative to Google’s version of the world. We can argue back against the search engines, defining our own criteria for relevance, and building our own discovery networks.
Changing the way we access resources changes the sorts of stories we can tell.
As one of the first of a ‘new style’ of museum online collections, launching several internet generations ago in 2006, the Powerhouse Museum’s collection database has been undergoing a rethink in recent times. Five years is a very long time on the web and not only has the landscape of online museum collections radically changed, but so too has the way researchers, including curators, use these online collections as part of their own research practices.
Digging through five years of data has revealed a number of key patterns in usage, which when combined with user research paint a very different picture of the value and usefulness of online collections. Susan Cairns, a doctoral candidate at the University of Newcastle, has been working with us to trawl through oodles of data, and interviewing users to help us think about what the next iteration of an online museum collection might need to look like.
A number of our Web Science students are doing work analyzing people’s use of Twitter, and the tools available for them to do so are rather limited since Twitter changed the terms of their service so that the functionality of TwapperKeeper and similar sites has been reduced. There are personal tools like NodeXL (a plugin for Microsoft Excel running under Windows) that do provide simple data capture from social networks, but a study will require long-term data collection over many months that is independent of reboots and power outages.
They say that to a man with a hammer, the solution to every problem looks like a nail. And so perhaps it is unsurprising that I see a role for EPrints in helping students and researchers to gather, as well as curate and preserve, their research data. Especially when the data gathering requires a managed, long-term process that results in a large dataset.
As I wrote here a couple of weeks ago, I’m playing around with a variety of clustering techniques to identify patterns in legal records from the early modern Spanish Empire. In this post, I will discuss the first of my training experiments using Normalized Compression Distance (NCD). I’ll look at what NCD is, some potential problems with the method, and then the results from using NCD to analyze the Criminales Series descriptions of the Archivo Nacional del Ecuador’s (ANE) Series Guide. For what it’s worth, this is a very easy and approachable method for measuring similarity between documents and requires almost no programming chops.
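For readers who want to see how little code the method needs, here is a minimal sketch of NCD in Python. It assumes zlib as the compressor and uses the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. The function names are illustrative, and the choice of compressor is an assumption rather than a detail taken from the post.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Values near 0 suggest the texts share a lot of structure;
    values near 1 suggest they share very little.
    """
    bx = x.encode("utf-8")
    by = y.encode("utf-8")
    cx = compressed_size(bx)
    cy = compressed_size(by)
    # Compress the concatenation: shared material compresses away.
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Calling `ncd(a, b)` on every pair of descriptions gives a distance matrix that can then be fed to a clustering routine. Very short texts tend to produce noisy scores, because the compressor's fixed overhead dominates the result.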
Historians often hope that digitized texts will enable better, faster comparisons of groups of texts. Now that at least the 1grams on Bookworm are running pretty smoothly, I want to start to lay the groundwork for using corpus comparisons to look at words in a big digital library. For the algorithmically minded: this post should act as a somewhat idiosyncratic approach to Dunning’s Log-likelihood statistic. For the hermeneutically minded: this post should explain why you might need _any_ log-likelihood statistic.
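As a point of reference, here is a minimal sketch of the log-likelihood score in the simplified two-cell form commonly used for comparing a word's frequency across two corpora: G2 = 2 * sum(O * ln(O / E)), where O is the observed count in each corpus and E is the count expected if the word were equally frequent in both. The function and parameter names are illustrative, and this form is an assumption rather than the exact formulation used in the post.

```python
import math


def log_likelihood(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Two-cell log-likelihood (G2) for one word across two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of words in each corpus.
    """
    combined = count_a + count_b
    # Expected counts if the word had the same rate in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)

    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

A word with the same relative frequency in both corpora scores 0, and the score grows as the two rates diverge. The score does not say which corpus over-uses the word, so the two rates have to be compared separately to recover the direction.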
Paradox Number One: Social media foments revolution, but a sudden removal of social media can increase mobilization and create even more unrest.
We can all stand witness to the ways in which social and news media can spread a movement within and across nations. I know an Egyptian who claimed that her family and friends knew that the revolution was going to occur in the weeks and days before it actually happened. How? Just by the messages on social media and between individuals. In a similar fashion, social media proposed and fanned the flames of the Occupy Wall Street movement in the weeks before it emerged, grew, and took hold as a real story in mainstream media outlets.
Paradox Number Two: Social media brings networks of people with like interests together, but in doing so it can create information bubbles.
In May of this year Eli Pariser presented a TED Talk in which he warned about how Google, Facebook, and other online companies use algorithms that customize what information is presented to people based on their individual tastes. Thus, just by virtue of being ourselves, our internet is filtered. We go further to filter our own experience when we read websites that cater to our cultural background or to our political interests.
Paper delivered on 29 September 2011 in Special Collections of the University of Cardiff Library as the “Inaugural Annual Cardiff Rare Books and Music Lecture.”
The challenge to our engagement in and with the humanities today is the digital medium. This engagement moves into fresh light and focus in consequence of the medium, since, through the new mediality, ‘what we have always done’ is no longer a matter of course, hence remaining unreflected in itself, but demands instead reflection and questioning.