Like cognitive literary studies, digital humanities must draw on other disciplines, using methods and tools that many humanities scholars aren’t comfortable with. And digital humanities has witnessed similar debates about the extent to which we must immerse ourselves in these other disciplines. Do we, as Stephen Ramsay suggests, have to know how to code, and build things? Do we have to be trained statisticians so that our text-mining results are “statistically significant”? Are we more or less rigorous than the proponents of culturomics, whose work many humanities scholars seem skeptical about? These are questions about method, and interdisciplinarity, and collaboration. And they’re not particularly new questions. But I do think the comparison between digital humanities and cognitive literary studies is a useful one: how can tools and methods from other disciplines help us answer questions in our own?
Tim Hitchcock, another member of the ‘With Criminal Intent’ team, has described how online technologies can change the way we access archives. Instead of being forced to navigate the hierarchical structures that archives impose on records, which in turn tend to reflect the workings of the institutions that created the records, we can directly find the people whose lives were regulated, influenced, shaped or controlled by the policies of those institutions.
Instead of merely hearing ‘the institutional voice… in all its stentorian splendour’, he says, we can listen in to ‘the quieter tones uttered by the individual’.
This reminds us that search boxes, along with other digital tools, themselves embody arguments. There are assumptions built into their code about what is relevant, what is significant, what is necessary.
We can build our own tools of course, and we can critique other people’s algorithms. But what if we just want to collect and share stories?
Linked Data gives us a way to present an alternative to Google’s version of the world. We can argue back against the search engines, defining our own criteria for relevance, and building our own discovery networks.
Changing the way we access resources changes the sorts of stories we can tell.
As one of the first of a ‘new style’ of museum online collections, launching several internet generations ago in 2006, the Powerhouse Museum’s collection database has been undergoing a rethink in recent times. Five years is a very long time on the web, and not only has the landscape of online museum collections radically changed, but so too has the way researchers, including curators, use these online collections as part of their own research practices.
Digging through five years of data has revealed a number of key patterns in usage, which, when combined with user research, paint a very different picture of the value and usefulness of online collections. Susan Cairns, a doctoral candidate at the University of Newcastle, has been working with us to trawl through oodles of data and to interview users, helping us think about what the next iteration of an online museum collection might need to look like.
A number of our Web Science students are doing work analyzing people’s use of Twitter, and the tools available for them to do so are rather limited since Twitter changed its terms of service, reducing the functionality of TwapperKeeper and similar sites. There are personal tools like NodeXL (a plugin for Microsoft Excel running under Windows) that do provide simple data capture from social networks, but a study may require long-term data collection over many months that is independent of reboots and power outages.
They say that to a man with a hammer, the solution to every problem looks like a nail. And so perhaps it is unsurprising that I see a role for EPrints in helping students and researchers to gather, as well as curate and preserve, their research data. Especially when the data gathering requires a managed, long-term process that results in a large dataset.
As I wrote here a couple of weeks ago, I’m playing around with a variety of clustering techniques to identify patterns in legal records from the early modern Spanish Empire. In this post, I will discuss the first of my training experiments using Normalized Compression Distance (NCD). I’ll look at what NCD is, some potential problems with the method, and then the results from using NCD to analyze the Criminales Series descriptions of the Archivo Nacional del Ecuador’s (ANE) Series Guide. For what it’s worth, this is a very easy and approachable method for measuring similarity between documents and requires almost no programming chops.
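As the post notes, NCD is approachable precisely because it leans on an off-the-shelf compressor rather than hand-built features. A minimal sketch of the idea, using Python’s standard `zlib` as the compressor and two made-up archival descriptions as stand-ins for the ANE series entries (the example strings are hypothetical, not from the Criminales Series):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the length of the compressed input."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical catalogue descriptions: similar documents share more
# redundancy, so their concatenation compresses better, lowering the score.
a = b"theft of two mules from the hacienda near Quito"
b_ = b"theft of a mule from the hacienda near Quito"
c = b"a dispute over water rights in the southern valley"

print(ncd(a, b_))  # smaller: the two theft records overlap heavily
print(ncd(a, c))   # larger: little shared text to exploit
```

The pairwise NCD values form a distance matrix that can then be fed into any off-the-shelf clustering routine, which is what makes the method so light on programming chops.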
Historians often hope that digitized texts will enable better, faster comparisons of groups of texts. Now that at least the 1grams on Bookworm are running pretty smoothly, I want to start to lay the groundwork for using corpus comparisons to look at words in a big digital library. For the algorithmically minded: this post should act as a somewhat idiosyncratic approach to Dunning’s Log-likelihood statistic. For the hermeneutically minded: this post should explain why you might need _any_ log-likelihood statistic.
Paradox Number One: Social media foments revolution, but a sudden removal of social media can increase mobilization and create even more unrest.
We can all stand witness to the ways in which social and news media can spread a movement within and across nations. I know an Egyptian who claimed that her family and friends knew that the revolution was going to occur in the weeks and days before it actually happened. How? Just by the messages on social media and between individuals. In a similar fashion, social media fanned the flames of the Occupy Wall Street movement in the weeks before it emerged, grew, and took hold as a real story in mainstream media outlets.
Paradox Number Two: Social media brings networks of people with like interests together, but in doing so it can create information bubbles.
In May of this year Eli Pariser presented a TED Talk in which he warned about how Google, Facebook, and other online companies use algorithms that customize what information is presented to people based on their individual tastes. Thus, just by virtue of being ourselves, our internet is filtered. We go further to filter our own experience when we read websites that cater to our cultural background or to our political interests.
Paper delivered on 29 September 2011 in Special Collections of the University of Cardiff Library as the “Inaugural Annual Cardiff Rare Books and Music Lecture.”
The challenge to our engagement in and with the humanities today is the digital medium. This engagement comes into fresh light and focus as a consequence of the medium: through the new mediality, ‘what we have always done’ is no longer a matter of course, and so no longer goes unreflected, but instead demands reflection and questioning.
Elaboration on Nathan Yau’s recent posting about the different words used for visualization and infographics. Nathan’s definitions are interesting because they reveal quite a bit about his background and main focus, and his blind spots give some insights into the community he’s working in. Robert does not claim that his view is better or more correct; he simply wants to provide a second opinion. For example:
data visualization — Graph-like image or interactive, usually tied with data exploration and analysis.
First definition, first major difference: a lot of people in the visualization field would consider data visualization to denote scientific visualization (i.e., volume or flow visualization of data with spatial dimensions), for whatever reason. There may be simple historical reasons for this, or the assumption that scientific visualization deals with more data.
In any case, data visualization has a particular slant, and is not the same as visualization in general, and certainly does not refer to charts or ‘interactives.’
If there are two things that academia doesn’t need, they are another book about Darwin and another blog post about defining the digital humanities. But it’s always right around this time of year that I find myself preparing for my digital history course and being pulled down the contemplative rabbit hole about how to describe the nature of the digital humanities to a new and varied audience. But rather than create my own definition, I wanted one cobbled together from everyone else.
For this kind of exercise, there’s no better resource than the TAPoR wiki page “How do you Define Humanities Computing / Digital Humanities?”, which presents pithy definitions from ~170 people, who for one reason or another were compelled (thankfully) to offer their own take on the nature of digital humanities or humanities computing. The format surely encouraged sound bites rather than nuanced formulations, but the quick takes still reveal the sentiment of the community—perhaps better than longer essays would have. What follows is my categorization of the responses from 2011.