The purpose of this ebook is to provide a brief overview of the Ruby programming language and consider ways Ruby (or any other programming language) can be applied to the day-to-day operations of humanities scholars.  Once you complete this book, you should have a good understanding of Ruby basics, be able to complete basic tasks with Ruby, and hopefully leave with a solid basis that will allow you to continue learning.

Read ebook Here

For our third interview, I am thrilled to have a chance to chat with Brett Bobley, the CIO and Director of the Office for Digital Humanities at the National Endowment for the Humanities. I wanted to catch up with him on how some of the work NEH is supporting under the Digging into Data grants might connect with issues around the preservation and access of digital content.

Read Full Post Here

Like cognitive literary studies, digital humanities must draw on other disciplines, using methods and tools that many humanities scholars aren’t comfortable with. And digital humanities has witnessed similar debates about the extent to which we must immerse ourselves in these other disciplines. Do we, as Stephen Ramsay suggests, have to know how to code, and build things? Do we have to be trained statisticians so that our text-mining results are “statistically significant”? Are we more or less rigorous than the proponents of culturomics, whose work many humanities scholars seem skeptical about? These are questions about method, and interdisciplinarity, and collaboration. And they’re not particularly new questions.

Tim Hitchcock, another member of the ‘With Criminal Intent’ team, has described how online technologies can change the way we access archives. Instead of being forced to navigate the hierarchical structures that archives impose on records, which in turn tend to reflect the workings of the institutions that created the records, we can directly find the people whose lives were regulated, influenced, shaped or controlled by the policies of those institutions.

Instead of merely hearing ‘the institutional voice… in all its stentorian splendour’, he says, we can listen in to ‘the quieter tones uttered by the individual’.[8]

As one of the first of a ‘new style’ of museum online collections, launching several internet generations ago in 2006, the Powerhouse Museum’s collection database has been undergoing a rethink in recent times. Five years is a very long time on the web and not only has the landscape of online museum collections radically changed, but so too has the way researchers, including curators, use these online collections as part of their own research practices.

Digging through five years of data has revealed a number of key patterns in usage, which, when combined with user research, paint a very different picture of the value and usefulness of online collections.

A number of our Web Science students are analyzing people’s use of Twitter, but the tools available to them are rather limited: since Twitter changed its terms of service, the functionality of TwapperKeeper and similar sites has been reduced. Personal tools like NodeXL (a plugin for Microsoft Excel running under Windows) do provide simple data capture from social networks, but a study will require long-term data collection over many months that is independent of reboots and power outages.

As I wrote here a couple of weeks ago, I’m playing around with a variety of clustering techniques to identify patterns in legal records from the early modern Spanish Empire. In this post, I will discuss the first of my training experiments using Normalized Compression Distance (NCD). I’ll look at what NCD is, some potential problems with the method, and then the results from using NCD to analyze the Criminales Series descriptions of the Archivo Nacional del Ecuador’s (ANE) Series Guide. For what it’s worth, this is a very easy and approachable method for measuring similarity between documents and requires almost no programming chops.
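For readers who have not met NCD before, here is a minimal sketch of the usual definition, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. It is not the code from the post: the choice of zlib as the compressor, the function names, and the sample descriptions are all assumptions made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed form of data."""
    # zlib is an assumed compressor; bz2 or lzma could be swapped in.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the inputs share a lot of structure;
    values near 1 suggest they share very little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical record descriptions, invented for this example only.
    first = "Autos criminales contra un vecino de Quito por robo de ganado.".encode("utf-8")
    second = "Autos criminales contra un vecino de Quito por robo de mulas.".encode("utf-8")
    third = "Querella por injurias de palabra en la villa de Riobamba.".encode("utf-8")

    print("first vs second:", ncd(first, second))
    print("first vs third: ", ncd(first, third))
```

Very short texts give noisy scores because the compressor's fixed overhead dominates, which is one of the potential problems with the method.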

Read Full Post Here

Historians often hope that digitized texts will enable better, faster comparisons of groups of texts. Now that at least the 1grams on Bookworm are running pretty smoothly, I want to start to lay the groundwork for using corpus comparisons to look at words in a big digital library. For the algorithmically minded: this post should act as a somewhat idiosyncratic approach to Dunning’s Log-likelihood statistic. For the hermeneutically minded: this post should explain why you might need _any_ log-likelihood statistic.
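As a rough illustration of the kind of corpus comparison at stake, the sketch below scores one word using a common two-term simplification of the log-likelihood statistic, G2 = 2 × Σ O × ln(O / E), which compares each observed count with the count expected if the word were equally frequent in both corpora. It is not the Bookworm implementation or the approach developed in the post; the function name, parameter names, and example counts are assumptions, and the full statistic uses all four cells of a 2×2 contingency table.

```python
import math


def log_likelihood(count_a: int, count_b: int, total_a: int, total_b: int) -> float:
    """Two-term log-likelihood score for one word across two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of words in corpus A and corpus B.
    """
    combined = count_a + count_b
    # Expected counts if the word had the same relative frequency in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)

    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        # A zero count contributes nothing (the limit of O * ln(O) as O -> 0 is 0).
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score


if __name__ == "__main__":
    # Hypothetical counts, invented for this example only.
    print(log_likelihood(count_a=50, count_b=5, total_a=1000, total_b=1000))
    print(log_likelihood(count_a=10, count_b=20, total_a=1000, total_b=2000))
```

The score is always non-negative and says nothing about direction, so in practice it is usually paired with a check of which corpus has the higher relative frequency.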

Read Full Post Here

Paradox Number One:  Social media foments revolution, but a sudden removal of social media can increase mobilization and create even more unrest.

We can all stand witness to the ways in which social and news media can spread a movement within and across nations. I know an Egyptian who claimed that her family and friends knew that the revolution was going to occur in the weeks and days before it actually happened. How? Just by the messages on social media and between individuals. In a similar fashion, social media proposed and fanned the flames of the Occupy Wall Street movement in the weeks before it emerged, grew, and took hold as a real story in mainstream media outlets.