Illustration of a graph

Editors’ Choice: Old Bailey Proceedings Part 1 – Offences

If you know me, the topic of this first post may come as unsurprising but also a bit eyebrow-raising. “Sharon, you’ve been working on the Old Bailey Online project (OBO) since forever. Aren’t you bored with it yet?”

Meanwhile, those who don’t know me might more likely be asking, “What are the Old Bailey Proceedings?” So, a bit of background. The Old Bailey Proceedings is the name most commonly given to a series of trial reports that were published from 1674-1913.

the largest body of texts detailing the lives of non-elite people ever published, containing 197,745 criminal trials held at London’s central criminal court.

I’ve been project manager for this and a number of spin-off projects (which I’ll undoubtedly write about in future posts; brace yourselves) since 2006. And yet I only started to really dig into the Proceedings data quite recently. This is because it consists of more than 2000 intimidatingly complicated XML files, reflecting the complexity of a criminal trial – there can be multiple defendants, multiple charged offences and multiple outcomes. The central aim of the project from its conception was to accurately represent this complexity as well as provide searchable full text.

In fact, I spent several years cheerfully encouraging others to use our data while I had no real idea how to go about doing so myself. In 2011, the project released the Old Bailey API, and I started to tinker with that, but I didn’t really get very far, until a couple of years ago I finally bought myself a book on XQuery and got down to it. And then I started to discover exactly how complicated the XML is. (So I’ve also been thinking about ways to make it more accessible; putting the XML files on our institutional repository is a good start but it’s really just a start.)

The data

There are two facets to the the Proceedings data: firstly, structured markup that enables searching and quantitative analysis of many aspects of the published trials (especially the characteristics of defendants, offences, jury verdicts, sentences); and secondly, the full text of the reports, amounting to more than 125 million words in total.

My first two posts are tasters of a few of the structured data categories: here, I’ll look at how offences tried at the Old Bailey changed over the 250 years documented in the Proceedings; in the second post, I’ll look at defendants’ gender and offending. In subsequent posts I’ll start to explore the text of trial reports.


Read the full post here.