Editors’ Choice: Distributions of Words Across Narrative Time in 27,266 Novels

Over the last few months here at the Literary Lab, I’ve been working on a little project that looks at the distributions of individual words inside novels, averaged out across lots and lots of texts. This is incredibly simple, really – the end result is basically just a time-series plot for a word, similar to a historical frequency trend. But the units are different – instead of historical time, the X-axis is what Matt Jockers calls “narrative time,” the space between the beginning and end of a book.
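To make the idea of a word's distribution across narrative time concrete, here is a minimal sketch in Python. It is not the code behind the project. It assumes a simple regex tokenizer, treats a word's position as its token index divided by the length of the text, and sums counts into a fixed number of equal-width bins across all texts. The function names (`tokenize`, `word_offsets`, `narrative_histogram`) and the two-sentence toy corpus are hypothetical, made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_offsets(tokens, word):
    """Return the relative position (0.0 up to, but not including, 1.0) of each occurrence of `word`."""
    n = len(tokens)
    return [i / n for i, tok in enumerate(tokens) if tok == word]


def narrative_histogram(texts, word, bins=10):
    """Count occurrences of `word` per narrative-time bin, summed over all texts."""
    counts = [0] * bins
    for text in texts:
        tokens = tokenize(text)
        for offset in word_offsets(tokens, word):
            # Map the relative offset to a bin index; clamp as a safety net.
            counts[min(int(offset * bins), bins - 1)] += 1
    return counts


# Toy stand-ins for novels (assumption: real input would be full texts).
toy_corpus = [
    "a boy was born and years later the old man met his death",
    "she was born in spring and much later came a quiet death",
]

print(narrative_histogram(toy_corpus, "born", bins=4))
print(narrative_histogram(toy_corpus, "death", bins=4))
```

Dividing by the length of each text is what puts a short novel and a long one on the same X-axis. With a real corpus you would probably also want to normalize the counts, for example by the total number of tokens that fall in each bin, so that the curve reflects relative frequency rather than raw counts.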

In a certain sense, this grew out of a project I worked on a couple of years ago that did something similar in the context of an individual text – I wrote a program called Textplot that tracked the positions of words inside novels and then found words that “flock” together, that is, words that show up in similar regions of the narrative. This got me thinking – what if you did this with lots of novels, instead of just one? Beyond any single text – are there general trends that govern the positions of words inside novels at a kind of narratological level, split away from any particular plot? Or would everything wash out in the aggregate? Averaged out across thousands of texts – do individual words rise and fall across narrative time, in the way they do across historical time? If so – what’s the “shape” of narrative, writ large?

Read the full post here.