Editors’ Choice: On Archiving Tweets

A Ferguson Twitter Archive

Much has been written about the significance of Twitter as the recent events in Ferguson echoed round the Web, the country, and the world. I happened to be at the Society of American Archivists meeting 5 days after Michael Brown was killed. During our panel discussion someone asked about the role that archivists should play in documenting the event.

There was wide agreement that Ferguson was a painful reminder of the type of event that archivists working to “interrogate the role of power, ethics, and regulation in information systems” should be documenting. But what to do? Unfortunately we didn’t have time to really discuss exactly how this agreement translated into action.

Fortunately the very next day the Archive-It service run by the Internet Archive announced that they were collecting seed URLs for a Web archive related to Ferguson. It was only then, after also having finally read Zeynep Tufekci‘s terrific Medium post, that I slapped myself on the forehead … of course, we should try to archive the tweets. Ideally there would be a “we” but the reality was it was just “me”. Still, it seemed worth seeing how much I could get done.

On Archiving Tweets

After my last post about collecting 13 million Ferguson tweets Laura Wrubel from George Washington University’s Social Feed Managerproject recommended looking at how Mark Phillipsmade his Yes All Women collection of tweets available in the University of North Texas Digital Library. By the way, both are awesome projects to check out if you are interested in how access informs digital preservation.

If you take a look you’ll see that only the Twitter ids are listed in the data that you can download. The full metadata that Mark collected (with twarcincidentally) doesn’t appear to be there. Laura knows from her work on the Social Feed Manager that it is fairly common practice in the research community to only openly distribute lists of Tweet ids instead of the raw data.