Resource: Introducing Pypothesis, Part 1: to MarkDown

I’ve been working and writing a lot lately about using the web annotation tool for public scholarship. It has a lot of cool uses: not only the collaborative annotation of individual web pages, but also the creation of a public research notebook, and the possibility of linking with other apps through its open API.

Based on that work, I’ve created two tools to help people make fuller use of it in their work as public scholars. This post is the first in a two-part series introducing and explaining those tools.

Resource: Macroetym: A Command-Line Tool for Macro-Etymological Textual Analysis

I’m proud to introduce macroetym, a command-line tool for macro-etymological textual analysis, which is now available for download with the Python package manager, pip. It’s a complete rewrite of The Macro-Etymological Analyzer, the web tool for macro-etymological analysis I wrote a few years ago, first described in this post, and presented at DH2014. It can now analyze any number of texts, and texts in 250 languages.

Resource: Oral History Digital Toolbox: My Current Favorites

Here are some of my favorite digital tools that may prove useful for core aspects of the oral history process. I will be adding new tools and categories of tools periodically, so stay tuned.

Collection Management, Exhibit, User Experience

OHMS (enhancing online access, indexing, transcript synchronization, metadata, free, open source)

ArchivesSpace (collection management system, repository)

CollectiveAccess (collection management system, repository, free, open source)

Hydra (repository)

Omeka (online exhibit, collection management, free, open source)

WordPress (content management system, online exhibit, free, open source)

Reclaim Hosting (web host, commercial [but awesome])


Resource: R packages, secret keys, and testing on Appveyor

In the wake of defending my dissertation and landing a new job, I’ve been relaxing by polishing up an R package, PastecR, that wraps the API for Pastec, an open-source fuzzy image matching engine.

The creator of Pastec recently launched a hosted version of the service.
I’ve just updated PastecR to talk to both self-hosted instances of Pastec and the SaaS instance at

Resource: 1.8 Million Free Works of Art from World-Class Museums

Since the first stirrings of the internet, artists and curators have puzzled over what the fluidity of online space would do to the experience of viewing works of art… Below the list of galleries, find links to online collections of several hundred art books to read online or download. Continue to watch this space: We’ll add to both of these lists as more and more collections come online.

Resource: Imj, A Web-based Tool for Visual Culture Macroanalytics

So-called “movie barcodes” are both elegant to look at and useful ways to explore how color schemes and designs shift throughout a film. Image montages can also demonstrate how a visual corpus changes over time, and plotting an image set into a graph based on values like hue and saturation could provide a stylistic fingerprint for a particular set. I’ve written a post about using ffmpeg (avconv on Ubuntu) with imagemagick, but for someone to follow those steps they’d have to be comfortable working in a command line. I’ve had some success with ImagePlot in the past, but I’ve yet to get it working on my current laptop. Finding nothing satisfactory, I made it myself: Image Macroanalysis in Javascript.
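
To make the barcode and hue/saturation ideas concrete, here is a minimal Python sketch. It is not how Imj itself works (that tool is written in JavaScript, and its internals are not described here). The function names and the tiny hand-made "frames" are hypothetical stand-ins; a real workflow would first extract frames from a film and read their pixels.

```python
import colorsys


def mean_color(pixels):
    """Return the average (r, g, b) of an iterable of 0-255 RGB tuples."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def barcode(frames):
    """One averaged color per frame, in order: the stripes of a 'movie barcode'."""
    return [mean_color(frame) for frame in frames]


def hue_saturation(rgb):
    """Convert a 0-255 RGB triple to (hue, saturation), each in the range 0-1."""
    r, g, b = (c / 255 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h, s


# Hypothetical stand-in data: three tiny "frames" of two pixels each.
frames = [
    [(255, 0, 0), (255, 0, 0)],    # all red
    [(0, 0, 255), (0, 0, 255)],    # all blue
    [(255, 255, 255), (0, 0, 0)],  # half white, half black
]

stripes = barcode(frames)                            # one color per frame
points = [hue_saturation(c) for c in stripes]        # (hue, saturation) pairs to plot
```

Drawing `stripes` side by side as thin vertical bars would give the barcode, and scattering `points` on a hue-by-saturation plane would give the kind of stylistic fingerprint described above.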

Resource: DOI wrangling for WordPress

This post is a test, but it might also be an announcement. A while back I was working on a plugin that, among other things, added DOI information to a post. It’s handy if you’re using something like ScienceSeeker used to be, though not quite enough for ResearchBlogging. I now have a new version of the plugin. It only works with self-hosted WordPress installations, and only with DOIs, because at the moment that’s all I need. So it doesn’t do a lot, but I’m hoping what it does it’ll do without breaking down often or causing too much strain on the servers.

Resource: Unix Pipes for Exploring and Cleaning Data

Do you have a spreadsheet, JSON, or plain text file filled with data that you haven’t come to terms with? If you’re using Mac OSX, you already have a powerful tool at your disposal for exploring and cleaning a text-based data set. By using terminal commands, you can get a feel for an otherwise unwieldy file or set of files, searching and sorting in a way that is fast, flexible, and (perhaps most importantly) reproducible.
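
As a rough illustration of the kind of exploration meant here, a pipeline such as `cut -d, -f2 data.csv | sort | uniq -c | sort -rn` counts how often each value appears in the second column of a comma-separated file. The Python sketch below performs the same count. The file name in that pipeline, the `count_column` helper, and the sample rows are all hypothetical, and the commands used in the original post may differ.

```python
import csv
import io
from collections import Counter


def count_column(lines, column, delimiter=","):
    """Count the values in one column of delimited text, most common first."""
    reader = csv.reader(lines, delimiter=delimiter)
    counts = Counter(row[column].strip() for row in reader if len(row) > column)
    return counts.most_common()


# Hypothetical sample data standing in for a real file on disk.
sample = io.StringIO("name,city\nAda,London\nGrace,New York\nAlan,London\n")
next(sample)  # skip the header row
result = count_column(sample, column=1)

for value, n in result:
    print(n, value)
```

Because the whole step is a short script, like the shell pipeline it mirrors, it can be rerun unchanged whenever the data file is updated.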

