Job: Digital Media Theory and Practice (TT)

The School of Writing, Rhetoric and Technical Communication (WRTC) at James Madison University invites applications for a tenure-track assistant professor to begin August 2016. We seek a colleague who specializes in digital media theory and practice, with a PhD (in hand by August 2016) in Technical and Professional Communication, Rhetoric and Composition, or a related field.

Editors’ Choice: Introducing Git-Lit

Creative Commons Image by Douglas Edric Stanley via Flickr

A vibrant discussion followed my March 15th post, “A Proposal for a Corpus Sharing Protocol.” Carrie Schroeder, Allen Riddel and others on Twitter pointed out that, especially in non-English DH fields, many corpora are already on GitHub. These include texts from the Chinese Buddhist Electronic Text Association, the Open Greek and Latin Project at Leipzig, and papyri from the Integrating Digital Papyrology Project. The Text Creation Partnership released some 25,000 of its texts in January of this year and uploaded them to GitHub. One of the more interesting Git corpus projects I became aware of following this discussion is GITenberg. Led by Seth Woodworth, the project scrapes a text from Project Gutenberg, initializes a git repository for it, adds README and CONTRIBUTING files generated from the text’s metadata, and uploads the resulting repository to GitHub. They have gitified around 43,000 works this way. The project also converts Project Gutenberg vanilla plain text into ASCIIDOC; a good example of this is the GITenberg edition of The Adventures of Huckleberry Finn. This is an amazingly ambitious project that holds the promise of wide-ranging applications for editing, versioning, and disseminating literature.

One such application might lie with the 68,000 digital texts recently created by the British Library. James Baker, a digital curator of the British Library, left a comment on my original post, suggesting that the method I describe might be used to parse and post the Library’s texts. He sent me a few sample texts of the ALTO XML documents that the Stanford Literary Lab had used. I adapted some of the GITenberg code to read these texts, generate README files for them, and turn them into GitHub repositories. I’m provisionally calling this project Git-Lit.
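To make the workflow concrete, here is a minimal sketch in Python of the steps described above: reading an ALTO XML page, generating a README from metadata, and setting up a local git repository. It is not the actual Git-Lit or GITenberg code. The function names (alto_to_text, build_readme, make_repo), the metadata keys and the file names are all assumptions chosen for illustration. The only format detail relied on is that ALTO stores each word in the CONTENT attribute of a String element inside a TextLine.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip any XML namespace so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml: str) -> str:
    """Join the CONTENT of each <String>, emitting one line per <TextLine>."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in elem
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


def build_readme(metadata: dict) -> str:
    """Build README text from a metadata dict (the keys are hypothetical)."""
    lines = [f"# {metadata.get('title', 'Untitled')}", ""]
    for key in ("author", "date", "source"):
        if key in metadata:
            lines.append(f"- {key.capitalize()}: {metadata[key]}")
    return "\n".join(lines) + "\n"


def make_repo(dest: Path, text: str, metadata: dict) -> Path:
    """Write the text and README into dest, then try to run `git init` there."""
    dest.mkdir(parents=True, exist_ok=True)
    (dest / "README.md").write_text(build_readme(metadata), encoding="utf-8")
    (dest / "text.txt").write_text(text, encoding="utf-8")
    # Best effort: skipped if git is not installed; a failed init is not fatal.
    if shutil.which("git"):
        subprocess.run(["git", "init", "--quiet", str(dest)], check=False)
    return dest
```

Committing the files and pushing the repository to GitHub are left out of the sketch, since those steps depend on credentials and on how the repositories are to be named and organized.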


Job: Program Coordinator, University of Texas at Austin

Responsible for coordinating the data management program at UT Libraries. Collaborates with staff, Texas Advanced Computing Center, Information Technology Services and other campus partners to ensure the UT community is making the best use of the services available to them. Reports to Scholarly Communications Librarian.

Editors’ Choice: Ecosystems of People + Machines Can Help Crowdsourcing Projects

Back in September last year I blogged about what advances in machine learning mean for cultural heritage and digital humanities crowdsourcing projects that use simple tasks as the first step in public engagement: fun, easy tasks like image tagging and text transcription could now be done by computers. (Broadly speaking, ‘machine learning’ is a label for technologies that allow computers to learn from the data available to them. It means they don’t have to be specifically programmed to know how to do a task like categorising images; they can learn from the material they’re given.) One reason I like crowdsourcing in cultural heritage so much is that time spent on simple tasks can provide opportunities for curiosity, help people find new research interests, and help them develop historical or scientific skills as they follow those interests. People can notice details that computers would overlook, and those moments of curiosity can drive all kinds of new inquiries. I concluded that, rather than taking the best tasks from human crowdsourcers, ‘human computation’ systems that combine the capabilities of people and machines can free up our time for the harder tasks and more interesting questions.

I’ve been thinking about ‘ecosystems’ of crowdsourcing tasks since I worked on museum metadata games back in 2010. An ecosystem of tasks – for example, classifying images into broad types and topics in one workflow so that people can find text to transcribe on subjects they’re interested in, and marking up that text with relevant subjects in a final workflow – means that each task can be smaller (and thereby faster and more enjoyable). Other workflows might validate the classifications or transcribed text, allowing participants with different interests, motivations and time constraints to make meaningful contributions to a project. The New York Public Library’s Building Inspector is an excellent example of this – they offer five tasks (checking or fixing automatically-detected building ‘footprints’, entering street numbers, classifying colours or finding place names), each as tiny as possible, which together result in a complete set of checked and corrected building footprints and addresses. (They’ve also pre-processed the maps to find the building footprints, so most of the work has already been done before people are asked to help.)

Job: Director of Services and Operations, HathiTrust

The Director of Services and Operations (DSO) has primary responsibility for the operations and services of the HathiTrust preservation and access repository, overseeing the day-to-day management of HathiTrust’s core services to users and members, defining priorities for teams that manage HathiTrust’s core infrastructure, and driving the development of related policy and standard processes.

