CFParticipation: Programming for Humanists at TAMU

From the course description:

In Spring 2015, the Programming4Humanists course at Texas A&M University will offer a more in-depth view of XSLT than previous courses covered, devoting four 2-hour classes to it. Participants will learn to transform an archive of TEI documents into multiple formats: XML (for when changes need to be made to an archive’s TEI encoding), HTML, ePubs, and database files. Three classes will be spent learning R for text mining and analysis. Next come two classes, one each on XPath and XQuery, which will introduce participants to ways of manipulating an archive of TEI/XML documents. Finally, five 2-hour classes will introduce participants to the process of OCR’ing historical documents using open-source tools developed by the Initiative for Digital Humanities, Media, and Culture (IDHMC) for eMOP, the Mellon-funded Early Modern OCR Project (http://emop.tamu.edu). Optical Character Recognition (OCR) engines convert page images into machine-readable text. For eMOP, IDHMC created training sets of fonts to be used for early modern texts. These training sets can be used with Tesseract, the open-source OCR engine used by Google Books.

Learn More: Programming for Humanists at TAMU.