Report: A Research Agenda for Historical and Multilingual Optical Character Recognition

About the report:

The Office of Digital Humanities (ODH) is excited to announce the publication of an important new report, “A Research Agenda for Historical and Multilingual Optical Character Recognition.” The report, funded by The Andrew W. Mellon Foundation and authored by David Smith and Ryan Cordell of Northeastern University, outlines nine recommendations for improving historical and multilingual OCR. The full report is available online at https://ocr.northeastern.edu/report/

The report is the culmination of about two years of research, surveys, conversations, and in-depth interviews with four groups: scholars who work on OCR and rely on OCR’d texts in their research, computer and information scientists working to improve OCR, librarians who manage digital collections, and funders who support projects that use and refine OCR methods. Its recommendations range from developing methods for improving statistical analysis of OCR output, to exploiting existing digital editions for training and test data, to convening OCR institutes in critical research areas.
