Resource: Doing OCR within R

One way of doing OCR on your own machine with free tools is to use Ben Marwick’s pdf-2-text-or-csv.r script for the R programming language. Marwick’s script uses R as a wrapper for the Xpdf programme from Foolabs. Xpdf is a PDF viewer, much like Adobe Acrobat, but it also comes bundled with command-line tools, such as pdftotext, that pull the text out of a PDF. Using these tools on their own can be quite tricky, so Marwick’s script feeds your PDF files to Xpdf and has it perform the conversion. A second part of Marwick’s script pre-processes the resulting text files for various kinds of text analysis, but you can ignore that part for now.
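To give a feel for what "using R as a wrapper" means, here is a minimal sketch of the general idea. It is not Marwick’s script. It assumes the Xpdf command-line tool pdftotext is installed and on your PATH, and the function names `txt_path_for` and `pdfs_to_text` are made up for illustration.

```r
# Hypothetical helper: work out the .txt file name that goes with a PDF.
txt_path_for <- function(pdf) {
  sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)
}

# Hypothetical helper: hand every PDF in a folder to pdftotext.
# Assumption: the Xpdf tools are installed and pdftotext is on the PATH.
pdfs_to_text <- function(dir, exe = unname(Sys.which("pdftotext"))) {
  if (!nzchar(exe)) stop("pdftotext not found on the PATH")
  pdfs <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  for (pdf in pdfs) {
    # shQuote() protects file paths that contain spaces
    system2(exe, args = c(shQuote(pdf), shQuote(txt_path_for(pdf))))
  }
  # Return the paths of the text files, without printing them
  invisible(txt_path_for(pdfs))
}
```

You would call it with something like `pdfs_to_text("path/to/your/pdfs")`, where the folder name is a placeholder. Each PDF should end up with a text file of the same name sitting beside it.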

See full post here.