RESOURCE: Data Mining the Internet Archive Collection

A new lesson by Caleb McDaniel on The Programming Historian focuses on downloading and analyzing records from the Internet Archive.

In this lesson, you’ll learn how to download files from Internet Archive collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records; MARC is a widely used standard for formatting bibliographic metadata.
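To give a rough sense of what parsing a MARC XML record involves, here is a minimal sketch that uses only Python's standard library rather than the dedicated modules the lesson teaches. The sample record is invented for illustration, and the helper name `get_subfields` is hypothetical; only the MARC21 namespace and the field tags (245 for the title statement, 260 for publication information) come from the MARC standard itself.

```python
import xml.etree.ElementTree as ET

# MARC XML elements live in this namespace (the MARC21 "slim" schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A tiny made-up record for illustration; real records carry many more fields.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by an example author.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Boston :</subfield>
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1850.</subfield>
  </datafield>
</record>
"""


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields numbered `tag`.

    Hypothetical helper for this sketch; not part of any library.
    """
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE_RECORD)

titles = get_subfields(record, "245", "a")      # title proper
places = get_subfields(record, "260", "a")      # place of publication
dates = get_subfields(record, "260", "c")       # date of publication

print(titles, places, dates)
```

A dedicated MARC library handles details this sketch ignores, such as repeated fields, indicators, and trailing punctuation, which is one reason the lesson relies on a purpose-built module instead.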

Continue reading “Data Mining the Internet Archive Collection.”