About the workshop:
This workshop offers a practical introduction to methods for working with large digitized archives that are only possible with direct access to the data. Participants will learn how to organize and analyze textual data and will get an overview of advances in natural language processing and machine learning. Hands-on training will use textual data from History Lab, an NSF-funded project that has aggregated the largest database of declassified government documents in the world. Participants will also learn how to obtain their own data by “scraping” websites and downloading from online databases. More specifically, we will examine how to bring textual data into Python and R, how to use Python for web scraping, and how to explore textual data using string functions. These methods make it possible to grapple with old research problems with new rigor and to launch entirely new kinds of inquiry.
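As a rough illustration of what exploring textual data with string functions can look like in Python, here is a minimal sketch. It uses only the standard library. The sample text, the `SUBJECT:` header convention, and the `explore` helper are assumptions made up for this example; they are not drawn from History Lab data or from the workshop materials.

```python
from collections import Counter

# Hypothetical sample text standing in for a short archival document.
# It is invented for illustration and is not History Lab data.
sample = """SECRET
FROM: EMBASSY EXAMPLE
SUBJECT: Trade talks
The ambassador met the minister. The minister raised tariffs and trade."""


def explore(text):
    """Summarize a document using basic string methods (illustrative helper)."""
    lines = text.splitlines()

    # Split on whitespace, trim surrounding punctuation, and lowercase each token.
    words = [w.strip(".,:;").lower() for w in text.split()]
    words = [w for w in words if w]

    # Pull out the text after an assumed "SUBJECT:" header, if one is present.
    subject = None
    for line in lines:
        if line.startswith("SUBJECT:"):
            subject = line.split(":", 1)[1].strip()
            break

    return {
        "n_lines": len(lines),
        "n_words": len(words),
        "subject": subject,
        "top_words": Counter(words).most_common(3),
    }


summary = explore(sample)
print(summary)
```

The same pattern (split, strip, lowercase, count) carries over to text read from a file or a scraped page, although real documents usually need more careful cleaning than this sketch performs.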