Hello again, everybody! I’m back this semester as a DH Prototyping Fellow, and together, Alyssa Collins and I are working on a project titled “Twitterature: Methods and Metadata.” Specifically, we’re hoping to develop a simple way of using Twitter data for literary research. The project is still in its early stages, but we’ve been collecting a lot of data and are now beginning to visualize it (I’m particularly interested in the geolocation of tweets, so I’m trying out a few mapping options). In this post, I want to lay out our methods for collecting Twitter data.
Okay, Alyssa and I have been using a Python-based Twitter scraping script, which we modified to search Twitter without any time limitations (the official Twitter search function is limited to tweets from the past two weeks). To run the script, I entered the following on my command line: python3 TwitterScraper.py. The script then prompted me for a search term and the date range within which I wanted to run my search. For this post, I ran the search term #twitterature (and no, the scraper has no problem handling hashtags as part of the search query!). After I entered the necessary information, the script created both a txt and a csv file with the results of my search.
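To give a sense of the prompt-and-save workflow described above, here’s a rough sketch of how a script like this might write out its two result files. To be clear, this is my own illustration, not the actual code of the script we’re using: the function name, the CSV columns, and the sample tweet are all hypothetical.

```python
import csv

def save_results(tweets, basename):
    """Write scraped tweets to a .txt file of Tweet IDs and a .csv of fields.

    `tweets` is a list of dicts; the keys used here (id, date, user, text)
    are illustrative placeholders, not the columns our script actually saves.
    """
    # txt file: one Tweet ID per line
    with open(basename + ".txt", "w") as f:
        for tweet in tweets:
            f.write(str(tweet["id"]) + "\n")
    # csv file: a limited set of fields per tweet
    with open(basename + ".csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "date", "user", "text"])
        writer.writeheader()
        for tweet in tweets:
            writer.writerow(tweet)

if __name__ == "__main__":
    # The real script prompts on the command line, roughly like:
    #   query = input("Search term: ")
    #   since = input("Start date (YYYY-MM-DD): ")
    #   until = input("End date (YYYY-MM-DD): ")
    sample = [
        {"id": 1234567890, "date": "2017-02-01", "user": "example_user",
         "text": "Reading flash fiction on my phone #twitterature"},
    ]
    save_results(sample, "twitterature_results")
```

The point of the two-file split is that the txt file is the lightweight, shareable record of the search, while the csv holds the fields you actually analyze.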
Given Twitter’s strict regulations on data usage, the csv files created by my Twitter mining list only a limited amount of information about each tweet, while the txt files contain just the Tweet IDs (the distinct identifying number assigned to each tweet) that matched my search query.
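Since the txt files are simply one Tweet ID per line, loading them back into Python for later steps is trivial; here’s a minimal sketch (the filename and function name are my own, for illustration):

```python
def load_tweet_ids(path):
    """Read a scraper-produced .txt file of Tweet IDs, one per line.

    Returns the IDs as strings, skipping any blank lines.
    """
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Usage (hypothetical filename):
# ids = load_tweet_ids("twitterature_results.txt")
```

Keeping the IDs as strings rather than integers avoids any risk of mangling very large ID numbers when the list is passed on to other tools.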