Editors’ Choice: Mining Twitter data with R, TidyText, and TAGS

One of the best places to get your feet wet with text mining is Twitter data. Though not as open as it used to be for developers, the Twitter API makes it incredibly easy to download large swaths of text from its public users, accompanied by substantial metadata. The result is a treasure trove for data miners that is relatively easy to parse.

It’s also a great source of data for those studying the distribution of (mis)information via digital media. This is something I’ve been working on a lot lately, both in independent projects and in preparation for my courses on Digital Storytelling, Digital Studies, and The Internet. It’s amazing how much data you can get, and how detailed a picture it can paint of how citizens, voters, and activists find and disseminate information. Most recently, Bill Fitzgerald and I have begun discussing a project that analyzes the distribution of (mis)information in extremist, so-called “alt-right” circles on Twitter and compares the language and information sources of left- and right-leaning communities there.

Read full post here.