Editors’ Choice: A Text-Mining and Visualization Roundup

Editors’ Note: Several recent posts discuss the use of text mining and visualization in humanities research. A few, offering a variety of perspectives, are excerpted below.

Lev Manovich, The Meaning of Statistics and Digital Humanities

“Given that production of summaries is the key characteristic of human culture, I think that such traditional summaries created manually should not be opposed to more recent algorithmically produced summaries such as a word cloud or a topic model, or the graphs and numerical summaries of descriptive statistics, or binary (i.e., only black and white without any gray tones) summaries of photographs created with image processing (in Photoshop, use Image > Adjustments > Threshold). Instead, all of them can be situated on a single continuous dimension.

…

I don’t know if my arguments will help us when we are criticized by people who keep insisting on a wrong chain of substitutions: digital humanities=statistics=science=bad. But if we keep explaining that statistics is not only about inferences and numbers, gradually we will be misunderstood less often.”
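Editors’ illustration: the binary "summary" of a photograph that Manovich mentions can be sketched in a few lines. This is a minimal sketch, not Photoshop's implementation. The function name `threshold`, the default cutoff of 128, and the tiny sample image are assumptions made here for illustration.

```python
def threshold(pixels, cutoff=128):
    """Reduce grayscale values (0-255) to pure black (0) or pure white (255).

    `cutoff` is an assumed default; values at or above it become white.
    """
    return [[255 if value >= cutoff else 0 for value in row] for row in pixels]


# A hypothetical 3x4 grayscale "photograph" (rows of pixel intensities).
image = [
    [12, 80, 130, 250],
    [40, 127, 128, 200],
    [0, 60, 190, 255],
]

binary = threshold(image)

# Render the summary as text: '#' for black pixels, '.' for white pixels.
for row in binary:
    print("".join("#" if value == 0 else "." for value in row))
```

Every gray tone is discarded and only the light/dark distinction survives, which is the sense in which the thresholded image summarizes the original.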

Elijah Meeks, Good Data, Bad Data

“Exploratory analysis is typically construed to mean exploration of the material for its content as it applies to a research agenda, but much of what is done under the auspices of the digital humanities is exploration of the methods (either theoretically or enshrined in tools) for their suitability to a research area. It seems healthier to think one can sample and experiment with a variety of methods, and find methodological successes even when forced to retreat from claims that advance a particular research question. Or it could be that my position in supporting research allows me to value the methodological component higher than the traditional research agendas.”

Michael Simeone, Visualizing Topic Models with Force-Directed Graphs

“Force-directed graphs are tricky. At their best, the perspective they offer can be very helpful; data points cluster into formations that feel intuitive and look approachable. At their worst, though, they can be too cluttered, and the algorithms that make everything fall into place can deceive as much as they clarify.

But there’s still a good chance that, despite the problems that come along with making a network model of anything (and the problems introduced by making network models of texts), they can still be helpful for interpreting topic models. Visualizations aren’t exactly analysis, so what I share below is meant to raise more questions than answers. We also tried to represent as many aspects of the data as possible without breaking (or breaking only a little) the readability of the visualizations. There were some very unsuccessful tries before we arrived at what is below.

…

My hope is that these visualizations can be insightful and might help us work through the benefits and disadvantages of force-directed layouts for visualizing topic models.”

Jason Mittell, Caption Mining at the Crossroads of Digital Humanities & Media Studies

“There are lots of possibilities for making discoveries about the language of a film or television text, but this tool raises one large caution flag: we cannot mistakenly reduce a moving image work to its dialogue. There is a long tradition of scholars trained to study language & literature treating film texts just as they consider printed work, focusing on narrative structures, verbal style, metaphors, etc., but paying scant attention to visual style, music, performance, temporal systems, or other formal elements that make film essentially different than literature. But with that caution in mind, we shouldn’t ignore a moving image text’s dialogue and verbal systems, and I hope that ccextractor offers a useful tool to provide some new access to these elements.”