We recently put out a paper on how racial bias functions in Hollywood films. This work builds on a few studies that came before it, namely this one from USC Annenberg. We presented numerical analyses: the number of characters in different racial and ethnic groups, the number of words spoken by these groups, and who occupied the top roles in these films. These numbers give us tangible measures of who appears on screen and how much they speak, but they say little about the content of the dialogue itself. We wanted to take this research a step further than other studies, using an analytical study of the dialogue spoken by these characters to learn more about racial bias in casting and writing, treating the “quality” of the language as a stand-in for the “quality” of a role, and answering questions like: are people of colour being relegated to the same kinds of roles in the disproportionately few times that they do appear on screen?
This was, predictably, much more difficult to carry out than we had anticipated when we started last summer.
By using text mining and computational methods, we aimed in this aspect of the study to distance ourselves from any kind of subjective, close interpretation of the dialogue. One way we were able to do this is laid out in the paper: we found that characters whose racial or ethnic identity could be mapped to a corresponding geographical region (e.g., Latinx characters and Latin America) were more likely than white characters to reference cities and countries in that region.
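To illustrate the idea (not our actual pipeline), a minimal sketch of this kind of measure might count how often a character group's dialogue mentions place names from a given region and compare rates across groups. The place list and dialogue lines below are invented for illustration:

```python
# Illustrative sketch: compare how often two groups' dialogue lines
# mention place names from one region. Place names and lines are
# made-up examples, not data from the study.

REGION_PLACES = {
    "latin_america": {"mexico", "bogota", "havana", "brazil", "lima"},
}

def mention_rate(lines, places):
    """Fraction of dialogue lines mentioning at least one place name."""
    hits = 0
    for line in lines:
        # crude tokenization: lowercase and strip basic punctuation
        tokens = set(line.lower().replace(",", " ").replace(".", " ").split())
        if tokens & places:
            hits += 1
    return hits / len(lines) if lines else 0.0

# hypothetical dialogue lines for two character groups
latinx_lines = ["I grew up in Mexico.", "Let's go."]
white_lines = ["Let's go.", "See you tomorrow."]

places = REGION_PLACES["latin_america"]
print(mention_rate(latinx_lines, places))  # 0.5
print(mention_rate(white_lines, places))   # 0.0
```

A real analysis would need proper tokenization, a gazetteer of place names, and per-character normalization, but the comparison of mention rates across groups is the core of the measure.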
This was a relatively straightforward and objective measure. We presented it equally objectively, without pulling any far-fetched analyses from it, and felt comfortable including it in our paper without courting controversy. But we wanted to do more: to see whether, on a measurable, linguistic level, people of colour are pigeonholed in ways that their white counterparts are not.