Category: Editors’ Choice

Editors’ Choice: How do we model stereotypes without stereotyping?

Box Plot

We recently put out a paper on how racial bias functions in Hollywood films. This work was based on a few studies that came before it, namely this one, from USC Annenberg. We presented numerical analyses like the number of characters in different racial and ethnic groups and the number of words spoken by these groups, as well as who occupied the top roles in these films. These numbers give us tangible measures of the visual aspects of these films, but they exclude the entire other half of film: dialogue. We wanted to take this research a step further than other studies by analyzing the dialogue spoken by these characters, aiming to learn more about racial bias in casting and writing, to treat the “quality” of the language as a stand-in for the “quality” of a role, and to answer questions like: are people of colour being relegated to the same kinds of roles in the disproportionately few times that they do appear on screen?

This was, predictably, much more difficult to carry out than we had initially thought when we started out last summer.

By using text mining and computational methods, we aimed in this aspect of the study to distance ourselves from any kind of subjective, close interpretation of the dialogue. One way we were able to do this is laid out in the paper. We found that characters whose racial or ethnic identity could be mapped to a corresponding geographical location (e.g., Latinx characters and Latin America) were more likely to reference cities and countries in that region than white characters were.
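
A minimal sketch, in Python, of the kind of counting such a measure involves (the place names, dialogue snippets, and group labels below are invented for illustration; this is not the paper's code or data):

    import re
    from collections import defaultdict

    # Hypothetical gazetteer of Latin American place names (illustration only).
    latin_american_places = {"mexico", "bogota", "havana", "buenos aires", "guatemala"}

    # Hypothetical per-character data: (group label, dialogue text).
    characters = [
        ("latinx", "I grew up outside Bogota before we moved to Mexico."),
        ("white", "The meeting is at noon. Don't be late."),
    ]

    mentions = defaultdict(int)
    words_spoken = defaultdict(int)
    for group, dialogue in characters:
        text = dialogue.lower()
        words_spoken[group] += len(text.split())
        mentions[group] += sum(
            len(re.findall(r"\b" + re.escape(place) + r"\b", text))
            for place in latin_american_places
        )

    # Compare reference rates per 1,000 words of dialogue across groups.
    for group in mentions:
        rate = 1000 * mentions[group] / words_spoken[group]
        print(f"{group}: {rate:.1f} place references per 1,000 words")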

This was a relatively straightforward and objective measure. We tried to present it equally objectively and not pull any far-fetched analyses from it. We felt comfortable putting this into our paper without causing any controversy. But we wanted to do more, and try to see whether, on a measurable, linguistic level, people of colour are pigeon-holed in ways that their white counterparts are not.

Read the full post here.

Editors’ Choice: The Tate Uses Wikipedia for Artist Biographies, and I’m OK With It

Image of a computer keyboard

Recently, several folks on Twitter have noted their displeasure that the Tate appears to be linking to Wikipedia articles in lieu of authoring their own written biographies of artists represented in their collections.

I… actually don’t have a problem with what the Tate is doing.

A screenshot of the Tate’s citation of Wikipedia on an overview web page for Jackson Pollock.

Except for a few unique institutions founded around a single artist’s estate, very few art museums really have the authority, or, frankly, the mission, to be authorities on the biographies of the artists in their collections. It would be one thing if the Tate were deferring to Wikipedia articles about the unique objects within its collection. Bendor Grosvenor erroneously suggests that the Tate is copying and pasting this for their collection catalog entries, but they are not. Instead, they’re using it for that most unsatisfying category of copy expelled by art museums: the artist biography.

As a graduate student and curatorial fellow at the National Gallery of Art, I spent hours and hours of expert time drafting biographies of artists represented in that museum’s Dutch collections. This was almost always a secondary literature review (thank goodness, no responsible museum board will fund research trips to archives to write three-paragraph biographical blurbs!). My colleagues and I generated some quite rich and educational copy for the website, and it was a lovely learning experience… for us, the students.

However, except for the most minor artists, we were mostly just rewording and enriching well-covered biographies from the Benezit Dictionary of Artists or Grove® Art Online. Hours of expert research time were basically spent reinventing the wheel – something that absolutely did not have to be done for ridiculously well-biographied artists like Rembrandt. Any one of these hours could have been better applied researching and communicating what was unique to our museum: the specific objects in the collection itself.

Read the full post here.

Editors’ Choice: ‘Making such bargain’: Transcribe Bentham and the quality and cost-effectiveness of crowdsourced transcription

Image of an open laptop on a desk

We (Tim Causer, Kris Grint, Anna-Maria Sichani, and me!) have recently published an article in Digital Scholarship in the Humanities on the economics of crowdsourcing, reporting on the Transcribe Bentham project, which is formally published here:

Alack, due to our own economic situation, it’s behind a paywall there. It’s also embargoed for two years in our institutional repository (!). But I’ve just been alerted to the fact that the license of this journal allows the author to put the “post-print on the author’s personal website immediately”. Others publishing in DSH may also not be aware of this clause in the license!

So here it is, for free download, for you to grab and enjoy in PDF.

I’ll stick the abstract here. It will help people find it!

In recent years, important research on crowdsourcing in the cultural heritage sector has been published, dealing with topics such as the quantity of contributions made by volunteers, the motivations of those who participate in such projects, the design and establishment of crowdsourcing initiatives, and their public engagement value. This article addresses a gap in the literature, and seeks to answer two key questions in relation to crowdsourced transcription: (1) whether volunteers’ contributions are of a high enough standard for creating a publicly accessible database, and for use in scholarly research; and (2) if crowdsourced transcription makes economic sense, and if the investment in launching and running such a project can ever pay off. In doing so, this article takes the award-winning crowdsourced transcription initiative, Transcribe Bentham, which began in 2010, as its case study. It examines a large data set, namely, 4,364 checked and approved transcripts submitted by volunteers between 1 October 2012 and 27 June 2014. These data include metrics such as the time taken to check and approve each transcript, and the number of alterations made to the transcript by Transcribe Bentham staff. These data are then used to evaluate the long-term cost-effectiveness of the initiative, and its potential impact upon the ongoing production of The Collected Works of Jeremy Bentham at UCL. Finally, the article proposes more general points about successfully planning humanities crowdsourcing projects, and provides a framework in which both the quality of their outputs and the efficiencies of their cost structures can be evaluated.

Read the full piece here.

Editors’ Choice: Doing the work – Editing Wikipedia as an act of reconciliation

Image of a computer keyboard

Since its establishment in 2001, the English version of Wikipedia[1] has grown to host more than 5.6 million articles that reflect content ranging from culture and the arts to technology and the applied sciences. Consistently ranked as one of the top visited sites on the Internet, Wikipedia provides an open and freely accessible resource of interconnected information that anyone can edit. Unfortunately, not everyone actually does. Nine out of ten editors are male. The average Wikipedian is an educated, English-speaking citizen of a majority-Christian nation in the global north. They are technically proficient and likely hold, or are skilled enough to hold, white-collar employment. Not surprisingly, these commonalities have introduced systemic bias to the manner in which content is generated, updated, and, most critically, omitted from the site.

Pages about trans and cis women, gender non-conforming people, cultural communities in the global south, those living in poverty, and people without internet access are chronically underrepresented on Wikipedia. This includes groups in developing nations, as well as racialized and systemically marginalized groups in economically wealthy countries, such as the Black and Latinx communities in the United States. Equally absent are pages about Indigenous peoples[2], communities, and cultures. As of August 2018 there were 3,468 articles within the scope of the Indigenous Peoples of the Americas WikiProject. This number represents only 0.06% of the articles on English-language Wikipedia, with an even smaller percentage relating to First Nations, Inuit, and Métis peoples in what is currently known as Canada. Overall, representation of Indigenous-focused content is sorely lacking.

As settlers living and working as archivists on the traditional territories of the Neutral, Anishnaabeg, Métis, and Haudenosaunee peoples — Danielle on the Haldimand Tract, land extending six miles from each side of the Grand River that was promised to the Six Nations, and Krista on Robinson-Huron Treaty territory — we have personally and professionally considered the Truth and Reconciliation Commission of Canada Calls to Action (TRC) that outline the responsibilities of cultural heritage workers to educate both themselves and the general public about the Canadian Indian Residential School System (Residential Schools). In working to do so, however, we recognize that Residential Schools were but one of the many horrific consequences of settler colonialism. Meaningful engagement with the reconciliation process and Indigenous communities in Canada means raising awareness about more than Residential Schools. It means understanding the need for cultural organizations to build relationships with Indigenous communities rooted in solidarity and allyship; centering an ethic that moves beyond rote territorial acknowledgements; and setting aside defensive dismissals of wrongs that happened before we were born in order to prioritize what Senator Murray Sinclair calls “a sense of responsibility for the future.” It also means acknowledging that colonialism continues to impact Indigenous communities and working to break down colonial systems that exist within cultural organizations. We believe that editing Wikipedia through a lens of reconciliation is one way to do so.

Read the full post here.

Editors’ Choice: Post-Custodial Archives and Minority Collections

Image of chained books

Last week (July 31, 2018), I had the honor of speaking at the CLIR (Council on Library and Information Resources) summer seminar for new Postdoctoral Fellows. I was very excited to get the opportunity to meet a new cohort of fellows just as they were beginning their new positions at various institutions. (For more information on CLIR Postdoctoral Fellowships, visit their website! And keep an eye out for the next round of applications this fall/winter.)

My talk centered on the work we do at Recovering the US Hispanic Literary Heritage (aka “Recovery”), the importance of minority archives, and working toward inclusivity. For 27 years, Recovery has dedicated itself to recovering, preserving, and disseminating the lost written legacy of Latinas and Latinos in the United States. US Latina/o collections, like other minority collections, do not traditionally form part of a larger national historical narrative. Herein lies the importance of minority collections: the stories they tell give us a more nuanced understanding of US history and culture.

Let’s take a step back to think about the structure of archives, the inherent issues, and the questions that we—as archivists, scholars, students, and educators—should ask ourselves when engaging with historical collections. Archives help structure knowledge and history. Michel Foucault argues that history “now organizes the document” [with “document” here meaning the archival record], “divides it up, distributes it, orders it, arranges it in levels, establishes series, distinguishes between what is relevant and what is not, discovers elements, defines unities, describes relations” (146). Thus history, or perhaps more aptly, what we understand to be or call history, cannot be distinguished from the production and organization of the archive. Furthermore, national archives help to create an authoritative national narrative. The International Council on Archives, for example, describes archives on its webpage as follows:

Archives constitute the memory of nations and societies, shape their identity, and are a cornerstone of the information society. By providing evidence of human actions and transactions, archives support administration and underlie the rights of individuals, organisations and states. By guaranteeing citizens’ rights of access to official information and to knowledge of their history, archives are fundamental to identity, democracy, accountability and good governance.

Given this defined mission of archives, we can think about what archives do or are meant to do; they define:

  • “the nation,”
  • “history,”
  • what is—and what isn’t—considered “important,”
  • “knowledge.”

Read the full post here.

Editors’ Choice: Archivists as Peers in Digital Public History

Image of an open laptop on a desk

In the last 25 years we have seen the web enable new digital means for historians to reach broader publics and audiences. Over that same period of time, archives and archivists have been exploring and engaging with related strands of digital transformation. In one strand, a similar focus on community work through digital means has emerged in both areas. While historians have been developing a community of practice around public history, archivists and archives have similarly been reframing their work as more user-centered and more closely engaged with communities and their records. A body of archival work and scholarship has emerged around the function of community archives that presents significant possibilities for further connections with the practices of history and historians. In a second strand, strategies for understanding and preserving digital cultural heritage have also taken shape. While historians have begun exploring tools to produce new forms of digital scholarship, archivists and archives have been working to develop methods both to care for digital material and to make it available. Archivists have established tools, workflows, vocabulary, and infrastructure for digital archives, and they have also managed the digitization of collections to expand access.

At the intersection of these two developments, we see a significant convergence between the needs and practices of public historians and archivists. Historians’ new forms of scholarship increasingly function as forms of knowledge infrastructure. Archivists’ work on systems for enabling access to collections is itself anchored in longstanding commitments to infrastructure for enabling the use of records. At this convergence, there is a significant opportunity for historians to begin to connect more with archivists as peers, as experts in questions of the structure and order of sources and records.

In this essay we explore the ways that archives, archivists, and archival practice are evolving around both analog and digital activities that are highly relevant for those interested in working in digital public history.

Read the full piece here.

Editors’ Choice: Do topic models warp time?

Graph of the pace of change in fiction between 1885 and 1984 using topic models

Recently, historians have been trying to understand cultural change by measuring the “distances” that separate texts, songs, or other cultural artifacts. Where distances are large, they infer that change has been rapid. There are many ways to define distance, but one common strategy begins by topic modeling the evidence. Each novel (or song, or political speech) can be represented as a distribution across topics in the model. Then researchers estimate the pace of change by measuring distances between topic distributions.
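
As a toy illustration (not code from any of the studies mentioned), two documents represented as distributions over the same topics can be compared with a standard metric such as cosine or Jensen-Shannon distance:

    import numpy as np
    from scipy.spatial.distance import cosine, jensenshannon

    # Hypothetical topic distributions for two novels under a 5-topic model.
    # Each vector sums to 1: the share of the novel's words assigned to each topic.
    novel_a = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
    novel_b = np.array([0.10, 0.15, 0.20, 0.30, 0.25])

    # Two common ways of turning the comparison into a single "distance" number.
    print("cosine distance:        ", cosine(novel_a, novel_b))
    print("Jensen-Shannon distance:", jensenshannon(novel_a, novel_b))

Averaging such distances between works from adjacent periods is one way researchers estimate how quickly a corpus is changing.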

In 2015, Mauch et al. used this strategy to measure the pace of change in popular music—arguing, for instance, that changes linked to hip-hop were more dramatic than the British invasion. Last year, Barron et al. used a similar strategy to measure the influence of speakers in French Revolutionary debate.

I don’t think topic modeling causes problems in either of the papers I just mentioned. But these methods are so useful that they’re likely to be widely imitated, and I do want to warn interested people about a couple of pitfalls I’ve encountered along the road.

One reason for skepticism will immediately occur to humanists: are human perceptions about difference even roughly proportional to the “distances” between topic distributions? In one case study I examined, the answer turned out to be “yes,” but there are caveats attached. Read the paper if you’re curious.

In this blog post, I’ll explore a simpler and weirder problem. Unless we’re careful about the way we measure “distance,” topic models can warp time. Time may seem to pass more slowly toward the beginning and end of the period covered by a long topic model, and more rapidly toward its middle.

Read the full post here.

Editors’ Choice: Mapping search data from Google Trends in R

This is a quick introduction on how to get and visualize Google search data with both time and geographical components using the R packages gtrendsR, maps and ggplot2. In this example, we will look at search interest for named hurricanes that hit the U.S. mainland and then plot how often different states search for “guns.”
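
The post itself walks through the R workflow; as a very rough Python analogue (an assumption for illustration, not part of the original tutorial), the pytrends package can pull similar state-level interest data:

    # Rough Python sketch using pytrends (not the gtrendsR workflow the post describes).
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=360)
    pytrends.build_payload(kw_list=["guns"], timeframe="2014-01-01 2018-08-01", geo="US")

    # Search interest over time (weekly index, 0-100) and by U.S. state.
    over_time = pytrends.interest_over_time()
    by_state = pytrends.interest_by_region(resolution="REGION")

    print(over_time.head())
    print(by_state.sort_values("guns", ascending=False).head(10))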

Source: Mapping search data from Google Trends in R

Editors’ Choice: Using pyLDAvis with Mallet

Image of a computer keyboard

One useful library for viewing a topic model is LDAvis, an R package for creating interactive web visualizations of topic models, and its Python port, pyLDAvis. This library is focused on visualizing a topic model, using PCA to chart the relationship between topics and between topics and words in the topic model. It is also agnostic about the library you use to create the topic model, so long as you extract the necessary data in the correct formats.

While the python version of the library works very smoothly with Gensim, which I have discussed before, there is little documentation for how to move from a topic model created using MALLET to data that can be processed by the LDAvis library. For reasons that require their own blog post, I have shifted from using Gensim for my topic model to using MALLET (spoilers: better documentation of output formats, more widespread use in the humanities so better documentation and code examples generally). But I still wanted to use this library to visualize the full model as a way of generating an overall view of the relationship between the 250 topics it contains.

The documentation for both LDAvis and pyLDAvis relies primarily on code examples to demonstrate how to use the libraries. My primary sources were a python example and two R examples, one focused on manipulating the model data and one on the full model-to-visualization process. The “details” documentation for the R library also proved key for trouble-shooting when the outputs did not match my expectations. (Pro tip: word order matters.)

Looking at the examples, the data required for the visualization library are:

  • topic-term distributions (matrix, phi)
  • document-topic distributions (matrix, theta)
  • document lengths (numeric vector)
  • vocab (character vector)
  • term frequencies (numeric vector)

One challenge is that the order of the data needs to be managed, so that the term columns in phi, the topic-term matrix, are in the same order as the vocab vector, which is in the same order as the frequencies vector, and the document index of theta, the document-topic matrix, is in the same order as the document lengths vector.
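
A minimal sketch of that hand-off, assuming the five inputs have already been parsed out of MALLET's output files into arrays and aligned as described (the file names and loading steps here are placeholders, not MALLET's actual formats):

    import numpy as np
    import pyLDAvis

    # Placeholder loading: assume these were built elsewhere from MALLET's output
    # (e.g. from --topic-word-weights-file and --output-doc-topics).
    phi = np.load("topic_term_weights.npy")         # shape: (n_topics, n_terms)
    theta = np.load("doc_topic_weights.npy")        # shape: (n_docs, n_topics)
    doc_lengths = np.load("doc_lengths.npy")        # shape: (n_docs,)
    vocab = open("vocab.txt").read().split()        # n_terms strings, same order as phi's columns
    term_frequency = np.load("term_frequency.npy")  # shape: (n_terms,), same order as vocab

    # pyLDAvis expects probability distributions, so normalize each row to sum to 1.
    phi = phi / phi.sum(axis=1, keepdims=True)
    theta = theta / theta.sum(axis=1, keepdims=True)

    # Build the interactive visualization and save it as a standalone HTML page.
    vis = pyLDAvis.prepare(
        topic_term_dists=phi,
        doc_topic_dists=theta,
        doc_lengths=doc_lengths,
        vocab=vocab,
        term_frequency=term_frequency,
    )
    pyLDAvis.save_html(vis, "mallet_topic_model.html")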

Read the full post here.

Editors’ Choice: Knowledge in 3D – How 3D data visualization is reshaping our world

Image of an open laptop on a desk

How is humanities and social science knowledge impacted by the introduction of three-dimensional visualization technologies? While 3D visualization may seem far removed from the everyday work of scholars in the social sciences and humanities, it has great potential to change how we conduct and communicate our work.

Three-dimensional visualizations can be used for creating models, supplementing maps, developing games, printing objects, developing virtual environments, enhancing telecommunications, and housing simulations. They can be used to support retrospective and prospective analysis, exploration of counterfactuals, and representation of hybrid or alternate realities, particularly when they combine objects in 3D contexts. An art historian might want to understand how an artifact was perceived in context, or how a built structure looked in earlier eras, or to document an installation or exhibition. An archeologist might use 3D models or prints to complete a broken artifact or to reassemble a ruin. A sociologist might develop agent-based modeling in a 3D space to understand the social dynamics in a given location. A historian might explore 3D viewsheds to determine lines of sight and power. A linguist might construct a virtual environment for language learning. A literary scholar might build out a navigable imagined space as a form of nonlinear literary criticism. A statistician might display data in 3D infographics to aid in interpretation. And of course, artists, architects, and designers of all stripes might use 3D to create new objects and environments as well as use such techniques as a way to study those that already exist. All of these researchers in turn might communicate their work through multimodal, immersive, affective visualizations for public outreach, policy impact, or funding solicitations.

Although the technologies used to create them are daunting at first, these visualizations are becoming increasingly accessible to nonspecialist users, and the underlying conceptual approaches that they highlight are not new to the disciplines where they’re now being used. Designing and representing 3D space and objects in 2D images, text, and other forms comes naturally to us in many fields. Maps, plans, and networks fill the pages of social science research. Where and how people think, live, work, and interact are contextualized in historical and contemporary places, spaces, environments, and geographies. Artists and architects build their maquettes and design their structures and installations. The dimensional space of the stage, performance hall, or theater is a key component of the production. Lighting, acoustics, and movement are all part of the process. Museums and cultural heritage institutions have taken advantage of the rhetorical power of 3D for their interpretative exhibits for years.

Read the full post here.