By Mia Ridge, Patrick Murray-John | June 21, 2012
Mia Ridge explores the shape of Cooper-Hewitt collections. Or, “what can you learn about 270,000 records in a week?”
by Mia Ridge
Museum collections are often accidents of history, the result of the personalities, trends and politics that shaped an institution over time. I wanted to go looking for stories, to find things that piqued my curiosity and see where they led me. How did the collection grow over time? What would happen if I visualised materials by date, or object type by country? Would showing the most and least exhibited objects be interesting? What relationships could I find between the people listed in the Artist and Makers tables, or between the collections data and the library? Could I find a pattern in the changing sizes of different types of objects – which objects get bigger and which get smaller over time? Which periods have the most colourful or patterned objects?
I was planning to use records from the main collections database, which for large collections usually means some cleaning is required. Most museum collections management systems date back several decades, and there’s often a backlog of un-digitised records that need entering and older records that need enhancing to modern standards. I thought I’d iterate through stages of cleaning the data, trying it in different visualisations, then going back to clean up more precisely as necessary.
I wanted to get the easy visualisations like timelines and maps out of the way early with tools like IBM’s ManyEyes and Google Fusion Tables so I could start to look for patterns in the who, what, where, when and why of the collections. I hoped to find combinations of tools and data that would let a visitor go looking for potential stories in the patterns revealed, then dive into the detail to find out what lay behind a pattern, or pull back to view it in the context of the whole collection.
Hacking on Cooper-Hewitt’s data release at THATCamp, Or, How to get me to work for free (*)
by Patrick Murray-John
For THATCamp Prime V, we tried out running a hackathon on a dataset. One suggested dataset was the Cooper-Hewitt data on GitHub. I tried importing it into an Omeka site to see what possibilities were there.
There were a couple of things I wanted to do as I pulled the data into Omeka. First, I wanted to map the Cooper-Hewitt data onto Dublin Core as best I could. Sometimes this was a little tricky, since I wasn’t entirely sure of the correct mappings and was working from just a tiny handful of real records that I had browsed through.
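A field-to-field mapping like this can be sketched as a simple lookup table. The source field names below are hypothetical stand-ins (the actual keys in the Cooper-Hewitt release may differ), and the Dublin Core choices are just one plausible reading:

```python
# Hypothetical Cooper-Hewitt field names mapped onto Dublin Core terms.
# The real dataset's keys, and the "correct" DC targets, may well differ.
cooper_hewitt_to_dc = {
    "title": "dcterms:title",
    "date": "dcterms:date",
    "medium": "dcterms:medium",
    "description": "dcterms:description",
    "country": "dcterms:spatial",
}

def map_record(record):
    """Translate a raw record dict into Dublin Core-keyed fields,
    silently skipping anything with no mapping or an empty value."""
    return {
        dc: record[src]
        for src, dc in cooper_hewitt_to_dc.items()
        if record.get(src)
    }

record = {"title": "Side Chair", "medium": "wood", "accession_number": "1931-1-1"}
print(map_record(record))
# accession_number is dropped: it has no entry in the mapping table
```

Unmapped fields could instead be kept under a local namespace rather than dropped, depending on how lossy the import is allowed to be.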
…With an idea for a site and how it could display the data, I needed to make the data work well with my plans. For example, I wanted links to items from the same century to sort together alphabetically. Strings starting with “Mid” would clearly cause problems once more than one century was involved: “Mid 16th century” would sort next to “Mid 20th century”. Not the desired outcome.
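One way to fix that sorting problem is to move the qualifier after the century, so the century comes first in the string. This is a minimal sketch of the idea, not the actual transform used on the Cooper-Hewitt data, and it only handles the simple qualifier patterns shown:

```python
import re

def normalize_period(value):
    """Move a leading qualifier like 'Early', 'Mid' or 'Late' after the
    century, so values sort by century first.
    e.g. 'Mid 16th century' -> '16th century, mid'.
    Note: plain string sorting still misplaces single-digit centuries
    ('9th century' sorts after '20th century')."""
    match = re.match(r"(?i)^(early|mid|late)\s+(.+)$", value.strip())
    if match:
        qualifier, rest = match.groups()
        return f"{rest}, {qualifier.lower()}"
    return value.strip()

periods = ["Mid 20th century", "Mid 16th century", "Early 20th century"]
print(sorted(normalize_period(p) for p in periods))
# ['16th century, mid', '20th century, early', '20th century, mid']
```

Inside Google Refine itself, the same reshuffle could be done with a regex-based cell transform across the date column rather than in an external script.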
This turned out to be a great chance to use Google Refine for real. I’d played with it on fake data, but now I had data that I cared about because it would go into a site with my name on it — and Cooper-Hewitt’s.