Editor’s Choice: SherlockNet: tagging and captioning the British Library’s Flickr images

This is an update on SherlockNet, our project to use machine learning and other computational techniques to dramatically increase the discoverability of the British Library’s Flickr images dataset. Below we summarise our progress on tagging, captioning, and the web interface.

When we started this project, our goal was to classify every single image in the British Library’s Flickr collection into one of 12 tags — animals, people, decorations, miniatures, landscapes, nature, architecture, objects, diagrams, text, seals, and maps. Over the course of our work, we realised the following:

  • We were achieving incredible accuracy (>80%) in our classification using our computational methods.
  • If our algorithm assigned two tags to an image with approximately equal probability, there was a high chance the image had elements associated with both tags.
  • However, these tags were in no way enough to expose all the information in the images.
  • Luckily, each image is associated with text on the corresponding page.

We thus wondered whether we could use the surrounding text of each image to help expand the “universe” of possible tags. While the text around an image may or may not be directly related to the image, this strategy isn’t without precedent: Google Images uses text as its main method of annotating images! So we decided to dig in and see how this would go.
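A minimal version of this surrounding-text idea is to treat frequent content words on the page as candidate tags. The sketch below is our own illustration of the general approach, not SherlockNet’s pipeline; the stopword list and thresholds are assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is",
             "was", "were", "with", "for", "by", "at", "from", "it",
             "this", "that"}

def candidate_tags(surrounding_text, top_n=5):
    """Return the most frequent content words as candidate tags."""
    words = re.findall(r"[a-z]+", surrounding_text.lower())
    # Drop stopwords and very short words before counting.
    content = [w for w in words if w not in STOPWORDS and len(w) > 3]
    return [w for w, _ in Counter(content).most_common(top_n)]
```

Even this crude frequency count surfaces page-level subjects (e.g. place names or object words) that the twelve fixed tags cannot express.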

Source: Read the full post here.