When I made my first edit to Wikipedia a few years ago I can remember watching the recent changes page to see my contribution pop up. I was shocked to see just how quickly my edit was swept up in the torrent of edits that are going on all the time. I think everyone who googles for topical information is familiar with the experience of having Wikipedia articles routinely appear near the top of their search results. In hindsight it should’ve been obvious, but the level of participation in the curation of content at Wikipedia struck me as significant…and somehow different. It was wonderful to see living evidence of so many people caring to collaboratively document our world.

The Obsession

I work as a software developer in the cultural heritage sector, and often find myself building editing environments for users to collaboratively create and edit content. These systems typically get used here and there, but they in no way compare to the sheer volume of edit activity that Wikipedia sees from around the world, every single day. I guess I’d read about crowdsourcing, but had never been provided with a window into it like this before. My wife encourages her 5th grade students to think critically about Wikipedia as an information source. One way she has done this in the past was to have them author an article for their school, which didn’t have one previously. I wanted to help her and her students see how they were part of a large community of Wikipedia editors, and to give them a tactile sense of the number of people who are actively engaged in making Wikipedia better.

A few months later Georgi Kobilarov let me know about the many IRC channels where various bits of metadata about recent changes in Wikipedia are announced. Georgi told me about a bot that the BBC run to track changes to Wikipedia, so that relevant article content can be pulled back to the BBC. I guess a light bulb turned on. Could I use these channels to show people how much Wikipedia is actively curated, without requiring them to reload the recent changes page, connect to some cryptic IRC channels, or dig around in some (wonderfully) detailed statistics? More importantly, could it be done in a playful way?

The Apps

Some more time passed and I came across some new tools (more about these below) that made it easy to throw together a Web visualization of the Wikipedia update stream. The tools proved to be so much fun that I ended up making two apps.

wikistream displays the edits to 38 language wikipedias as a river of flowing text. The content moves by so quickly that I had to add a pause button (the letter p) in order to test things like clicking on an update to see the change that was made. The little icons to the left indicate whether the edit was made by a registered Wikipedia user, an anonymous user, or a bot (there are lots of them). After getting some good feedback on the wikitech-l discussion list I added some knobs to limit updates to specific languages, types of user, or sizes of edit. I also added a periodically updating background image based on uploads to the Wikimedia Commons.
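
Those knobs boil down to filtering the incoming updates in the browser. A simplified sketch of that kind of check might look like this; the filters object is hypothetical, but the field names on update match the edit data shown later in this post:

// decide whether to display an update, given the user's current settings;
// `filters` is a made-up structure here, not wikistream's actual one
function wanted(update, filters) {
  if (filters.language && update.wikipediaShort !== filters.language) return false;
  if (filters.hideBots && update.robot) return false;
  if (filters.hideAnonymous && update.anonymous) return false;
  if (filters.minimumDelta && Math.abs(update.delta) < filters.minimumDelta) return false;
  return true;
}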

The second visualization app is called wikipulse. Dario Taraborelli of the Wikimedia Foundation emailed me with the idea to use the same update data stream I used in wikistream to fuel a higher level view of the edit activity using the gauge widget in Google’s Chart API. To the left is one of these gauges, which displays the edits per minute across 36 wikipedia properties. If you visit wikipulse you will also see individual gauges for each language wikipedia. It’s a bit overkill seeing all the gauges on the screen, but it’s also kind of fun to see them update automatically every second relative to each other, based on the live edit activity.
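
Feeding one of these gauges is pleasantly simple. Here is a rough sketch of the sort of thing involved; the element id and the editsPerMinute() helper are made up for the example, and this isn’t the actual wikipulse code:

// drive a single Google Charts gauge with an edits-per-minute number;
// 'en-gauge' and editsPerMinute() are hypothetical, for illustration only
google.load('visualization', '1', {packages: ['gauge']});
google.setOnLoadCallback(function () {
  var data = google.visualization.arrayToDataTable([
    ['Label', 'Value'],
    ['edits/min', 0]
  ]);
  var gauge = new google.visualization.Gauge(document.getElementById('en-gauge'));
  var options = {min: 0, max: 300};
  setInterval(function () {
    data.setValue(0, 1, editsPerMinute());  // redraw every second with the latest rate
    gauge.draw(data, options);
  }, 1000);
});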

The Tools

For both of these apps I needed to log into the wikimedia IRC server, listen on ~30 different channels, push all the updates through some code that helped visualize the data in some way, and then get this data out to the browser. I had heard good things about node for high-concurrency network programming from several people. I ran across a node library called socket.io that purported to make it easy to stream updates from the server to the client, in a browser-independent way, using a variety of transport protocols. Instinctively it felt like the pub/sub model would also be handy for connecting up the IRC updates with the webapp. I had been wanting to play around with the pub/sub features in redis for some time, and since there is a nice redis library for node I decided to give it a try.
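
Concretely, that adds up to roughly this kind of setup on the IRC-listening side of things. The library names, nick, and channel list below are my guesses at what such a setup looks like, not the actual wikistream source:

// roughly the setup the following snippets assume (guesses, not wikistream's source)
var irc    = require('irc-js'),                // an IRC client library (assumed)
    redis  = require('redis').createClient(),  // redis client, used to publish updates
    config = {ircNick: 'wikistream-bot'};      // hypothetical nick for the bot

// the ~30 recent changes channels on irc.wikimedia.org, one per wikipedia
var channels = ['#en.wikipedia', '#de.wikipedia', '#fr.wikipedia' /* ... */];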

Like many web developers I am used to writing JavaScript for the browser. Tools like jQuery and underscore.js successfully raised the bar to the point that I’m able to write JavaScript and still look at myself in the mirror in the morning. But I was still a bit skeptical about JavaScript running on the server side. The thing I didn’t count on was how well node’s event driven model, the library support (socket.io, redis, express), and the functional programming style fit the domain of making the Wikipedia update stream available on the Web.

For example, here is the code to connect to the ~30 IRC chatrooms stored in the channels variable, and send all the messages to a function processMessage:

var client = new irc({server: 'irc.wikimedia.org', nick: config.ircNick});

client.connect(function () {
  client.join(channels);
  client.on('privmsg', processMessage);
});

The processMessage function then parses the IRC message into a JavaScript dictionary and publishes it to a ‘wikipedia’ channel in redis:

function processMessage (msg) {
  // parse_msg (defined elsewhere) turns the raw IRC text into an object
  var m = parse_msg(msg.params);
  redis.publish('wikipedia', m);
}

Then over in my wikistream web application I set up socket.io so that when a browser goes to my webapp it negotiates for the best way to get updates from the server. Once a connection is established the server subscribes to the wikipedia channel and sends any updates it receives out to the browser. When the browser disconnects, the connection to redis is closed.

var io = sio.listen(app);

io.sockets.on('connection', function(socket) {
  var updates = redis.createClient();
  updates.subscribe('wikipedia');
  updates.on("message", function (channel, message) {
    socket.send(message);
  });
  socket.on('disconnect', function() {
    updates.quit();
  });
});
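
On the browser side the receiving end is small. This is just a sketch of what a socket.io 0.x client might look like, not the actual wikistream front end:

// in the page served by the web app, after loading /socket.io/socket.io.js
var socket = io.connect();                  // negotiate the best available transport
socket.on('message', function (msg) {       // fires for each socket.send() on the server
  // the edit data published to redis; parse it if it arrives as a JSON string
  var update = (typeof msg === 'string') ? JSON.parse(msg) : msg;
  console.log(update.page, update.comment); // render it into the stream here instead
});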

Each update is represented as a JavaScript dictionary, which socket.io and node’s redis client transparently serialize and deserialize. In order to understand the socket.io protocol a bit more, I wrote a little python script that connects to wikistream.inkdroid.org, negotiates for the xhr-polling transport, and prints the decoded updates to the console. It’s a demonstration of how a socket.io instance like wikistream can be used as an API for creating a firehose-like service. Although I guess it might’ve been a bit cleaner for the example to negotiate a websocket instead.

{
  'anonymous': False,
  'comment': '/* Anatomy */  changed statement that orbit was the eye to saying that the orbit was the eye socket for accuracy',
  'delta': 7,
  'flag': '',
  'namespace': 'article',
  'newPage': False,
  'page': 'Optic nerve',
  'pageUrl': 'http://en.wikipedia.org/wiki/Optic_nerve',
  'robot': False,
  'unpatrolled': False,
  'url': 'http://en.wikipedia.org/w/index.php?diff=449570600&oldid=447889877',
  'user': 'Moearly',
  'userUrl': 'http://en.wikipedia.org/wiki/User:Moearly',
  'wikipedia': '#en.wikipedia',
  'wikipediaLong': 'English Wikipedia',
  'wikipediaShort': 'en',
  'wikipediaUrl': 'http://en.wikipedia.org'
}

This felt so easy, it really made me re-evaluate everything I thought I knew about JavaScript. Plus it all became worth it when Ward Cunningham (the creator of the Wiki idea) wrote on the wiki-research list:

I’ve written this app several times using technology from text-to-speech to quartz-composer. I have to tip my hat to Ed for doing a better job than I ever did and doing it in a way that he makes look effortless. Kudos to Ed for sharing both the page and the software that produces it. You made my morning.

Ward is a personal hero of mine, so making his morning pretty much made my professional career.

I guess this is all a long way of saying what many of you probably already know…the tooling around JavaScript (and especially node) has changed so much that it really does offer a radically new programming environment that is worth checking out, especially for network programming. The event driven model that is baked into node, and the fact that v8 runs blisteringly fast, make it possible to write apps that do a whole lot in one low-memory process. This is handy when deploying an app to an EC2 micro instance or Heroku, which is where wikipulse is running…for free.

Of course it helped that my wife and kids got a kick out of wikistream and wikipulse. I suspect that they think I’m a bit obsessed with Wikipedia, but that’s ok … because I kinda am.