Two years ago, when I first grabbed the transcripts of the TED talks using wget, I relied upon the wisdom and generosity of Padraic C on Stack Overflow to help me use Python’s BeautifulSoup library to get the data I wanted out of the downloaded HTML files. Now that Katherine Kinnaird and I have decided to add talks published since then, and perhaps even to re-download the entire corpus so that everything is as consistent as possible, it is time for me to understand for myself how BeautifulSoup (hereafter BS4) works.

