Two years ago, when I first grabbed the transcripts of the TED talks using wget, I relied upon the wisdom and generosity of Padraic C on StackOverflow to help me use Python’s BeautifulSoup library to get the data I wanted out of the downloaded HTML files. Now that Katherine Kinnaird and I have decided to add talks published since then, and perhaps even to re-download the entire corpus so that everything is as consistent as possible, it was time for me to understand for myself how BeautifulSoup (hereafter BS4) works.