Resource: Writing a Simple Web Spider Using Command Line Tools in Linux

A spider (or ‘crawler’ or ‘bot’) is a program that downloads a page from the Internet, saves some or all of its content, extracts links to other web pages, and then retrieves and processes those pages in turn… Here we will develop a surprisingly simple Bash script to explore and visualize a tiny region of the WorldCat Identities database.
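
A minimal sketch of such a spider, assuming a Linux environment with curl, grep, sed, sort, and md5sum available (this is illustrative only, not the script from the full post; the depth limit, file names, and link pattern are assumptions):

    #!/usr/bin/env bash
    # Sketch of a depth-limited web spider.
    # Usage: ./spider.sh <seed-url> <depth>

    seed="${1:?usage: spider.sh <seed-url> <depth>}"
    depth="${2:-2}"

    crawl() {
        local url="$1" level="$2"
        # Stop when the depth budget is spent or the URL was already visited.
        [ "$level" -le 0 ] && return
        grep -qxF "$url" visited.txt 2>/dev/null && return
        echo "$url" >> visited.txt

        # Download the page; save a copy named after an MD5 hash of the URL.
        page=$(curl -sL "$url") || return
        printf '%s' "$page" > "$(printf '%s' "$url" | md5sum | cut -d' ' -f1).html"

        # Extract absolute links and crawl each one at the next depth level.
        printf '%s' "$page" \
          | grep -oE 'href="https?://[^"]+"' \
          | sed 's/^href="//; s/"$//' \
          | sort -u \
          | while read -r link; do
                crawl "$link" $((level - 1))
            done
    }

    : > visited.txt
    crawl "$seed" "$depth"

Invoked as, say, ./spider.sh http://example.com/ 2, this fetches the seed page, saves it, follows every absolute link it finds, and repeats one level deeper, with visited.txt preventing any page from being fetched twice.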