Recently I undertook the task of making thumbnails of each and every website in the Newcastle Music Directory. It turned out to be a non-trivial exercise, taking a lot longer than I originally thought. Let me draw the distinction here between making a thumbnail of an image and making a thumbnail of a website. The former involves reducing the dimensions of a single image so that it can be previewed at a smaller size, whilst the latter involves rendering the components of a webpage to produce a single image that represents what the user sees when browsing that page.
I explored several different options for doing this, including the Thumbnail Grabber Application from the University of Illinois Open Archives Initiative Metadata Harvesting Project. However, this proved tricky to set up and I abandoned the idea of using that piece of software.
Another option was to press the Print Screen key manually while each webpage was shown in the browser, but with 6500 links in the directory this would be very time-consuming indeed, especially if I were to repeat the task periodically, as I intend to.
After almost giving up on the idea, I stumbled upon the Page Snapshot class at phpclasses.org. I found that code to be bloated and cumbersome, but it did lead me along a path that led to a solution. The Page Snapshot class uses PHP's exec function to run a command that requests a given URL in a web browser. It then waits a predetermined amount of time and takes a screen capture using Hypersnap. I used this same process, rewriting the code to suit my needs.
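The open, wait, capture loop described above can be sketched roughly as follows. This is a minimal Python illustration, not the original PHP script or the Page Snapshot class. The function name `snapshot_pages`, the `thumb_00000.png` file naming and the `{out}` placeholder are all assumptions made for the example, and `capture_command` stands in for whatever command line the installed capture tool actually accepts (no real Hypersnap options are shown).

```python
import subprocess
import time
import webbrowser


def snapshot_pages(urls, capture_command, wait_seconds=20,
                   open_page=webbrowser.open, run=subprocess.run,
                   sleep=time.sleep):
    """Open each URL, wait for it to render, then run a screen-capture command.

    capture_command is a hypothetical argument list for the capture tool;
    any "{out}" placeholder in it is replaced by the output file name.
    """
    saved = []
    for index, url in enumerate(urls):
        out_file = f"thumb_{index:05d}.png"  # assumed naming scheme
        open_page(url)       # ask the default browser to load the page
        sleep(wait_seconds)  # give the page time to finish rendering
        command = [part.replace("{out}", out_file) for part in capture_command]
        run(command, check=True)  # invoke the external capture tool
        saved.append(out_file)
    return saved
```

The browser launcher, the command runner and the sleep function are passed in as parameters so that the loop can be tried out with stand-ins before pointing it at a real browser and capture tool.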
Hypersnap can be invoked from the command line, and the file save format and capture area can be specified via command-line options, making it a suitable tool for my needs. Used in conjunction with NMBot, my screen capture process obeys robots.txt rules.
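For readers unfamiliar with robots.txt checks, here is a small sketch of the general idea using Python's standard `urllib.robotparser` module. It does not show how NMBot itself works; the helper name `allowed_urls` and the sample rules are assumptions for illustration, and the rules passed in are taken to belong to the same site as the URLs.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(urls, robots_lines, user_agent="NMBot"):
    """Return only the URLs that the given robots.txt lines permit."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # robots_lines: the site's robots.txt, split into lines
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
sample_rules = ["User-agent: *", "Disallow: /private/"]
sample_urls = [
    "http://example.com/index.html",
    "http://example.com/private/page.html",
]
permitted = allowed_urls(sample_urls, sample_rules)
```

Only the URLs left in `permitted` would then be passed on to the capture step.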
My PHP script allowed 20 seconds for all of each page's elements to be rendered, so the entire process took about 36 hours to complete. The thumbnails will be used on the Newcastle Music Directory website in the near future.
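The 36-hour figure follows from the 20-second wait and the roughly 6500 links mentioned earlier. A quick check in Python, assuming the wait dominates the time spent on each page (the function name `estimated_hours` is just for this example):

```python
def estimated_hours(page_count, wait_seconds):
    """Total run time in hours if each page costs one fixed wait."""
    return page_count * wait_seconds / 3600


print(f"{estimated_hours(6500, 20):.1f} hours")  # prints "36.1 hours"
```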
Here’s a video showing the first 4000 or so thumbnails:
Posted on January 29th, 2006 in Site Notices