Newcastle Music Directory

Automating the Creation of Website Thumbnails

Recently I undertook the task of making thumbnails of each and every website in the Newcastle Music Directory. It turned out to be a non-trivial exercise, taking a lot longer than I originally thought. Let me draw the distinction here between making a thumbnail of an image and making a thumbnail of a website. The former involves reducing the dimensions of a single image so that it can be previewed at a smaller size, whilst the latter involves rendering the components of a webpage to produce a single image that represents what the user sees when browsing that page.

I explored several different options for doing this, including the Thumbnail Grabber Application from the University of Illinois Open Archives Initiative Metadata Harvesting Project. However, this proved tricky to set up, and I abandoned the idea of using that piece of software.

Another option was to manually press the Print Screen key while each webpage was shown in the browser, but with 6500 links in the directory this would be very time-consuming indeed, especially if I were to repeat the task periodically, as I intend to.

After almost giving up on the idea, I stumbled upon the Page Snapshot class. I found that code to be bloated and cumbersome, but it did lead me along a path that led to a solution. The Page Snapshot class uses PHP’s exec function to run a command that requests a given URL in a web browser; it then waits a predetermined amount of time and takes a screen capture using Hypersnap. I used this same process, rewriting the code to suit my needs.
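
The launch, wait, capture sequence described above can be sketched as follows. This is a minimal Python illustration, not the PHP script used for the directory. The function name capture_site, the {out} placeholder and both command lists are hypothetical, and the real Hypersnap command line options are deliberately not shown; they would need to be taken from its documentation.

```python
import subprocess
import time
from typing import Callable, Sequence


def capture_site(
    url: str,
    out_path: str,
    browser_cmd: Sequence[str],
    capture_cmd: Sequence[str],
    wait_seconds: float = 20.0,
    launch: Callable = subprocess.Popen,
    run: Callable = subprocess.run,
    sleep: Callable = time.sleep,
) -> None:
    """Open url in a browser, wait for it to render, then run a capture tool."""
    # Start the browser without waiting for it to exit.
    launch([*browser_cmd, url])
    # Give the page a fixed amount of time to finish rendering.
    sleep(wait_seconds)
    # Substitute the output file name into the (hypothetical) capture
    # command template and wait for the capture tool to finish.
    run([part.replace("{out}", out_path) for part in capture_cmd], check=True)
```

A caller would pass something like ["firefox"] as browser_cmd and the capture tool's real command line as capture_cmd, with {out} wherever the output file name belongs. The launch, run and sleep parameters exist only so the sequence can be exercised without opening a real browser.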

Hypersnap can be invoked from the command line, and the file save format and capture area can be specified via command line options, making it a suitable tool for my needs. Used in conjunction with NMBot, my screen capture process obeys robots.txt rules.
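
As an illustration of what obeying robots.txt means before a page is captured, here is a small Python sketch using the standard library's urllib.robotparser. How NMBot itself performs this check is not described here, so the function name allowed_by_robots and the use of "NMBot" as the user agent string are assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse rules from text that has already been downloaded, so this
    # function makes no network request of its own.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: every crawler is kept out of /private/.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

With these example rules, a URL under /private/ would be skipped, and any other page on the site would go on to be captured.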

I decided that Firefox was the best web browser to use for the rendering process because of the incredible amount of control users have over the application compared to other browsers such as Internet Explorer or Opera. For example, I removed the status bar, address bar and links bar to maximise the render area. I used the fantastic Web Developer plugin to set the render area to be exactly 1000 pixels wide. JavaScript was left switched on, but I prevented popup windows, resizing of windows, image animations, marquee text and blinking text using Firefox’s extensive options. Although I have Adblock installed and could have used it to turn advertising off, I decided to leave advertising on so that the thumbnails represent the true look and feel of the original website. Finally, I disabled the vertical and horizontal scrollbars so that they would not appear in the screen capture.

My PHP script allowed 20 seconds for all of each page’s elements to be rendered, so with 6500 links the entire process took about 36 hours to complete. The thumbnails will be used on the Newcastle Music Directory website in the near future.
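
The running time follows directly from the fixed wait per page. A quick Python check of the arithmetic, using the 6500 links and 20 second wait mentioned above (the function name estimated_hours is just for illustration, and time spent outside the wait is ignored):

```python
def estimated_hours(num_pages: int, seconds_per_page: float) -> float:
    """Total running time in hours when every page gets the same fixed wait."""
    return num_pages * seconds_per_page / 3600


# 6500 pages at 20 seconds each is 130000 seconds, or just over 36 hours.
total_hours = estimated_hours(6500, 20)
```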

Here’s a video showing the first 4000 or so thumbnails:

Posted on January 29th, 2006 in Site Notices
