Work Package 8

BIBLINK.Checksum Algorithm

The BIBLINK Project

This document proposes an algorithm for calculating the BIBLINK.Checksum MD5 message digest for HTML resources.

In HTML, the following tags are considered 'inline', i.e. the resources that they point to can be regarded, in some sense, as part of the current resource.

External style sheets, which may be referenced via a LINK tag, raise the same issue; however, they are not considered here.

The following algorithm is proposed:

  1. Retrieve the HTML page from the Web.
  2. Strip out any embedded META tags (everything from the opening '<META' to the closing '>').
  3. Compute the MD5 hash of the stripped page.
  4. Retrieve any 'inline' resources referenced by the tags above.
  5. Compute the MD5 hash of each retrieved resource.
  6. Concatenate all the hashes in the order in which the resources appear in the page, with the page's own MD5 hash first.
  7. Compute the MD5 hash of the concatenated result.
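
The seven steps above can be sketched in Python as follows. This is an illustration only, not the project's own tool, and it rests on several assumptions that the proposal does not settle: only IMG SRC references are treated as 'inline' (a stand-in for the tag list), each hash is represented as a 32-character hexadecimal string before concatenation, and relative references are resolved against the page URL. The names biblink_checksum, http_fetch, META_TAG and INLINE_SRC are hypothetical.

```python
import hashlib
import re
import urllib.request
from typing import Callable
from urllib.parse import urljoin

# Step 2: everything from an opening '<META' to the next '>'.
META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)

# ASSUMPTION: only IMG SRC attributes are treated as 'inline' references here;
# substitute the real list of inline tags as required.
INLINE_SRC = re.compile(
    rb"<img\b[^>]*?\bsrc\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE
)


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def http_fetch(url: str) -> bytes:
    """Retrieve a resource from the Web and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def biblink_checksum(url: str, fetch: Callable[[str], bytes] = http_fetch) -> str:
    """Compute a combined checksum for an HTML page and its inline resources."""
    page = fetch(url)                             # step 1
    stripped = META_TAG.sub(b"", page)            # step 2
    digests = [md5_hex(stripped)]                 # step 3
    for match in INLINE_SRC.finditer(stripped):   # step 4, in page order
        ref = urljoin(url, match.group(1).decode("ascii", "replace"))
        digests.append(md5_hex(fetch(ref)))       # step 5
    # ASSUMPTION: hashes are concatenated in their hexadecimal text form.
    combined = "".join(digests).encode("ascii")   # step 6
    return md5_hex(combined)                      # step 7
```

Passing the retrieval function as a parameter keeps the sketch testable without network access. Because META tags are removed before hashing, two copies of a page that differ only in their META tags yield the same checksum.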

A Web-based CGI tool that implements this algorithm is available.


Page maintained by: Andy Powell
Last updated: 8-June-1998