Work Package 8
BIBLINK.Checksum Algorithm
This document proposes an algorithm for calculating the BIBLINK.Checksum MD5 message digest for HTML resources.
In HTML, the following tags are considered 'inline', i.e. the things that they point to can be
considered part of the current resource (in some sense):
- <APPLET>
- <EMBED>
- <OBJECT>
- <IMG>
- <IMAGE>
There is also the issue of external style sheets that might be referenced
via a LINK tag; however, these are not considered here.
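As an illustration of how references from the 'inline' tags listed above might be collected, here is a minimal Python sketch. The source does not say which attribute of each tag holds the reference, so the mapping in INLINE_ATTRS (for example SRC for IMG and EMBED, DATA for OBJECT, CODE for APPLET) is an assumption, as are the names InlineCollector and inline_urls. The sketch also ignores the CODEBASE attribute of APPLET and any BASE tag in the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: which attribute of each 'inline' tag carries the reference.
# The proposal itself only names the tags, not the attributes.
INLINE_ATTRS = {
    "applet": "code",
    "embed": "src",
    "object": "data",
    "img": "src",
    "image": "src",
}


class InlineCollector(HTMLParser):
    """Collects absolute URLs of inline resources in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attr = INLINE_ATTRS.get(tag)
        if attr is None:
            return
        value = dict(attrs).get(attr)
        if value:
            # Resolve relative references against the page URL.
            self.urls.append(urljoin(self.base_url, value))


def inline_urls(html_text, base_url):
    """Return the inline resource URLs in the order they appear in the page."""
    collector = InlineCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.urls
```

Keeping the URLs in document order matters because the proposed algorithm combines the hashes in the order in which the resources appear in the page.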
The following algorithm is proposed:
- Retrieve the HTML page from the Web.
- Strip out any embedded META tags (everything between the opening '<META' and the closing '>').
- Compute the MD5 hash of the stripped page.
- Retrieve any 'inline' resources referenced by the tags above.
- Compute the MD5 hash of each.
- Combine all hashes by concatenating them in the order in which the resources appear in the page (the page's MD5 hash first).
- Compute the MD5 hash of the combination.
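The hashing steps above can be sketched in Python as follows. This is an illustrative reading of the proposal, not the BIBLINK tool itself. It assumes the page and its inline resources have already been retrieved as bytes and are passed in page order, that the '<META' and '>' delimiters are removed along with the tag contents, and that the hashes are concatenated as lowercase hexadecimal strings (the proposal does not say whether hex or raw digest bytes are meant). The names META_TAG, md5_hex and biblink_checksum are hypothetical.

```python
import hashlib
import re
from typing import Iterable

# Matches everything from an opening '<META' to the next '>', in any case.
META_TAG = re.compile(rb"<META[^>]*>", re.IGNORECASE)


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def biblink_checksum(page: bytes, inline_resources: Iterable[bytes]) -> str:
    """Combine the page hash and the inline resource hashes into one MD5."""
    # Strip META tags, then hash the remaining page content.
    stripped = META_TAG.sub(b"", page)
    hashes = [md5_hex(stripped)]

    # Hash each inline resource, keeping the order given by the caller.
    hashes.extend(md5_hex(resource) for resource in inline_resources)

    # Assumption: concatenate the hex strings, page hash first,
    # and hash the resulting ASCII text.
    combined = "".join(hashes).encode("ascii")
    return md5_hex(combined)
```

Under these assumptions, two copies of a page that differ only in their META tags produce the same checksum, while a change to the page body or to any inline resource produces a different one.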
A Web CGI-based tool that implements this algorithm is available.
Page maintained by: Andy Powell
Last updated: 8-June-1998